Re: [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves
From: Jason Wang <jasowang@redhat.com>
Date: 2021-06-29 03:21:38
在 2021/6/28 下午7:23, David Woodhouse 写道:
On Mon, 2021-06-28 at 12:23 +0800, Jason Wang wrote:quoted
在 2021/6/25 下午4:37, David Woodhouse 写道:quoted
On Fri, 2021-06-25 at 15:33 +0800, Jason Wang wrote:quoted
在 2021/6/24 下午8:30, David Woodhouse 写道:quoted
From: David Woodhouse<redacted> When the underlying socket isn't configured with a virtio_net_hdr, the existing code in vhost_net_build_xdp() would attempt to validate uninitialised data, by copying zero bytes (sock_hlen) into the local copy of the header and then trying to validate that. Fixing it is somewhat non-trivial because the tun device might put a struct tun_pi*before* the virtio_net_hdr, which makes it hard to find. So just stop messing with someone else's data in vhost_net_build_xdp(), and let tap and tun validate it for themselves, as they do in the non-XDP case anyway.Thinking in another way. All XDP stuffs for vhost is prepared for TAP. XDP is not expected to work for TUN. So we can simply let's vhost doesn't go with XDP path is the underlayer socket is TUN.Actually, IFF_TUN mode per se isn't that complex. It's fixed purely on the tun side by that first patch I posted, which I later expanded a little to factor out tun_skb_set_protocol(). The next two patches in my original set were fixing up the fact that XDP currently assumes that the *socket* will be doing the vhdr, not vhost. Those two weren't tun-specific at all. It's supporting the PI header (which tun puts *before* the virtio header as I just said) which introduces a tiny bit more complexity.This reminds me we need to fix tun_put_user_xdp(),Good point; thanks.quoted
but as we've discussed, we need first figure out if PI is worth to support for vhost-net.FWIW I certainly don't care about PI support. The only time anyone would want PI support is if they need to support protocols *other* than IPv6 and Legacy IP, over tun mode. I'm fixing this stuff because when I tried to use vhost-tun + tun for *sensible* use cases, I ended up having to flounder around trying to find a combination of settings that actually worked. And that offended me :) So I wrote a test case to iterate over various possible combinations of settings, and then kept typing until that all worked. The only thing I do feel quite strongly about is that stuff should either *work*, or *explicitly* fail if it's unsupported.
I fully agree, but I suspect this may only work when we invent something new, otherwise I'm not sure if it's too late to fix where it may break the existing application.
At this point, although I have no actual use for it myself, I'd
probably just about come down on the side of supporting PI. On the
basis that:
• I've basically made it work already.
• It allows those code paths like tun_skb_set_protocol() to be
consolidated as both calling code paths need the same thing.
• Even in the kernel, and even when modules are as incestuously
intertwined as vhost-net and tun already are, I'm a strong
believer in *not* making assumptions about someone else's data,
so letting *tun* handle its own headers without making those
assumptions seems like the right thing to do.
If we want to support PI, I need to go fix tun_put_user_xdp() as you
noted (and work out how to add that to the test case). And resolve the
fact that configuration might change after tun_get_socket() is called —
and indeed that there might not *be* a configuration at all when
tun_get_socket() is called.Yes, but I tend to leave the code as is PI part consider no one is interested in that. (vhost_net + PI).
If we *don't* want to support PI, well, the *interesting* part of the above needs fixing anyway. Because I strongly believe we should *prevent* it if we don't support it, and we *still* have the case you point out of the tun vhdr_size being changed at runtime.
As discussed in another thread, it looks me to it's sufficient to have some statics counters/API in vhost_net. Or simply use msg_control to reuse tx_errors of TUN/TAP or macvtap.
I'll take a look at whether can pass the socklen back from tun to vhost-net on *every* packet. Is there a MSG_XXX flag we can abuse and somewhere in the msghdr that could return the header length used for *this* packet?
msg_control is probably the best place to do this.
Or could we make vhost_net_rx_peek_head_len() call explicitly into the tun device instead of making stuff up in peek_head_len()?
They're working at skb/xdp level which is unaware of PI stuffs. But again, I think it should be much more cheaper to just add error reporting in this case. And it should be sufficient.
To be clear: from the point of view of my *application* I don't care about any of this; my only motivation here is to clean up the kernel behaviour and make life easier for potential future users.
Yes, thanks a lot for having a look at this. Though I'm not quite sure vhost_net is designed to work on those setups but let's ask for Michael (author of vhost/net) for his idea: Michael, do you think it's worth to support 1) vhost_net + TUN 2) vhost_net + PI ?
I have found a setup that works in today's kernels (even though I have to disable XDP, and have to use a virtio header that I don't want), and will stick with that for now, if I actually commit it to my master branch at all: https://gitlab.com/openconnect/openconnect/-/commit/0da4fe43b886403e6
Yes, but unfortunately it needs some tricks for avoid hitting bugs in the kernel.
I might yet abandon it because I haven't *yet* seen it go any faster than the code which just does read()/write() on the tun device from userspace. And without XDP or zerocopy it's not clear that it could ever give me any benefit that I couldn't achieve purely in userspace by having a separate thread to do tun device I/O. But we'll see...
Ok. Thanks.