Thread (73 messages) 73 messages, 4 authors, 2021-07-09

Re: [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves

From: Jason Wang <jasowang@redhat.com>
Date: 2021-06-29 03:21:38

在 2021/6/28 下午7:23, David Woodhouse 写道:
On Mon, 2021-06-28 at 12:23 +0800, Jason Wang wrote:
quoted
在 2021/6/25 下午4:37, David Woodhouse 写道:
quoted
On Fri, 2021-06-25 at 15:33 +0800, Jason Wang wrote:
quoted
在 2021/6/24 下午8:30, David Woodhouse 写道:
quoted
From: David Woodhouse<redacted>

When the underlying socket isn't configured with a virtio_net_hdr, the
existing code in vhost_net_build_xdp() would attempt to validate
uninitialised data, by copying zero bytes (sock_hlen) into the local
copy of the header and then trying to validate that.

Fixing it is somewhat non-trivial because the tun device might put a
struct tun_pi*before*  the virtio_net_hdr, which makes it hard to find.
So just stop messing with someone else's data in vhost_net_build_xdp(),
and let tap and tun validate it for themselves, as they do in the
non-XDP case anyway.
Thinking in another way. All XDP stuffs for vhost is prepared for TAP.
XDP is not expected to work for TUN.

So we can simply let's vhost doesn't go with XDP path is the underlayer
socket is TUN.
Actually, IFF_TUN mode per se isn't that complex. It's fixed purely on
the tun side by that first patch I posted, which I later expanded a
little to factor out tun_skb_set_protocol().

The next two patches in my original set were fixing up the fact that
XDP currently assumes that the *socket* will be doing the vhdr, not
vhost. Those two weren't tun-specific at all.

It's supporting the PI header (which tun puts *before* the virtio
header as I just said) which introduces a tiny bit more complexity.
This reminds me we need to fix tun_put_user_xdp(),
Good point; thanks.
quoted
but as we've discussed, we need first figure out if PI is worth to
support for vhost-net.
FWIW I certainly don't care about PI support. The only time anyone
would want PI support is if they need to support protocols *other* than
IPv6 and Legacy IP, over tun mode.

I'm fixing this stuff because when I tried to use vhost-tun + tun for
*sensible* use cases, I ended up having to flounder around trying to
find a combination of settings that actually worked. And that offended
me :)

So I wrote a test case to iterate over various possible combinations of
settings, and then kept typing until that all worked.

The only thing I do feel quite strongly about is that stuff should
either *work*, or *explicitly* fail if it's unsupported.

I fully agree, but I suspect this may only work when we invent something 
new, otherwise I'm not sure if it's too late to fix where it may break 
the existing application.

At this point, although I have no actual use for it myself, I'd
probably just about come down on the side of supporting PI. On the
basis that:

  • I've basically made it work already.

  • It allows those code paths like tun_skb_set_protocol() to be
    consolidated as both calling code paths need the same thing.

  • Even in the kernel, and even when modules are as incestuously
    intertwined as vhost-net and tun already are, I'm a strong
    believer in *not* making assumptions about someone else's data,
    so letting *tun* handle its own headers without making those
    assumptions seems like the right thing to do.



If we want to support PI, I need to go fix tun_put_user_xdp() as you
noted (and work out how to add that to the test case). And resolve the
fact that configuration might change after tun_get_socket() is called —
and indeed that there might not *be* a configuration at all when
tun_get_socket() is called.

Yes, but I tend to leave the code as is PI part consider no one is 
interested in that. (vhost_net + PI).


If we *don't* want to support PI, well, the *interesting* part of the
above needs fixing anyway. Because I strongly believe we should
*prevent* it if we don't support it, and we *still* have the case you
point out of the tun vhdr_size being changed at runtime.

As discussed in another thread, it looks me to it's sufficient to have 
some statics counters/API in vhost_net. Or simply use msg_control to 
reuse tx_errors of TUN/TAP or macvtap.

I'll take a look at whether can pass the socklen back from tun to
vhost-net on *every* packet. Is there a MSG_XXX flag we can abuse and
somewhere in the msghdr that could return the header length used for
*this* packet?

msg_control is probably the best place to do this.

  Or could we make vhost_net_rx_peek_head_len() call
explicitly into the tun device instead of making stuff up in
peek_head_len()?

They're working at skb/xdp level which is unaware of PI stuffs.

But again, I think it should be much more cheaper to just add error 
reporting in this case. And it should be sufficient.


To be clear: from the point of view of my *application* I don't care
about any of this; my only motivation here is to clean up the kernel
behaviour and make life easier for potential future users.

Yes, thanks a lot for having a look at this.

Though I'm not quite sure vhost_net is designed to work on those setups 
but let's ask for Michael (author of vhost/net) for his idea:

Michael, do you think it's worth to support

1) vhost_net + TUN
2) vhost_net + PI

?

I have found
a setup that works in today's kernels (even though I have to disable
XDP, and have to use a virtio header that I don't want), and will stick
with that for now, if I actually commit it to my master branch at all:
https://gitlab.com/openconnect/openconnect/-/commit/0da4fe43b886403e6

Yes, but unfortunately it needs some tricks for avoid hitting bugs in 
the kernel.

I might yet abandon it because I haven't *yet* seen it go any faster
than the code which just does read()/write() on the tun device from
userspace. And without XDP or zerocopy it's not clear that it could
ever give me any benefit that I couldn't achieve purely in userspace by
having a separate thread to do tun device I/O. But we'll see...

Ok.

Thanks.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help