Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Jason Wang <jasowang@redhat.com>
Date: 2017-08-22 11:47:35
Also in:
virtualization
On 2017年08月22日 11:10, Willem de Bruijn wrote:
quoted
quoted
quoted
Interesting, deadlock could be treated as a a radical case of the discussion here https://patchwork.kernel.org/patch/3787671/. git grep tells more similar skb_orphan() cases. Do we need to change them all (or part)?Most skb_orphan calls are not relevant to the issue of transmit delay.Yes, but at least we should audit the ones in drivers/net.Do you mean other virtual device driver transmit paths, like xen, specifically?
Git grep does not show skb_orphan() was used for xen for me. But looking at cxgb4/sge.c which seems to call skb_orphan() for large packet and reclaim transmitted packets when: - doing ndo_start_xmit() - or a timer.
quoted
quoted
quoted
Actually, we may meet similar issues at many other places (e.g netem).Netem is an interesting case. Because it is intended to mimic network delay, at least in the case where it calls skb_orphan, it may make sense to release all references, including calling skb_zcopy_clear. In general, zerocopy reverts to copy on all paths that may cause unbounded delay due to another process. Guarding against delay induced by the administrator is infeasible. It is always possible to just pause the nic. Netem is one instance of that, and not unbounded.The problem is, admin may only delay the traffic in e.g one interface, but it actually delay or stall all traffic inside a VM.Understood. Ideally we can remove the HoL blocking cause of this, itself.quoted
quoted
quoted
Need to consider a complete solution for this. Figuring out all places that could delay a packet is a method.The issue described in the referenced patch seems like head of line blocking between two flows. If one flow delays zerocopy descriptor release from the vhost-net pool, it blocks all subsequent descriptors in that pool from being released, also delaying other flows that use the same descriptor pool. If the pool is empty, all transmission stopped. Reverting to copy tx when the pool reaches a low watermark, as the patch does, fixes this.An issue of the referenced patch is that sndbuf could be smaller than low watermark.quoted
Perhaps the descriptor pool should also be revised to allow out of order completions. Then there is no need to copy zerocopy packets whenever they may experience delay.Yes, but as replied in the referenced thread, windows driver may treat out of order completion as a bug.Interesting. I missed that. Perhaps the zerocopy optimization could be gated on guest support for out of order completions.
Yes, we may plan to explicitly notify driver about out of order in future virtio.
quoted
quoted
On the point of counting copy vs zerocopy: the new msg_zerocopy variant of ubuf_info has a field to record whether a deep copy was made. This can be used with vhost-net zerocopy, too.Just to make sure I understand. It's still not clear to me how to reuse this for vhost-net, e.g zerocopy flag is in a union which is not used by vhost_net.True, but that is not set in stone. I went back and forth on that when preparing fix 0a4a060bb204 ("sock: fix zerocopy_success regression with msg_zerocopy"). The field can be moved outside the union and initialized in the other zerocopy paths.
Ok. I see. Thanks _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization