Thread (14 messages) 14 messages, 9 authors, 2021-05-24

Re: virtio_net: BQL?

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2021-05-24 09:10:13
Also in: lkml, virtualization

On Mon, May 24, 2021 at 10:53:08AM +0800, Jason Wang wrote:
在 2021/5/18 上午5:48, Dave Taht 写道:
quoted
On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
[off-list ref] wrote:
quoted
On Mon, May 17, 2021 at 2:44 PM Dave Taht [off-list ref] wrote:
quoted
Not really related to this patch, but is there some reason why virtio
has no support for BQL?
There have been a few attempts to add it over the years.

Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/ (local)

That thread has a long discussion. I think the key open issue remains

"The tricky part is the mode switching between napi and no napi."
Oy, vey.

I didn't pay any attention to that discussion, sadly enough.

It's been about that long (2018) since I paid any attention to
bufferbloat in the cloud and my cloudy provider (linode) switched to
using virtio when I wasn't looking. For over a year now, I'd been
getting reports saying that comcast's pie rollout wasn't working as
well as expected, that evenroute's implementation of sch_cake and sqm
on inbound wasn't working right, nor pf_sense's and numerous other
issues at Internet scale.

Last week I ran a string of benchmarks against starlink's new services
and was really aghast at what I found there, too. but the problem
seemed deeper than in just the dishy...

Without BQL, there's no backpressure for fq_codel to do its thing.
None. My measurement servers aren't FQ-codeling
no matter how much load I put on them. Since that qdisc is the default
now in most linux distributions, I imagine that the bulk of the cloud
is now behaving as erratically as linux was in 2011 with enormous
swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
rings, [1], breaking various rate estimators in codel, pie and the tcp
stack itself.

See:

http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png

See the swings in latency there? that's symptomatic of tx/rx rings
filling and emptying.

it wasn't until I switched my measurement server temporarily over to
sch_fq that I got a rrul result that was close to the results we used
to get from the virtualized e1000e drivers we were using in 2014.

http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png

While I have long supported the use of sch_fq for tcp-heavy workloads,
it still behaves better with bql in place, and fq_codel is better for
generic workloads... but needs bql based backpressure to kick in.

[1] I really hope I'm overreacting but, um, er, could someone(s) spin
up a new patch that does bql in some way even half right for this
driver and help test it? I haven't built a kernel in a while.

I think it's time to obsolete skb_orphan() for virtio-net to get rid of a
brunch of tricky codes in the current virtio-net driver.

Then we can do BQL on top.

I will prepare some patches to do this (probably with Michael's BQL patch).

Thanks
First step would be to fix up and test the BQL part.
IIRC it didn't seem to help performance in our benchmarking,
and Eric seems to say that's expected ...

quoted
quoted
quoted
On Mon, May 17, 2021 at 11:41 AM Xianting Tian
[off-list ref] wrote:
quoted
BUG_ON() uses unlikely in if(), which can be optimized at compile time.

Signed-off-by: Xianting Tian <redacted>
---
   drivers/net/virtio_net.c | 5 ++---
   1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c921ebf3ae82..212d52204884 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct
sk_buff *skb)
         else
                 hdr = skb_vnet_hdr(skb);

-       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
+       BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
                                     virtio_is_little_endian(vi->vdev), false,
-                                   0))
-               BUG();
+                                   0));

         if (vi->mergeable_rx_bufs)
                 hdr->num_buffers = 0;
--
2.17.1
--
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC

--
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help