Re: [RFC PATCH v2] ptr_ring: linked list fallback

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2018-02-27 19:35:44
Also in: lkml

On Tue, Feb 27, 2018 at 09:53:49AM -0800, Eric Dumazet wrote:

On Mon, 2018-02-26 at 03:17 +0200, Michael S. Tsirkin wrote:

So pointer rings work fine, but they have a problem: make them too small
and not enough entries fit.  Make them too large and you start flushing
your cache and running out of memory.

This is a new idea of mine: a ring backed by a linked list. Once you run
out of ring entries, instead of a drop you fall back on a list with a
common lock.

Should work well for the case where the ring is typically sized
correctly, but will help address the fact that some users try to set e.g.
tx queue length to 1000000.

In other words, the idea is that if a user sets a really huge TX queue
length, we allocate a ptr_ring which is smaller, and use the backup
linked list when necessary to provide the requested TX queue length
legitimately.
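
To make the idea above concrete, here is a rough userspace sketch, not the code from the patch. Everything in it is hypothetical: the names (fb_ring, fb_ring_produce, fb_ring_consume), the tiny RING_SIZE, and the malloc'ed list nodes. It is single-threaded, with comments marking where the common lock would be taken. It also assumes that, to keep FIFO order, the producer keeps appending to the list for as long as the list is non-empty.

```c
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4 /* hypothetical; deliberately tiny for illustration */

struct fb_node {
	void *ptr;
	struct fb_node *next;
};

struct fb_ring {
	void *slot[RING_SIZE];	/* NULL marks an empty slot */
	unsigned int head;	/* next slot the producer fills */
	unsigned int tail;	/* next slot the consumer drains */
	struct fb_node *list_head, *list_tail;	/* fallback list */
	size_t list_len, list_max;	/* list_max caps the overflow */
};

void fb_ring_init(struct fb_ring *r, size_t list_max)
{
	*r = (struct fb_ring){ .list_max = list_max };
}

/* Returns 0 on success, -1 if ptr is NULL or both ring and list are full. */
int fb_ring_produce(struct fb_ring *r, void *ptr)
{
	struct fb_node *n;

	if (!ptr)
		return -1;

	/* Fast path: nothing queued on the list and the ring has room. */
	if (!r->list_head && !r->slot[r->head]) {
		r->slot[r->head] = ptr;
		r->head = (r->head + 1) % RING_SIZE;
		return 0;
	}

	/* Slow path: the common lock would be taken here. */
	if (r->list_len >= r->list_max)
		return -1;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->ptr = ptr;
	n->next = NULL;
	if (r->list_tail)
		r->list_tail->next = n;
	else
		r->list_head = n;
	r->list_tail = n;
	r->list_len++;
	return 0;
}

/* Returns the oldest queued pointer, or NULL if nothing is queued. */
void *fb_ring_consume(struct fb_ring *r)
{
	void *ptr = r->slot[r->tail];
	struct fb_node *n;

	/* Ring entries are always older than list entries: drain them first. */
	if (ptr) {
		r->slot[r->tail] = NULL;
		r->tail = (r->tail + 1) % RING_SIZE;
		return ptr;
	}

	/* Ring is empty: fall back to the list (common lock taken here). */
	n = r->list_head;
	if (!n)
		return NULL;
	r->list_head = n->next;
	if (!r->list_head)
		r->list_tail = NULL;
	r->list_len--;
	ptr = n->ptr;
	free(n);
	return ptr;
}
```

With RING_SIZE of 4 and list_max of 2, six entries fit (four in the ring, two on the list), the seventh is rejected, and the consumer sees them in the order they were produced.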

My hope is that this will move us closer to a point where e.g. fq_codel
can use ptr rings without locking at all.  The API is still very rough,
and I really need to take a hard look at lock nesting.

Compiled only, sending for early feedback/flames.

Okay I'll bite then ;)

Let me start by saying that there's no intent to merge this
before any numbers show a performance gain.

High performance will be hit only if nothing is added in the (fallback)
list.

Under stress, list operations will be the bottleneck, allowing XXXX
items in the list, probably wasting cpu caches by always dequeueing
cold objects.

Since systems need to be provisioned to cope with the stress, why
try to optimize the light load case, when we know the CPU has plenty of
cycles to use?

E.g. with tun, people configure huge rx rings to avoid packet drops, but
in practice tens of packets is the maximum we see, even under heavy load,
except <1% of the time.

So the list will get used a very small % of the time and yes, during
that time it will be slower.

If something uses ptr_ring and needs a list for the fallback, it might
simply go back to the old-and-simple list stuff.


So for size > 512 we use a list, for size < 512 we use a ptr ring?

That is absolutely an option.

My concern is that this means that simply by increasing the size
using ethtool, the user suddenly sees a slowdown.
This did not use to be the case, so users might be confused.

Note that this old-and-simple stuff can be greatly optimized with the
use of two lists, as was shown in the UDP stack lately, to decouple
producer and consumer (batching effects).

Please note that such batching is already built into this patch:
packets are added to the last skb, then dequeued as a batch
and moved to consumer_list.
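
As a rough illustration of that batching, and not the code from the patch, the consumer can take the whole producer list in one O(1) splice and then dequeue from its private list without touching the lock. All names here (struct pkt as a stand-in for an skb, two_list_queue, tlq_enqueue, tlq_dequeue, the refills counter) are made up for the sketch, and the lock is only marked in comments:

```c
#include <stddef.h>

struct pkt {			/* hypothetical stand-in for an skb */
	int id;
	struct pkt *next;
};

struct pkt_list {
	struct pkt *head, *tail;
};

struct two_list_queue {
	struct pkt_list producer;	/* shared, would be lock-protected */
	struct pkt_list consumer;	/* private to the consumer */
	unsigned long refills;		/* times the consumer needed the lock */
};

void tlq_enqueue(struct two_list_queue *q, struct pkt *p)
{
	/* lock would be taken here */
	p->next = NULL;
	if (q->producer.tail)
		q->producer.tail->next = p;
	else
		q->producer.head = p;
	q->producer.tail = p;
	/* lock would be released here */
}

struct pkt *tlq_dequeue(struct two_list_queue *q)
{
	struct pkt *p;

	if (!q->consumer.head) {
		/* lock would be taken here: move the whole batch at once */
		q->consumer = q->producer;
		q->producer.head = q->producer.tail = NULL;
		q->refills++;
		/* lock would be released here */
	}

	/* From here on only the private consumer list is touched. */
	p = q->consumer.head;
	if (!p)
		return NULL;
	q->consumer.head = p->next;
	if (!q->consumer.head)
		q->consumer.tail = NULL;
	p->next = NULL;
	return p;
}
```

The refills counter is only there to show the effect: dequeueing a batch of N queued packets costs one locked section on the consumer side rather than N.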

-- 
MST
