Re: [RFC PATCH v2] ptr_ring: linked list fallback
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2018-02-27 19:35:44
Also in:
lkml
On Tue, Feb 27, 2018 at 09:53:49AM -0800, Eric Dumazet wrote:
> On Mon, 2018-02-26 at 03:17 +0200, Michael S. Tsirkin wrote:
>> So pointer rings work fine, but they have a problem: make them too small and not enough entries fit. Make them too large and you start flushing your cache and running out of memory.
>>
>> This is a new idea of mine: a ring backed by a linked list. Once you run out of ring entries, instead of a drop you fall back on a list with a common lock.
>>
>> Should work well for the case where the ring is typically sized correctly, but will help address the fact that some users try to set e.g. tx queue length to 1000000.
>>
>> In other words, the idea is that if a user sets a really huge TX queue length, we allocate a ptr_ring which is smaller, and use the backup linked list when necessary to provide the requested TX queue length legitimately.
>>
>> My hope is that this will move us closer to a direction where e.g. fq_codel can use ptr rings without locking at all. The API is still very rough, and I really need to take a hard look at lock nesting.
>>
>> Compiled only, sending for early feedback/flames.
>
> Okay I'll bite then ;)
Let me start by saying that there's no intent to merge this before any numbers show a performance gain.
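For readers following along, the ring-plus-fallback-list idea from the patch description can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the patch's code or the ptr_ring API: all names (fb_ring, fb_ring_produce, fb_ring_consume, ring_cap) are hypothetical, a single pthread mutex stands in for both the ring's locking and the "common lock" on the list, and the ring is capped at ring_cap entries while the list covers the rest of the requested queue length.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical overflow node; not the data structure used by the patch. */
struct fb_node {
	void *ptr;
	struct fb_node *next;
};

struct fb_ring {
	void **slots;                 /* small fixed-size pointer ring */
	size_t size, head, tail, count;
	struct fb_node *list_head, *list_tail; /* overflow FIFO */
	size_t list_len;
	size_t limit;                 /* requested total queue length */
	pthread_mutex_t lock;         /* one common lock, for simplicity */
};

/* Allocate at most ring_cap slots; the list covers the rest of queue_len. */
static int fb_ring_init(struct fb_ring *r, size_t queue_len, size_t ring_cap)
{
	assert(queue_len > 0 && ring_cap > 0);
	r->size = queue_len < ring_cap ? queue_len : ring_cap;
	r->slots = calloc(r->size, sizeof(void *));
	if (!r->slots)
		return -1;
	r->head = r->tail = r->count = 0;
	r->list_head = r->list_tail = NULL;
	r->list_len = 0;
	r->limit = queue_len;
	pthread_mutex_init(&r->lock, NULL);
	return 0;
}

/* Returns 0 on success, -1 when the entry must be dropped. */
static int fb_ring_produce(struct fb_ring *r, void *ptr)
{
	int ret = 0;

	pthread_mutex_lock(&r->lock);
	if (r->count + r->list_len >= r->limit) {
		ret = -1;             /* requested queue length reached */
	} else if (r->count < r->size && !r->list_head) {
		/* Fast path: room in the ring and nothing older on the list. */
		r->slots[r->tail] = ptr;
		r->tail = (r->tail + 1) % r->size;
		r->count++;
	} else {
		/* Fallback: append to the list to keep FIFO order. */
		struct fb_node *n = malloc(sizeof(*n));

		if (!n) {
			ret = -1;
		} else {
			n->ptr = ptr;
			n->next = NULL;
			if (r->list_tail)
				r->list_tail->next = n;
			else
				r->list_head = n;
			r->list_tail = n;
			r->list_len++;
		}
	}
	pthread_mutex_unlock(&r->lock);
	return ret;
}

/* Ring entries are always older than list entries, so drain the ring first. */
static void *fb_ring_consume(struct fb_ring *r)
{
	void *ptr = NULL;

	pthread_mutex_lock(&r->lock);
	if (r->count) {
		ptr = r->slots[r->head];
		r->head = (r->head + 1) % r->size;
		r->count--;
	} else if (r->list_head) {
		struct fb_node *n = r->list_head;

		ptr = n->ptr;
		r->list_head = n->next;
		if (!r->list_head)
			r->list_tail = NULL;
		r->list_len--;
		free(n);
	}
	pthread_mutex_unlock(&r->lock);
	return ptr;
}

static void fb_ring_destroy(struct fb_ring *r)
{
	while (r->list_head) {
		struct fb_node *n = r->list_head;

		r->list_head = n->next;
		free(n);
	}
	free(r->slots);
	pthread_mutex_destroy(&r->lock);
}
```

For example, fb_ring_init(&r, 5, 2) allocates only two ring slots but still accepts five entries (e.g. `(void *)(intptr_t)i`), three of them on the list. Once anything is on the list, new entries also go to the list, so ring entries stay older than list entries and the consumer can simply drain the ring first. The real ptr_ring uses separate producer and consumer spinlocks rather than one mutex; the sketch ignores that to keep the ordering logic visible.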
> High performance will be achieved only if nothing is added to the (fallback) list. Under stress, list operations will be the bottleneck, allowing XXXX items in the list and probably wasting CPU caches by always dequeuing cold objects. Since systems need to be provisioned to cope with the stress, why try to optimize the light-load case, when we know the CPU has plenty of cycles to spare?
E.g. with tun, people configure huge rx rings to avoid packet drops, but in practice tens of packets is the maximum we see even under heavy load, except <1% of the time. So the list will be used a very small percentage of the time, and yes, during that time it will be slower.
> If something uses ptr_ring and needs a list for the fallback, it might simply go back to the old-and-simple list stuff.
So for size > 512 we use a list, for size < 512 we use a ptr ring? That is absolutely an option. My concern is that a user who simply increases the size using ethtool would then suddenly see a slowdown. That was not the case before, so users might be confused.
> Note that this old-and-simple stuff can be greatly optimized by using two lists, as was shown in the UDP stack lately, to decouple producer and consumer (batching effects).
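The two-list scheme described above can be illustrated with a minimal userspace sketch, assuming exactly one consumer thread: producers append to a lock-protected list, and the consumer, only when its private consumer_list is empty, takes the lock once, splices the entire producer list over, and then dequeues that batch without locking. The names (two_list_q, tlq_enqueue, tlq_dequeue, struct item) are hypothetical; this is neither the UDP stack's code nor the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical intrusive queue item. */
struct item {
	struct item *next;
	int val;
};

struct two_list_q {
	pthread_mutex_t lock;            /* protects the producer list only */
	struct item *prod_head, *prod_tail;
	struct item *consumer_list;      /* private to the single consumer */
};

static void tlq_init(struct two_list_q *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->prod_head = q->prod_tail = NULL;
	q->consumer_list = NULL;
}

/* Producer side: O(1) tail append under the lock. */
static void tlq_enqueue(struct two_list_q *q, struct item *it)
{
	assert(it != NULL);
	it->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->prod_tail)
		q->prod_tail->next = it;
	else
		q->prod_head = it;
	q->prod_tail = it;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Consumer side: take the lock only when the private list is empty,
 * then grab the whole producer list in one splice (the batch).
 */
static struct item *tlq_dequeue(struct two_list_q *q)
{
	struct item *it;

	if (!q->consumer_list) {
		pthread_mutex_lock(&q->lock);
		q->consumer_list = q->prod_head;
		q->prod_head = q->prod_tail = NULL;
		pthread_mutex_unlock(&q->lock);
	}
	it = q->consumer_list;
	if (it)
		q->consumer_list = it->next;
	return it;
}
```

The consumer contends with producers once per batch rather than once per item, which is the producer/consumer decoupling being referred to.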
Please note that such batching is already built into this patch: packets are added to the last skb, then dequeued as a batch and moved to consumer_list.

-- 
MST