Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer <hidden>
Date: 2012-12-06 12:26:46
On Wed, 2012-12-05 at 10:24 +0100, Jesper Dangaard Brouer wrote:
The previous evictor patch of letting new fragments enter, worked amazingly well. But I suspect, this might also be related to a bug/problem in the evictor loop (which were being hidden by that patch).
The evictor loop does not contain a bug, just a SMP scalability issue (which is fixed by later patches). The first evictor patch, which does not let new fragments enter, only worked amazingly well because its hiding this (and other) scalability issues, and implicit allowing frags already "in" to exceed the mem usage for 1 jiffie. Thus, invalidating the patch, as the improvement were only a side effect.
My new *theory* is that the evictor loop, will be looping too much, if it finds a fragment which is INET_FRAG_COMPLETE ... in that case, we don't advance the LRU list, and thus will pickup the exact same inet_frag_queue again in the loop... to get out of the loop we need another CPU or packet to change the LRU list for us... I'll test that theory... (its could also be CPUs fighting over the same LRU head element that cause this) ... more to come...
The above theory does happen, but does not cause excessive looping. The CPUs are just fighting about who gets to free the inet_frag_queue and who gets to unlink it from its data structures (I guess, resulting cache bouncing between CPUs). CPUs are fighting for the same LRU head (inet_frag_queue) element, which is bad for scalability. We could fix this by unlinking the element once a CPU graps it, but it would require us to change a read_lock to a write_lock, thus we might not gain much performance. I already (implicit) fix this is a later patch, where I'm moving the LRU lists to be per CPU. So, I don't know if it's worth fixing. (And yes, I'm using thresh 4Mb/3Mb as my default setting now, but I'm also experimenting with other thresh sizes) p.s. Thank you Eric for being so persistent, so I realized this patch were not good. We can hopefully now, move on to the other patches, which fixes the real scalability issues. --Jesper