Thread (18 messages) 18 messages, 5 authors, 2019-04-04

Re: [RFC 4/4] net/ipv4/fib: Don't synchronise_rcu() every 512Kb

From: Dmitry Safonov <hidden>
Date: 2019-03-26 23:14:52
Also in: lkml

On 3/26/19 3:39 PM, David Ahern wrote:
On 3/26/19 9:30 AM, Dmitry Safonov wrote:
quoted
Fib trie has a hard-coded sync_pages limit to call synchronise_rcu().
The limit is 128 pages or 512Kb (considering common case with 4Kb
pages).

Unfortunately, at Arista we have use-scenarios with full view software
forwarding. At the scale of 100K and more routes even on 2 core boxes
the hard-coded limit starts actively shooting in the leg: lockup
detector notices that rtnl_lock is held for seconds.
First reason is previously broken MAX_WORK, that didn't limit pending
balancing work. While fixing it, I've noticed that the bottle-neck is
actually in the number of synchronise_rcu() calls.

I've tried to fix it with a patch to decrement number of tnodes in rcu
callback, but it hasn't much affected performance.

One possible way to "fix" it - provide another sysctl to control
sync_pages, but in my POV it's nasty - exposing another realisation
detail into user-space.
well, that was accepted last week. ;-)

commit 9ab948a91b2c2abc8e82845c0e61f4b1683e3a4f
Author: David Ahern [off-list ref]
Date:   Wed Mar 20 09:18:59 2019 -0700

    ipv4: Allow amount of dirty memory from fib resizing to be controllable


Can you see how that change (should backport easily) affects your test
case? From my perspective 16MB was the sweet spot.
FWIW, I would like to +Cc Paul here.

TLDR; we're looking with David into ways to improve a hardcoded limit
tnode_free_size at net/ipv4/fib_trie.c: currently it's way too low
(512Kb). David created a patch to provide sysctl that controls the limit
and it would solve a problem for both of us. In parallel, I thought that
exposing this to userspace is not much fun and added a shrinker with
synchronize_rcu(). I'm not any sure that the latter is actually a sane
solution..
Is there any guarantee that memory to-be freed by call_rcu() will get
freed in OOM conditions? Might there be a chance that we don't need any
limit here at all?

Worth to mention that I don't argue David's patch as I pointed that it
would (will) solve the problem for us both, but with good intentions
wondering if we can do something here rather a new sysctl knob.

Thanks,
          Dmitry
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help