Re: [PATCH net-next v2] ipv4: fib: Replay events when registering FIB notifier
From: David Miller <davem@davemloft.net>
Date: 2016-11-01 15:36:53
From: Roopa Prabhu <redacted> Date: Tue, 01 Nov 2016 08:14:14 -0700
On 11/1/16, 7:19 AM, Eric Dumazet wrote:quoted
On Tue, 2016-11-01 at 00:57 +0200, Ido Schimmel wrote:quoted
On Mon, Oct 31, 2016 at 02:24:06PM -0700, Eric Dumazet wrote:quoted
How well will this work for large FIB tables ? Holding rtnl while sending thousands of skb will prevent consumers to make progress ?Can you please clarify what do you mean by "while sending thousands of skb"? This patch doesn't generate notifications to user space, but instead invokes notification routines inside the kernel. I probably misunderstood you. Are you suggesting this be done using RCU instead? Well, there are a couple of reasons why I took RTNL here:No, I do not believe RCU is wanted here, in control path where we might sleep anyway.quoted
1) The FIB notification chain is blocking, so listeners are expected to be able to sleep. This isn't possible if we use RCU. Note that this chain is mainly useful for drivers that reflect the FIB table into a capable device and hardware operations usually involve sleeping. 2) The insertion of a single route is done with RTNL held. I didn't want to differentiate between both cases. This property is really useful for listeners, as they don't need to worry about locking in writer-side. Access to data structs is serialized by RTNL.My concern was that for large iterations, you might hold RTNL and/or current cpu for hundred of ms or even seconds...I have the same concern as Eric here. I understand why you need it, but can the driver request for an initial dump and that dump be made more efficient somehow ie not hold rtnl for the whole dump ?. instead of making the fib notifier registration code doing it. these routing table sizes can be huge and an analogy for this in user-space: We do request a netlink dump of routing tables at initialization (on driver starts or resets)... but, existing netlink routing table dumps for that scale don't hold rtnl for the whole dump. The dump is split into multiple responses to the user and hence it does not starve other rtnl users. In-fact I don't think netlink routing table dumps from user-space hold rtnl_lock for the whole dump. IIRC this was done to allow route add/dels to be allowed in parallel for performance reasons. (I will need to double check to confirm this).
I've always had some reservations about using notifiers for getting the FIB entries down to the offloaded device. And this problem is just another symptom that it is the wrong mechanism for propagating this information. As suggested by Roopa here, perhaps we're looking at the problem from the wrong direction. We tend to design NDO ops and notifiers, to "push" things to the driver, but maybe something more like a push+pull model is better.