Thread (3 messages) 3 messages, 2 authors, 2024-06-07

Re: [PATCH net] net/ipv6: Fix the RT cache flush via sysctl using a previous delay

From: Petr Pavlu <petr.pavlu@suse.com>
Date: 2024-06-07 08:33:36
Also in: lkml

On 6/4/24 10:30, Paolo Abeni wrote:
On Fri, 2024-05-31 at 10:53 +0200, Petr Pavlu wrote:
quoted
[Added back netdev@vger.kernel.org and linux-kernel@vger.kernel.org
which seem to be dropped by accident.]

On 5/30/24 17:59, Kuifeng Lee wrote:
quoted
On Wed, May 29, 2024 at 6:53 AM Petr Pavlu [off-list ref] wrote:
quoted
The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes. The
written value is however not used in the currently requested flush and
instead utilized only in the next one.

A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and then
calls proc_dointvec() which actually updates the sysctl based on the
provided input.
If the problem we are trying to fix is using the old value, should we move
the line reading the value to a place after updating it instead of a
local copy of
the whole ctl_table?
Just moving the read of net->ipv6.sysctl.flush_delay after the
proc_dointvec() call was actually my initial implementation. I then
opted for the proposed version because it looked useful to me to save
memory used to store net->ipv6.sysctl.flush_delay.
Note that due to alignment, the struct netns_sysctl_ipv6 size is not
going to change on 64 bits build.

And if the layout would change, that could have subtle performance side
effects (moving later fields in netns_sysctl_ipv6 in different
cachelines) that we want to avoid for a net patch.
quoted
Another minor aspect is that these sysctl writes are not serialized. Two
invocations of ipv6_sysctl_rtcache_flush() could in theory occur at the
same time. It can then happen that they both first execute
proc_dointvec(). One of them ends up slower and thus its value gets
stored in net->ipv6.sysctl.flush_delay. Both runs then return to
ipv6_sysctl_rtcache_flush(), read the stored value and execute
fib6_run_gc(). It means one of them calls this function with a value
different that it was actually given on input. By having a purely local
variable, each write is independent and fib6_run_gc() is executed with
the right input delay.

The cost of making a copy of ctl_table is a few instructions and this
isn't on any hot path. The same pattern is used, for example, in
net/ipv6/addrconf.c, function addrconf_sysctl_forward().

So overall, the proposed version looked marginally better to me than
just moving the read of net->ipv6.sysctl.flush_delay later in
ipv6_sysctl_rtcache_flush().
All in all the increased complexity vs the simple solution does not
look worth to me.

Please revert to the initial/simpler implementation for this fix,
thanks!
Fair enough, I'll post v2 with the initial/simpler version.

Thanks,
Petr
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help