Re: [Patch net-next] fib: move fib_rules_cleanup_ops() under rtnl lock
From: Cong Wang <hidden>
Date: 2015-03-27 19:25:29
On Fri, Mar 27, 2015 at 5:01 AM, Thomas Graf [off-list ref] wrote:
On 03/26/15 at 04:47pm, Alexander Duyck wrote:quoted
On 03/26/2015 04:05 PM, Cong Wang wrote:quoted
On Thu, Mar 26, 2015 at 3:32 PM, Cong Wang [off-list ref] wrote:quoted
On the other hand, the name rules_mod_lock already tells it is just a protection for ops (module) register.I even doubt we really need rules_mod_lock, it is per netns, which is newly allocated when registering pernet and upper layer should guarantee no concurrent unregistering, so we probably only need to take rtnl lock.I'm adding Thomas as he was the original author for the code and might have a better idea of what needs to be rtnl locked and what doesn't. You should probably CC him as well on the v2 patch. As far as why I am so focused on moving fib4_rules_exit it is because we don't want to call that delete function until after the table has been cleared. Otherwise you end up triggering the external_flush and unmerge code on a full table instead of an empty one. The result is you end up allocating a bunch of memory before you then turn around and free it. So even if you retain the rtnl_lock changes it would still be best to move fib4_rules_exit call to the region after you have freed the FIB tables, but before you free fib_table_hash.I agree with Alex. Reordering fib4_rules_exit() makes sense. Not only to fix this issue but also for the purpose of correct ordering of allocation and releasing.
I am surprised you guys only care this one issue, there are more for me. $subject already says this is net-next, nothing needs to worry about backport.
It is definitely also safe to move fib_rules_cleanup_ops() out of rules_mod_lock. It is the responsibility of whoever registers the rules that no rule is in use he calls fib_rules_unregister(). I don't see why fib_rules should hold rtnl_lock upon delete for the caller. If the caller requires this protection it's up to him to provide it.
If not rtnl lock, then what prevents the race between netns unregistering
with rule add/del via netlink?
I know ops is removed from the list at that point, but ops->rules might be
still being traversed under rtnl lock:
ops = lookup_rules_ops();
list_del_rcu(&ops->list);
list_for_each_entry(ops->rules) {
fib_rules_cleanup_ops(ops);