Re: [PATCH v3 1/2] sctp: rcu-ify addr_waitq
From: Neil Horman <nhorman@tuxdriver.com>
Date: 2015-06-10 19:14:37
Also in:
linux-sctp
On Wed, Jun 10, 2015 at 10:31:42AM -0300, Marcelo Ricardo Leitner wrote:
> On Tue, Jun 09, 2015 at 04:32:59PM -0300, Marcelo Ricardo Leitner wrote:
>> On Tue, Jun 09, 2015 at 07:36:38AM -0400, Neil Horman wrote:
>>> On Mon, Jun 08, 2015 at 05:37:05PM +0200, Hannes Frederic Sowa wrote:
>>>> On Mo, 2015-06-08 at 11:19 -0400, Neil Horman wrote:
>>>>> On Mon, Jun 08, 2015 at 04:59:18PM +0200, Hannes Frederic Sowa wrote:
>>>>>> On Mon, Jun 8, 2015, at 16:46, Hannes Frederic Sowa wrote:
>>>>>>> Hi Marcelo, a few hints on rcuification, sorry I reviewed the code so late:
>>>>>>>
>>>>>>> On Fri, Jun 5, 2015, at 19:08, mleitner@redhat.com wrote:
>>>>>>>> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>>>>>>
>>>>>>>> That's needed for the next patch, so we break the lock inversion between
>>>>>>>> netns_sctp->addr_wq_lock and socket lock on
>>>>>>>> sctp_addr_wq_timeout_handler(). With this, we can traverse addr_waitq
>>>>>>>> without taking addr_wq_lock, taking it just for the write operations.
>>>>>>>>
>>>>>>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Notes:
>>>>>>>>     v2->v3:
>>>>>>>>       placed break statement on sctp_free_addr_wq_entry()
>>>>>>>>       removed unnecessary spin_lock noticed by Neil
>>>>>>>>
>>>>>>>>  include/net/netns/sctp.h |  2 +-
>>>>>>>>  net/sctp/protocol.c      | 80 +++++++++++++++++++++++++++++--------------------
>>>>>>>>  2 files changed, 49 insertions(+), 33 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
>>>>>>>> index 3573a81815ad9e0efb6ceb721eb066d3726419f0..9e53412c4ed829e8e45777a6d95406d490dbaa75 100644
>>>>>>>> --- a/include/net/netns/sctp.h
>>>>>>>> +++ b/include/net/netns/sctp.h
>>>>>>>> @@ -28,7 +28,7 @@ struct netns_sctp {
>>>>>>>>  	 * It is a list of sctp_sockaddr_entry.
>>>>>>>>  	 */
>>>>>>>>  	struct list_head local_addr_list;
>>>>>>>> -	struct list_head addr_waitq;
>>>>>>>> +	struct list_head __rcu addr_waitq;
>>>>>>>>  	struct timer_list addr_wq_timer;
>>>>>>>>  	struct list_head auto_asconf_splist;
>>>>>>>>  	spinlock_t addr_wq_lock;
>>>>>>>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>>>>>>>> index 53b7acde9aa37bf3d4029c459421564d5270f4c0..9954fb8c9a9455d5ad7a627e2d7f9a1fef861fc2 100644
>>>>>>>> --- a/net/sctp/protocol.c
>>>>>>>> +++ b/net/sctp/protocol.c
>>>>>>>> @@ -593,15 +593,47 @@ static void sctp_v4_ecn_capable(struct sock *sk)
>>>>>>>>  	INET_ECN_xmit(sk);
>>>>>>>>  }
>>>>>>>> +static void sctp_free_addr_wq(struct net *net)
>>>>>>>> +{
>>>>>>>> +	struct sctp_sockaddr_entry *addrw;
>>>>>>>> +
>>>>>>>> +	spin_lock_bh(&net->sctp.addr_wq_lock);
>>>>>>>
>>>>>>> Instead of holding spin_lock_bh you need to hold rcu_read_lock_bh, so
>>>>>>> kfree_rcu does not call free function at once (in theory ;) ).
>>>>>>>
>>>>>>>> +	del_timer(&net->sctp.addr_wq_timer);
>>>>>>>> +	list_for_each_entry_rcu(addrw, &net->sctp.addr_waitq, list) {
>>>>>>>> +		list_del_rcu(&addrw->list);
>>>>>>>> +		kfree_rcu(addrw, rcu);
>>>>>>>> +	}
>>>>>>>> +	spin_unlock_bh(&net->sctp.addr_wq_lock);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* As there is no refcnt on sctp_sockaddr_entry, we must check inside
>>>>>>>> + * the lock if it wasn't removed from addr_waitq already, otherwise we
>>>>>>>> + * could double-free it.
>>>>>>>> + */
>>>>>>>> +static void sctp_free_addr_wq_entry(struct net *net,
>>>>>>>> +				    struct sctp_sockaddr_entry *addrw)
>>>>>>>> +{
>>>>>>>> +	struct sctp_sockaddr_entry *temp;
>>>>>>>> +
>>>>>>>> +	spin_lock_bh(&net->sctp.addr_wq_lock);
>>>>>>>
>>>>>>> I don't think this spin_lock operation is needed. The del_timer functions
>>>>>>> do synchronize themselves.
>>>>>>
>>>>>> Sorry, those above two locks are needed, they are not implied by other
>>>>>> locks.
>>>>>
>>>>> What makes you say that? Multiple contexts can issue mod_timer calls on the
>>>>> same timer safely no, because of the internal locking?
>>>>
>>>> That's true for timer handling but not to protect net->sctp.addr_waitq list
>>>> (Marcelo just explained it to me off-list). Looking at the patch only in
>>>> patchworks lost quite a lot of context you were already discussing. ;)
>>>
>>> I can imagine :)
>>>
>>>> We are currently checking if the double iteration can be avoided by
>>>> splicing addr_waitq on the local stack while holding the spin_lock and
>>>> later on notifying the sockets.
>>>
>>> As we discussed, this I think would make a good alternate approach.
>>
>> I was experimenting on this but this would introduce another complex logic
>> instead, as not all elements are pruned from net->sctp.addr_waitq at
>> sctp_addr_wq_timeout_handler(), mainly ipv6 addresses in DAD state (which I
>> believe that break statement is misplaced and should be a continue instead,
>> I'll check on this later).
>>
>> That means we would have to do the splice, process the loop, merge the
>> remaining elements with the new net->sctp.addr_waitq that was possibly
>> built meanwhile and then squash opposite events (logic currently in
>> sctp_addr_wq_mgmt()), otherwise we could be issuing spurious events.
>>
>> But it will probably do more harm than good as the double search will
>> usually hit the first list element on this 2nd search, unless the element we
>> are trying to remove was already removed from it (which is rare, it's when
>> the user adds and removes addresses too fast) or some other address was
>> skipped (DAD addresses).
>
> Better thinking.. actually it may be the way to go. If we rcu-cify addr_waitq
> like that and if the user manages to add an address and remove it while the
> timeout handler is running, the system may emit just the address add and not
> the remove, while if we splice the list, this won't happen.
>
>   Marcelo

That's a good point. Seems like a list splice makes more sense here.

Neil
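
For illustration only, here is a rough userspace sketch of the splice idea discussed in this thread. It is not the kernel code and not the eventual patch. All names (struct wq_entry, struct waitq, waitq_add(), waitq_drain(), count_notify()) are hypothetical, a pthread mutex stands in for addr_wq_lock, and the `deferred` flag stands in for entries that cannot be processed yet (e.g. IPv6 addresses still in DAD). Squashing of opposite add/remove events after the merge, which Marcelo mentions, is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait queue entry (not sctp_sockaddr_entry). */
struct wq_entry {
	struct wq_entry *next;
	int addr_id;	/* placeholder for the address */
	int deferred;	/* nonzero: not ready yet, keep it queued */
};

/* Hypothetical stand-in for addr_waitq plus addr_wq_lock. */
struct waitq {
	pthread_mutex_t lock;
	struct wq_entry *head;
	struct wq_entry *tail;
};

static void waitq_init(struct waitq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
	q->tail = NULL;
}

/* Writers append under the lock, as before. */
static void waitq_add(struct waitq *q, struct wq_entry *e)
{
	assert(e != NULL);
	pthread_mutex_lock(&q->lock);
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
	pthread_mutex_unlock(&q->lock);
}

/* Example callback used to observe which entries were processed. */
static int notified_count;
static void count_notify(struct wq_entry *e)
{
	(void)e;
	notified_count++;
}

/* Splice the queue to a private list, process it unlocked, re-queue the rest.
 * Returns the number of entries passed to notify().
 */
static int waitq_drain(struct waitq *q, void (*notify)(struct wq_entry *))
{
	struct wq_entry *local, *e, *next;
	struct wq_entry *keep_head = NULL, *keep_tail = NULL;
	int handled = 0;

	/* Step 1: detach the whole queue while holding the lock. */
	pthread_mutex_lock(&q->lock);
	local = q->head;
	q->head = NULL;
	q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	/* Step 2: walk the private list; no other lock ordering to worry about. */
	for (e = local; e; e = next) {
		next = e->next;
		e->next = NULL;
		if (e->deferred) {
			/* "continue", not "break": later entries still get a turn. */
			if (keep_tail)
				keep_tail->next = e;
			else
				keep_head = e;
			keep_tail = e;
			continue;
		}
		notify(e);
		handled++;
	}

	/* Step 3: merge deferred entries ahead of anything queued meanwhile. */
	if (keep_head) {
		pthread_mutex_lock(&q->lock);
		keep_tail->next = q->head;
		if (!q->tail)
			q->tail = keep_tail;
		q->head = keep_head;
		pthread_mutex_unlock(&q->lock);
	}
	return handled;
}
```

The point of the sketch is only the locking shape: the queue lock is held for two short pointer swaps, and the per-entry work (where the socket lock would be taken in the real code) happens with the queue lock dropped.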