Re: [PATCH net] sctp: set newsk sk_socket before processing listening sk backlog
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: 2019-10-02 17:49:00
Also in:
linux-sctp
On Wed, Oct 02, 2019 at 02:41:27PM -0300, Marcelo Ricardo Leitner wrote:
On Thu, Oct 03, 2019 at 01:26:46AM +0800, Xin Long wrote:quoted
On Wed, Oct 2, 2019 at 8:55 PM Marcelo Ricardo Leitner [off-list ref] wrote:quoted
On Wed, Oct 02, 2019 at 04:23:52PM +0800, Xin Long wrote:quoted
On Wed, Oct 2, 2019 at 9:04 AM Marcelo Ricardo Leitner [off-list ref] wrote:quoted
On Mon, Sep 30, 2019 at 09:10:18PM +0800, Xin Long wrote:quoted
This patch is to fix a NULL-ptr deref crash in selinux_sctp_bind_connect: [...] kasan: GPF could be caused by NULL-ptr deref or user memory access [...] RIP: 0010:selinux_sctp_bind_connect+0x16a/0x230 [...] Call Trace: [...] security_sctp_bind_connect+0x58/0x90 [...] sctp_process_asconf+0xa52/0xfd0 [sctp] [...] sctp_sf_do_asconf+0x782/0x980 [sctp] [...] sctp_do_sm+0x139/0x520 [sctp] [...] sctp_assoc_bh_rcv+0x284/0x5c0 [sctp] [...] sctp_backlog_rcv+0x45f/0x880 [sctp] [...] __release_sock+0x120/0x370 [...] release_sock+0x4f/0x180 [...] sctp_accept+0x3f9/0x5a0 [sctp] [...] inet_accept+0xe7/0x6f0 It was caused by that the 'newsk' sk_socket was not set before going to security sctp hook when doing accept() on a tcp-type socket: inet_accept()-> sctp_accept(): lock_sock(): lock listening 'sk' do_softirq(): sctp_rcv(): <-- [1] asconf chunk arrived and enqueued in 'sk' backlog sctp_sock_migrate(): set asoc's sk to 'newsk' release_sock(): sctp_backlog_rcv(): lock 'newsk' sctp_process_asconf() <-- [2] unlock 'newsk' sock_graft(): set sk_socket <-- [3] As it shows, at [1] the asconf chunk would be put into the listening 'sk' backlog, as accept() was holding its sock lock. Then at [2] asconf would get processed with 'newsk' as asoc's sk had been set to 'newsk'. However, 'newsk' sk_socket is not set until [3], while selinux_sctp_bind_connect() would deref it, then kernel crashed.Note that sctp will migrate such incoming chunks from sk to newsk in sctp_rcv() if they arrived after the mass-migration performed at sctp_sock_migrate(). That said, did you explore changing inet_accept() so that sk1->sk_prot->accept() would return sk2 still/already locked? That would be enough to block [2] from happening as then it would be queued on newsk backlog this time and avoid nearly duplicating inet_accept(). (too bad for this chunk, hit 2 backlogs..)We don't have to bother inet_accept() for it. I had this one below, and I was just thinking the locks order doesn't look nice. Do you think this is more acceptable?@@ -4963,15 +4963,19 @@ static struct sock *sctp_accept(struct sock*sk, int flags, int *err, bool kern) * asoc to the newsk. */ error = sctp_sock_migrate(sk, newsk, asoc, SCTP_SOCKET_TCP); - if (error) { - sk_common_release(newsk); - newsk = NULL; + if (!error) { + lock_sock_nested(newsk, SINGLE_DEPTH_NESTING); + release_sock(sk);Interesting. It fixes the backlog processing, ok. Question:quoted
+ release_sock(newsk);As newsk is hashed already and unlocked here to be locked again later on inet_accept(), it could receive a packet in between (thus before sock_graft() could have a chance to run), no?You're right, it explains another call trace happened once in our testing. The way to changing inet_accept() will also have to change all protocols' .accept(). Given that this issue is only triggered in a very small moment, can we just silently discard this asconf chunk if sk->sk_socket is NULL? and let peer's T4-timer retransmit it.No no. If the change doesn't hurt other protocols, we should try that first. Otherwise this adds overhead to the network and we could get a bug report soon on "valid asconf being ignored". If that doesn't pan out, maybe your initial suggestion is the way out. More custom code but keeps the expected behavior.quoted
@@ -3709,6 +3709,9 @@ enum sctp_disposition sctp_sf_do_asconf(struct net *net, struct sctp_addiphdr *hdr; __u32 serial; + if (asoc->base.sk->sk_socket) + return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
What if we add this to sctp_backlog_rcv() instead? As in, do not process the backlog if so. And force doing backlog on sctp_rcv() also. As we are sure that there will be a subsequent lock/unlock and that it will handle it, this could work.