Re: [PATCH net] sctp: set newsk sk_socket before processing listening sk backlog

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: 2019-10-02 17:49:00
Also in: linux-sctp

On Wed, Oct 02, 2019 at 02:41:27PM -0300, Marcelo Ricardo Leitner wrote:

On Thu, Oct 03, 2019 at 01:26:46AM +0800, Xin Long wrote:

quoted

On Wed, Oct 2, 2019 at 8:55 PM Marcelo Ricardo Leitner
[off-list ref] wrote:

quoted

On Wed, Oct 02, 2019 at 04:23:52PM +0800, Xin Long wrote:

quoted

On Wed, Oct 2, 2019 at 9:04 AM Marcelo Ricardo Leitner
[off-list ref] wrote:

quoted

On Mon, Sep 30, 2019 at 09:10:18PM +0800, Xin Long wrote:

quoted

This patch is to fix a NULL-ptr deref crash in selinux_sctp_bind_connect:

  [...] kasan: GPF could be caused by NULL-ptr deref or user memory access
  [...] RIP: 0010:selinux_sctp_bind_connect+0x16a/0x230
  [...] Call Trace:
  [...]  security_sctp_bind_connect+0x58/0x90
  [...]  sctp_process_asconf+0xa52/0xfd0 [sctp]
  [...]  sctp_sf_do_asconf+0x782/0x980 [sctp]
  [...]  sctp_do_sm+0x139/0x520 [sctp]
  [...]  sctp_assoc_bh_rcv+0x284/0x5c0 [sctp]
  [...]  sctp_backlog_rcv+0x45f/0x880 [sctp]
  [...]  __release_sock+0x120/0x370
  [...]  release_sock+0x4f/0x180
  [...]  sctp_accept+0x3f9/0x5a0 [sctp]
  [...]  inet_accept+0xe7/0x6f0

It was caused by that the 'newsk' sk_socket was not set before going to
security sctp hook when doing accept() on a tcp-type socket:

  inet_accept()->
    sctp_accept():
      lock_sock():
          lock listening 'sk'
                                          do_softirq():
                                            sctp_rcv():  <-- [1]
                                                asconf chunk arrived and
                                                enqueued in 'sk' backlog
      sctp_sock_migrate():
          set asoc's sk to 'newsk'
      release_sock():
          sctp_backlog_rcv():
            lock 'newsk'
            sctp_process_asconf()  <-- [2]
            unlock 'newsk'
    sock_graft():
        set sk_socket  <-- [3]

As it shows, at [1] the asconf chunk would be put into the listening 'sk'
backlog, as accept() was holding its sock lock. Then at [2] asconf would
get processed with 'newsk' as asoc's sk had been set to 'newsk'. However,
'newsk' sk_socket is not set until [3], while selinux_sctp_bind_connect()
would deref it, then kernel crashed.

Note that sctp will migrate such incoming chunks from sk to newsk in
sctp_rcv() if they arrived after the mass-migration performed at
sctp_sock_migrate().

That said, did you explore changing inet_accept() so that
sk1->sk_prot->accept() would return sk2 still/already locked?
That would be enough to block [2] from happening as then it would be
queued on newsk backlog this time and avoid nearly duplicating
inet_accept(). (too bad for this chunk, hit 2 backlogs..)

We don't have to bother inet_accept() for it. I had this one below,
and I was just thinking the locks order doesn't look nice. Do you
think this is more acceptable?

@@ -4963,15 +4963,19 @@ static struct sock *sctp_accept(struct sock

*sk, int flags, int *err, bool kern)
         * asoc to the newsk.
         */
        error = sctp_sock_migrate(sk, newsk, asoc, SCTP_SOCKET_TCP);
-       if (error) {
-               sk_common_release(newsk);
-               newsk = NULL;
+       if (!error) {
+               lock_sock_nested(newsk, SINGLE_DEPTH_NESTING);
+               release_sock(sk);

Interesting. It fixes the backlog processing, ok. Question:

quoted

+               release_sock(newsk);

As newsk is hashed already and unlocked here to be locked again later
on inet_accept(), it could receive a packet in between (thus before
sock_graft() could have a chance to run), no?

You're right, it explains another call trace happened once in our testing.

The way to changing inet_accept() will also have to change all protocols'
.accept(). Given that this issue is only triggered in a very small moment,
can we just silently discard this asconf chunk if sk->sk_socket is NULL?
and let peer's T4-timer retransmit it.

No no. If the change doesn't hurt other protocols, we should try that
first.  Otherwise this adds overhead to the network and we could get a
bug report soon on "valid asconf being ignored".

If that doesn't pan out, maybe your initial suggestion is the way out.
More custom code but keeps the expected behavior.

quoted

@@ -3709,6 +3709,9 @@ enum sctp_disposition sctp_sf_do_asconf(struct net *net,
        struct sctp_addiphdr *hdr;
        __u32 serial;

+       if (asoc->base.sk->sk_socket)
+               return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);

What if we add this to sctp_backlog_rcv() instead?  As in, do not
process the backlog if so.
And force doing backlog on sctp_rcv() also.
As we are sure that there will be a subsequent lock/unlock and that it
will handle it, this could work.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help