Re: KASAN: use-after-free Read in __lock_sock

From: Lee Jones <hidden>
Date: 2021-12-06 19:21:11
Also in: linux-sctp, lkml

Networking Gurus,

On Thu, 22 Nov 2018, Marcelo Ricardo Leitner wrote:

On Thu, Nov 22, 2018 at 10:44:16PM +0900, Xin Long wrote:

quoted

On Thu, Nov 22, 2018 at 10:13 PM Marcelo Ricardo Leitner
[off-list ref] wrote:

quoted

On Mon, Nov 19, 2018 at 05:57:33PM +0900, Xin Long wrote:

quoted

On Sat, Nov 17, 2018 at 4:18 PM syzbot
[off-list ref] wrote:

quoted

Hello,

syzbot found the following crash on:

HEAD commit:    ccda4af0f4b9 Linux 4.20-rc2
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x6cd533400000
kernel config:  https://syzkaller.appspot.com/x/.config?xJ0a89f12ca9b0f5
dashboard link: https://syzkaller.appspot.com/bug?extid’76d76e83e3bcde6c99
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com

netlink: 5 bytes leftover after parsing attributes in process
`syz-executor5'.
=================================
BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
kernel/locking/lockdep.c:3218
Read of size 8 at addr ffff8881d26d60e0 by task syz-executor1/13725

CPU: 0 PID: 13725 Comm: syz-executor1 Not tainted 4.20.0-rc2+ #333
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
  lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
  _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
  spin_lock_bh include/linux/spinlock.h:334 [inline]
  __lock_sock+0x203/0x350 net/core/sock.c:2253
  lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
  lock_sock include/net/sock.h:1492 [inline]
  sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324

static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
{
        struct sctp_endpoint *ep = tsp->asoc->ep;
        struct sctp_comm_param *commp = p;
        struct sock *sk = ep->base.sk; <--- [1]
...
        int err = 0;

        lock_sock(sk);  <--- [2]

Between [1] and [2], an asoc peeloff may happen, still thinking
how to avoid this.

This race cannot happen more than once for an asoc, so something
like this may be doable:

        struct sctp_comm_param *commp = p;
        struct sctp_endpoint *ep;
        struct sock *sk;
...
        int err = 0;

again:
        ep = tsp->asoc->ep;
        sk = ep->base.sk; <---[3]
        lock_sock(sk);  <--- [2]

if peel-off happens between [3] and [2], and sk is freed
somewhere, it will panic on [2] when trying to get the
sk->lock, no?

Not sure what protects it, but this construct is also used in BH processing at
sctp_rcv():
...
        bh_lock_sock(sk); [4]

        if (sk != rcvr->sk) {
                /* Our cached sk is different from the rcvr->sk.  This is
                 * because migrate()/accept() may have moved the association
                 * to a new socket and released all the sockets.  So now we
                 * are holding a lock on the old socket while the user may
                 * be doing something with the new socket.  Switch our veiw
                 * of the current sk.
                 */
                bh_unlock_sock(sk);
                sk = rcvr->sk;
                bh_lock_sock(sk);
        }
...

If it is not safe, then we have an issue there too.
And by [4] that copy on sk is pretty old already.

quoted

        if (sk != tsp->asoc->ep->base.sk) {
                /* Asoc was peeloff'd */
                unlock_sock(sk);
                goto again;
        }

Similarly to what we did on cea0cc80a677 ("sctp: use the right sk
after waking up from wait_buf sleep").

I'm currently debugging something similar (the same perhaps) on an
older Stable kernel.  However the same Repro which was found and
authored for a production level mobile phone, doesn't seem to trigger
on an x86_64 qemu instance running Mainline.

That's not to say that Mainline isn't still suffering with this issue
though.  It probably has more to do with the specificity of the Repro
which was designed specifically to trigger on the target device.

Does anyone know if this particular issue was ever patched?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help