Re: v4.20-rc6: Sporadic use-after-free in bt_iter()

From: Jens Axboe <axboe@kernel.dk>
Date: 2018-12-20 04:19:46

On 12/19/18 8:24 PM, jianchao.wang wrote:


On 12/20/18 11:17 AM, Jens Axboe wrote:

quoted

On 12/19/18 5:16 PM, Bart Van Assche wrote:

quoted

On Wed, 2018-12-19 at 16:27 -0700, Jens Axboe wrote:

quoted

On 12/19/18 4:24 PM, Bart Van Assche wrote:

quoted

Hello,

If I run the srp blktests in a loop then I see the below call stack appearing
sporadically. I have not yet had the time to analyze this but I'm reporting
this here in case someone else would already have had a look at this.

Bart.

==================================================================
BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
Read of size 8 at addr ffff88803b335240 by task fio/21412

CPU: 0 PID: 21412 Comm: fio Tainted: G        W         4.20.0-rc6-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x71/0x239
 kasan_report.cold.5+0x242/0x301
 __asan_load8+0x54/0x90
 bt_iter+0x86/0xf0
 blk_mq_queue_tag_busy_iter+0x373/0x5e0
 blk_mq_in_flight+0x96/0xb0
 part_in_flight+0x40/0x140
 part_round_stats+0x18e/0x370
 blk_account_io_start+0x3d7/0x670
 blk_mq_bio_to_request+0x19c/0x3a0
 blk_mq_make_request+0x7a9/0xcb0
 generic_make_request+0x41d/0x960
 submit_bio+0x9b/0x250
 do_blockdev_direct_IO+0x435c/0x4c70
 __blockdev_direct_IO+0x79/0x88
 ext4_direct_IO+0x46c/0xc00
 generic_file_direct_write+0x119/0x210
 __generic_file_write_iter+0x11c/0x280
 ext4_file_write_iter+0x1b8/0x6f0
 aio_write+0x204/0x310
 io_submit_one+0x9d3/0xe80
 __x64_sys_io_submit+0x115/0x340
 do_syscall_64+0x71/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f02cf043219

I've seen this one before as well, it's not a new thing. As far as I can
tell, it's a false positive. There should be no possibility for a
use-after-free iterating the static tags/requests.

Are you sure this is a false positive?

No I'm not, but the few times I have seen it, I haven't been able to
make much sense of it. It goes back quite a bit.

I have not yet encountered any false

quoted

positive KASAN complaints. According to the following gdb output this complaint
refers to reading rq->q:

(gdb) list *(bt_iter+0x86)
0xffffffff816b9346 is in bt_iter (block/blk-mq-tag.c:237).
232
233             /*
234              * We can hit rq == NULL here, because the tagging functions
235              * test and set the bit before assigning ->rqs[].
236              */
237             if (rq && rq->q == hctx->queue)
238                     iter_data->fn(hctx, rq, iter_data->data, reserved);
239             return true;
240     }
241

From the disassembly output:

232
233             /*
234              * We can hit rq == NULL here, because the tagging functions
235              * test and set the bit before assigning ->rqs[].
236              */
237             if (rq && rq->q == hctx->queue)
   0xffffffff816b9339 <+121>:   test   %r12,%r12
   0xffffffff816b933c <+124>:   je     0xffffffff816b935f <bt_iter+159>
   0xffffffff816b933e <+126>:   mov    %r12,%rdi
   0xffffffff816b9341 <+129>:   callq  0xffffffff813bd3e0 <__asan_load8>
   0xffffffff816b9346 <+134>:   lea    0x138(%r13),%rdi
   0xffffffff816b934d <+141>:   mov    (%r12),%r14
   0xffffffff816b9351 <+145>:   callq  0xffffffff813bd3e0 <__asan_load8>
   0xffffffff816b9356 <+150>:   cmp    0x138(%r13),%r14
   0xffffffff816b935d <+157>:   je     0xffffffff816b936f <bt_iter+175>

BTW, rq may but does not have to refer to tags->static_rqs[...]. It may also
refer to hctx->fq.flush_rq.

But even those are persistent for the lifetime of the queue... But since
kasan complains it belongs to a specific page, I'm guessing it's one
of the regular requests since those are out of a chopped up page. Which
means it makes even less sense.

Is this happening while devices are being actively torn down? And
are you using shared tags? That's the only way I could see this
triggering.

Or could it be caused by the stale request in hctx->tags->rqs[] slot ?
We don't clear it after free the requests.

And there could be a scenario like,
There used to be a io scheduler attached.
After some workload, the io scheduler is detached.
So there could be rqs allocated by the io scheduler left in hctx->tags->rqs.

blk_mq_get_request                            blk_mq_queue_tag_busy_iter
  -> blk_mq_get_tag
                                                -> bt_for_each
                                                  -> bt_iter
                                                    -> rq = taags->rqs[]
                                                    -> rq->q
  -> blk_mq_rq_ctx_init
    -> data->hctx->tags->rqs[rq->tag] = rq;

If the scenario is possible, maybe we could fix it as following.

Ah yes, good point, I bet that's what it is. But we just had this exact
discussion in another thread, and my point there was that we should
clear these when they go away, not inline. So how about clearing entries
when the sched tags go away?

-- 
Jens Axboe

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help