Re: ata: BUG in ata_sff_hsm_move
From: Dmitry Vyukov <dvyukov@google.com>
Date: 2016-01-29 13:40:57
Also in:
lkml
On Fri, Jan 29, 2016 at 2:18 PM, Dmitry Vyukov [off-list ref] wrote:
On Fri, Jan 29, 2016 at 1:23 PM, Tejun Heo [off-list ref] wrote:quoted
Hello, Dmitry. On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote:quoted
quoted
Hmmm... the port interrupt handler checks for IDLE before calling into hsm_move, so the only explanation would be that something is resetting it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change race between ISR and PIO") describes and fixes the same problem. The fix seems correct and I can't find anywhere else where this can happen. :( Can you please post the kernel log leading to the BUG? Also, I don't think that condition needs to be BUG. I'll change it to WARN.Here are two logs, in both cases no kernel messages prior to the bug: https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txtlol, this is kinda embarrassing. It looks like the poll path wasn't doing any locking. Can you please verify the following patch at least doesn't crash the machine immediately and if so keep it applied to the test kernel so that we can verify that the problem actually goes away?Great that you managed to debug it without a repro! I've applied this patch to my tree and will rerun fuzzer. I will notify you if I see this warning again. Thanks
It now deadlocks at this stack. It is probably not OK to call ata_sff_hsm_move holding the spinlock. [ 8.168809] NMI backtrace for cpu 2 [ 8.168809] CPU: 2 PID: 17 Comm: kworker/2:0 Not tainted 4.5.0-rc1+ #302 [ 8.168809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 8.168809] Workqueue: ata_sff ata_sff_pio_task [ 8.168809] task: ffff88006d24c740 ti: ffff88006d090000 task.ti: ffff88006d090000 [ 8.168809] RIP: 0010:[<ffffffff82c0e388>] [<ffffffff82c0e388>] delay_tsc+0x18/0x70 [ 8.168809] RSP: 0000:ffff88006d097a60 EFLAGS: 00000086 [ 8.168809] RAX: 000000008edaeede RBX: ffff880036e88028 RCX: 000000000000001e [ 8.168809] RDX: 0000000000000033 RSI: 000000338edaee9c RDI: 0000000000000001 [ 8.168809] RBP: ffff88006d097a60 R08: 0000000000000002 R09: 0000000000000000 [ 8.168809] R10: fffffbfff11baf82 R11: 1ffffffff134edc5 R12: ffff880036e88038 [ 8.168809] R13: 000000009a9d2d40 R14: ffff880036e88030 R15: 0000000099383d98 [ 8.168809] FS: 0000000000000000(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000 [ 8.168809] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 8.168809] CR2: 00000000ffffffff CR3: 0000000007a4b000 CR4: 00000000000006e0 [ 8.168809] Stack: [ 8.168809] ffff88006d097a70 ffffffff82c0e2da ffff88006d097aa8 ffffffff814661a9 [ 8.168809] ffff880036e88028 0000000000000082 ffff880036e90008 ffff880036e90010 [ 8.168809] 0000000000000001 ffff88006d097ad0 ffffffff86652c47 ffffffff83a80c4c [ 8.168809] Call Trace: [ 8.168809] [<ffffffff82c0e2da>] __delay+0xa/0x10 [ 8.168809] [<ffffffff814661a9>] do_raw_spin_lock+0x149/0x2b0 [ 8.168809] [<ffffffff86652c47>] _raw_spin_lock_irqsave+0xa7/0xd0 [ 8.168809] [<ffffffff83a80c4c>] ? ata_hsm_qc_complete+0x12c/0x420 [ 8.168809] [<ffffffff83a80c4c>] ata_hsm_qc_complete+0x12c/0x420 [ 8.168809] [<ffffffff83a82f42>] ata_sff_hsm_move+0x1e2/0x1c60 [ 8.168809] [<ffffffff83a85f17>] ? ata_sff_pio_task+0x87/0x530 [ 8.168809] [<ffffffff83a86296>] ata_sff_pio_task+0x406/0x530 [ 8.168809] [<ffffffff813a2386>] process_one_work+0x796/0x1440 [ 8.168809] [<ffffffff813a22ba>] ? process_one_work+0x6ca/0x1440 [ 8.168809] [<ffffffff813d6190>] ? finish_task_switch+0x120/0x5f0 [ 8.168809] [<ffffffff813a1bf0>] ? pwq_dec_nr_in_flight+0x2e0/0x2e0 [ 8.168809] [<ffffffff813a310b>] worker_thread+0xdb/0xfc0 [ 8.168809] [<ffffffff813b637f>] kthread+0x23f/0x2d0 [ 8.168809] [<ffffffff813a3030>] ? process_one_work+0x1440/0x1440 [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 [ 8.168809] [<ffffffff8665362f>] ret_from_fork+0x3f/0x70 [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0
quoted
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 608677d..6991efc 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c@@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work) u8 status; int poll_next; + spin_lock_irq(ap->lock); + BUG_ON(ap->sff_pio_task_link == NULL); /* qc can be NULL if timeout occurred */ qc = ata_qc_from_tag(ap, link->active_tag); if (!qc) { ap->sff_pio_task_link = NULL; - return; + goto out_unlock; } fsm_start:@@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work) */ status = ata_sff_busy_wait(ap, ATA_BUSY, 5); if (status & ATA_BUSY) { + spin_unlock_irq(ap->lock); ata_msleep(ap, 2); + spin_lock_irq(ap->lock); + status = ata_sff_busy_wait(ap, ATA_BUSY, 10); if (status & ATA_BUSY) { ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); - return; + goto out_unlock; } }@@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work) */ if (poll_next) goto fsm_start; +out_unlock: + spin_unlock_irq(ap->lock); } /** --You received this message because you are subscribed to the Google Groups "syzkaller" group. To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/d/optout.