Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
From: Paolo Valente <hidden>
Date: 2018-01-09 11:29:02
Il giorno 21 dic 2017, alle ore 10:14, Guoqing Jiang =
[off-list ref] ha scritto:
=20 =20 =20 On 12/21/2017 03:53 PM, Paolo Valente wrote:quoted
=20quoted
Il giorno 21 dic 2017, alle ore 08:08, Guoqing Jiang =
[off-list ref] ha scritto:
quoted
quoted
=20 Hi, =20 =20 On 12/08/2017 08:34 AM, Holger Hoffst=C3=A4tte wrote:quoted
So plugging in a device on USB with BFQ as scheduler now works =
without
quoted
quoted
quoted
hiccup (probably thanks to Ming Lei's last patch), but of course I =
found
quoted
quoted
quoted
another problem. Unmounting the device after use, changing the =
scheduler
quoted
quoted
quoted
back to deadline or kyber and rmmod'ing the BFQ module reproducibly =
gives me:
quoted
quoted
quoted
=20 kernel: =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D
quoted
quoted
quoted
kernel: BUG bfq_queue (Tainted: G B ): Objects =
remaining in bfq_queue on __kmem_cache_shutdown()
quoted
quoted
quoted
kernel: =
--------------------------------------------------------------------------= ---
quoted
quoted
quoted
kernel: kernel: INFO: Slab 0xffffea001601fc00 objects=3D37 used=3D3 =
fp=3D0xffff8805807f0360 flags=3D0x8000000000008100
quoted
quoted
quoted
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G B =
4.14.5 #1
quoted
quoted
quoted
kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
quoted
quoted
quoted
kernel: Call Trace: kernel: dump_stack+0x46/0x5e kernel: slab_err+0x9e/0xb0 kernel: ? on_each_cpu_mask+0x35/0x40 kernel: ? ksm_migrate_page+0x60/0x60 kernel: ? __kmalloc+0x1c9/0x1d0 kernel: __kmem_cache_shutdown+0x177/0x350 kernel: shutdown_cache+0xf/0x130 kernel: kmem_cache_destroy+0x19e/0x1b0 kernel: SyS_delete_module+0x168/0x230 kernel: ? exit_to_usermode_loop+0x39/0x80 kernel: entry_SYSCALL_64_fastpath+0x13/0x94 kernel: RIP: 0033:0x7f53e4136b97 kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
quoted
quoted
quoted
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
quoted
quoted
quoted
kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
quoted
quoted
quoted
kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
quoted
quoted
quoted
kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
quoted
quoted
quoted
kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
quoted
quoted
quoted
kernel: INFO: Object 0xffff8805807f0000 @offset=3D0 kernel: INFO: Object 0xffff8805807f01b0 @offset=3D432 kernel: INFO: Object 0xffff8805807f3cc0 @offset=3D15552 kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G B =
4.14.5 #1
quoted
quoted
quoted
kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
quoted
quoted
quoted
kernel: Call Trace: kernel: dump_stack+0x46/0x5e kernel: kmem_cache_destroy+0x191/0x1b0 kernel: SyS_delete_module+0x168/0x230 kernel: ? exit_to_usermode_loop+0x39/0x80 kernel: entry_SYSCALL_64_fastpath+0x13/0x94 kernel: RIP: 0033:0x7f53e4136b97 kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
quoted
quoted
quoted
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
quoted
quoted
quoted
kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
quoted
quoted
quoted
kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
quoted
quoted
quoted
kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
quoted
quoted
quoted
kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
quoted
quoted
quoted
=20I also encountered the similar bug, seems there are some async =
objects which are not freed with
quoted
quoted
BFQ_GROUP_IOSCHED build as "Y". =20 If revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c =
("block, bfq: add full hierarchicalquoted
quoted
scheduling and cgroups support")like below, then I can't see the bug =
again with simple test.
quoted
quoted
=20diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 889a8549d97f..c640e64e042f 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c@@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct =
elevator_queue *e)
quoted
quoted
spin_lock_irq(&bfqd->lock); list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, =
bfqq_list)
quoted
quoted
bfq_deactivate_bfqq(bfqd, bfqq, false, false); + bfq_put_async_queues(bfqd, bfqd->root_group); spin_unlock_irq(&bfqd->lock); =20 hrtimer_cancel(&bfqd->idle_slice_timer);@@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct =
elevator_queue *e)
quoted
quoted
blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq); #else spin_lock_irq(&bfqd->lock); - bfq_put_async_queues(bfqd, bfqd->root_group); kfree(bfqd->root_group); spin_unlock_irq(&bfqd->lock); =20 But perhaps we should do it inside blkcg_deactivate_policy, right? =20Thank you very much for investigating this. And yes, I removed that =
bfq_put_async_queues invocation a long ago, because it had to be done in = blkcg_deactivate_policy; and actually it was invoked as expected. We are = trying to understand what is going wrong now.
quoted hunk ↗ jump to hunk
=20 Thanks for the in formations. I find another way, does it make sense? =20diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index da1525ec4c87..0a070daf96c7 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c@@ -474,7 +474,10 @@ static void bfq_pd_init(struct blkg_policy_data =
*pd)
static void bfq_pd_free(struct blkg_policy_data *pd)
{
struct bfq_group *bfqg =3D pd_to_bfqg(pd);
+ struct bfq_data *bfqd =3D bfqg->bfqd;
=20
+ if (bfqd)
+ bfq_put_async_queues(bfqd, bfqg);
bfqg_stats_exit(&bfqg->stats);
bfqg_put(bfqg);
}
=20Yes, thanks. Actually, the mistake was a little bit more articulate. I = have just submitted a hopefully clean fix: https://www.spinics.net/lists/kernel/msg2692962.html Any feedback is more than welcome. Thanks again, Paolo
Regards, Guoqing