Re: [PATCH FIXES/IMPROVEMENTS 0/7] block, bfq: preserve control, boost throughput, fix bugs
From: Piotr Górski <hidden>
Date: 2021-06-21 20:04:10
Also in:
lkml
I have tested this myself: this error does not occur for me, and I have not noticed any regressions. I applied almost exactly the same patches as Oleksandr.

On Mon, 21 Jun 2021 at 21:55, Oleksandr Natalenko [off-list ref] wrote:
Hello.

On Saturday, 19 June 2021 16:09:41 CEST, Paolo Valente wrote:
Hi Jens,

this series contains an already proposed patch by Luca, plus six new patches. The goals of these patches are summarized in the subject of this cover letter.

I'm including Luca's patch here because it enabled the actual use of stable merge and, as such, triggered an otherwise silent bug. This series also contains the fix for that bug ("block, bfq: avoid delayed merge of async queues"), tested by Holger [1].

Thanks,
Paolo

[1] https://lkml.org/lkml/2021/5/18/384

Luca Mariotti (1):
  block, bfq: fix delayed stable merge check

Paolo Valente (5):
  block, bfq: let also stably merged queues enjoy weight raising
  block, bfq: consider also creation time in delayed stable merge
  block, bfq: avoid delayed merge of async queues
  block, bfq: check waker only for queues with no in-flight I/O
  block, bfq: reset waker pointer with shared queues

Pietro Pedroni (1):
  block, bfq: boost throughput by extending queue-merging times

 block/bfq-iosched.c | 68 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 53 insertions(+), 15 deletions(-)

--
2.20.1

I'm not sure everything is fine here. After applying this series on top of the latest stable 5.12 kernel, I got this:

[16730.963248] kernel BUG at block/elevator.c:236!
[16730.963254] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[16730.963257] CPU: 11 PID: 109170 Comm: kworker/u64:5 Tainted: G W 5.12.0-pf7 #1
[16730.963260] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3601 05/26/2021
[16730.963263] Workqueue: dm-thin do_worker [dm_thin_pool]
[16730.963270] RIP: 0010:elv_rqhash_find+0xcc/0xd0
[16730.963274] Code: 41 89 f0 81 e2 00 40 06 00 41 81 e0 1a 00 04 00 44 09 c2 75 a9 be 09 00 00 00 c4 e2 4b f7 50 28 48 03 50 30 48 39 fa 75 c6 c3 <0f> 0b 66 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 48 8b 47 68 48
[16730.963276] RSP: 0018:ffffa558d13b7af8 EFLAGS: 00010046
[16730.963279] RAX: ffff8a0007782d00 RBX: ffff8a0014b93000 RCX: ffffa558d13b7b78
[16730.963281] RDX: ffff8a0014b93000 RSI: 0000000000063082 RDI: 000000001e0fdc00
[16730.963283] RBP: ffff8a000731c770 R08: ffff8a000731c770 R09: fffffff0ffffddfb
[16730.963284] R10: 0000000000000000 R11: 0000000000000400 R12: ffff8a0330365c00
[16730.963286] R13: ffffa558d13b7b30 R14: 0000000000000000 R15: ffff8a0212fc4000
[16730.963288] FS: 0000000000000000(0000) GS:ffff8a070ecc0000(0000) knlGS: 0000000000000000
[16730.963290] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16730.963292] CR2: 00007f1514d90f4c CR3: 0000000315952000 CR4: 0000000000350ee0
[16730.963294] Call Trace:
[16730.963297] elv_merge+0x96/0x120
[16730.963300] blk_mq_sched_try_merge+0x3e/0x370
[16730.963303] bfq_bio_merge+0xd3/0x130
[16730.963306] blk_mq_submit_bio+0x11e/0x6c0
[16730.963309] submit_bio_noacct+0x457/0x530
[16730.963312] raid10_unplug+0x13f/0x1a0 [raid10]
[16730.963316] blk_flush_plug_list+0xa9/0x110
[16730.963319] blk_finish_plug+0x21/0x30
[16730.963322] process_prepared_discard_passdown_pt1+0x204/0x2d0 [dm_thin_pool]
[16730.963327] do_worker+0x18e/0xce0 [dm_thin_pool]
[16730.963335] process_one_work+0x217/0x3e0
[16730.963338] worker_thread+0x4d/0x470
[16730.963343] kthread+0x182/0x1b0
[16730.963349] ret_from_fork+0x22/0x30
…
[16730.963419] ---[ end trace dd7e037f2028257b ]---
[16730.963524] RIP: 0010:elv_rqhash_find+0xcc/0xd0
…
[16730.963547] note: kworker/u64:5[109170] exited with preempt_count 1
[16747.948467] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!

Which is:

229 struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
230 {
…
235         hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
236                 BUG_ON(!ELV_ON_HASH(rq));
…

Yes, I carry some extra patches besides this series (the list is against v5.12 GA):

block, bfq: reset waker pointer with shared queues
block, bfq: check waker only for queues with no in-flight I/O
block, bfq: avoid delayed merge of async queues
block, bfq: boost throughput by extending queue-merging times
block, bfq: consider also creation time in delayed stable merge
block, bfq: fix delayed stable merge check
block, bfq: let also stably merged queues enjoy weight raising
block: Do not pull requests from the scheduler when we cannot dispatch them
blk: Fix lock inversion between ioc lock and bfqd lock
bfq: Remove merged request already in bfq_requests_merged()
block, bfq: avoid circular stable merges
bfq: remove unnecessary BFQ_DEFAULT_GRP_IOPRIO
bfq: reset entity->prio_changed in bfq_init_entity()
bfq: optimize the calculation of bfq_weight_to_ioprio()
bfq: remove unnecessary initialization logic
bfq: keep the minimun bandwidth for CLASS_BE
bfq: limit the IO depth of CLASS_IDLE to 1
bfq: convert the type of bfq_group.bfqd to bfq_data*
bfq: introduce bfq_entity_to_bfqg helper method
bfq/mq-deadline: remove redundant check for passthrough request
blk-mq: bypass IO scheduler's limit_depth for passthrough request
block,bfq: fix the timeout calculation in bfq_bfqq_charge_time
block, bfq: merge bursts of newly-created queues
block, bfq: keep shared queues out of the waker mechanism
block, bfq: fix weight-raising resume with !low_latency
block, bfq: make shared queues inherit wakers
block, bfq: put reqs of waker and woken in dispatch list
block, bfq: always inject I/O of queues blocked by wakers

but nothing from there triggered this for quite some time.

Paolo, what do you think?

--
Oleksandr Natalenko (post-factum)