Re: 6.5.0rc5 fs hang - ext4? raid? | linux-raid

Re: 6.5.0rc5 fs hang - ext4? raid?

From: Dr. David Alan Gilbert <hidden>
Date: 2023-08-16 14:45:41
Also in: linux-ext4, lkml

* Jens Axboe (axboe@kernel.dk) wrote:

On 8/15/23 7:31 PM, Dr. David Alan Gilbert wrote:

quoted

(Copying in Christoph and Jens)

* Dr. David Alan Gilbert (dave@treblig.org) wrote:

quoted

* Dr. David Alan Gilbert (dave@treblig.org) wrote:

quoted

* Theodore Ts'o (tytso@mit.edu) wrote:

quoted

On Mon, Aug 14, 2023 at 09:02:53PM +0000, Dr. David Alan Gilbert wrote:

quoted

dg         29594   29592  0 18:40 pts/0    00:00:00 /usr/bin/ar --plugin /usr/libexec/gcc/x86_64-redhat-linux/13/liblto_plugin.so -csrDT src/intel/perf/libintel_perf.a src/intel/perf/libintel_perf.a.p/meson-generated_.._intel_perf_metrics.c.o src/intel/perf/libintel_perf.a.p/intel_perf.c.o src/intel/perf/libintel_perf.a.p/intel_perf_query.c.o src/intel/perf/libintel_perf.a.p/intel_perf_mdapi.c.o

[root@dalek dg]# cat /proc/29594/stack 
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xd2/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd7/0x150
[<0>] blk_finish_plug+0x29/0x40
[<0>] ext4_do_writepages+0x401/0xc90
[<0>] ext4_writepages+0xad/0x180

If you want a few seconds and try grabbing cat /proc/29594/stack
again, what does the stack trace stay consistent as above?

I'll get back to that and retry it.

Yeh, the stack is consistent; this time around it's an 'ar' in a kernel
build:

[root@dalek dg]# cat /proc/17970/stack
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xad/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd7/0x150
[<0>] blk_finish_plug+0x29/0x40
[<0>] ext4_do_writepages+0x401/0xc90
[<0>] ext4_writepages+0xad/0x180
[<0>] do_writepages+0xd2/0x1e0
[<0>] filemap_fdatawrite_wbc+0x63/0x90
[<0>] __filemap_fdatawrite_range+0x5c/0x80
[<0>] ext4_release_file+0x74/0xb0
[<0>] __fput+0xf5/0x2a0
[<0>] task_work_run+0x5d/0x90
[<0>] exit_to_user_mode_prepare+0x1e6/0x1f0
[<0>] syscall_exit_to_user_mode+0x1b/0x40
[<0>] do_syscall_64+0x6c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[root@dalek dg]# cat /proc/17970/stack
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xad/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd7/0x150
[<0>] blk_finish_plug+0x29/0x40
[<0>] ext4_do_writepages+0x401/0xc90
[<0>] ext4_writepages+0xad/0x180
[<0>] do_writepages+0xd2/0x1e0
[<0>] filemap_fdatawrite_wbc+0x63/0x90
[<0>] __filemap_fdatawrite_range+0x5c/0x80
[<0>] ext4_release_file+0x74/0xb0
[<0>] __fput+0xf5/0x2a0
[<0>] task_work_run+0x5d/0x90
[<0>] exit_to_user_mode_prepare+0x1e6/0x1f0
[<0>] syscall_exit_to_user_mode+0x1b/0x40
[<0>] do_syscall_64+0x6c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

quoted

Also, if you have iostat installed (usually part of the sysstat
package), does "iostat 1" show any I/O activity on the md device?

iostat is showing something odd, most devices are at 0,
except for 3 of the dm's that are stuck at 100% utilisation with
apparently nothing going on:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.03   53.06    0.00   46.84

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
...
dm-16            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
dm-17            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
dm-18            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-19            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-2             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-20            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
....

dm-20 is the /dev/mapper/main-more which is the RAID on which the
fs runs, 16 and 17 are main-more_rmeta_0 and main-more_rimage_0
so something screwy is going on there.

I've just finished a bisect of this hang, and got to:

615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
Author: Christoph Hellwig [off-list ref]
Date:   Fri May 19 06:40:48 2023 +0200

    blk-mq: defer to the normal submission path for post-flush requests

    Requests with the FUA bit on hardware without FUA support need a post
    flush before returning to the caller, but they can still be sent using
    the normal I/O path after initializing the flush-related fields and
    end I/O handler.

    Signed-off-by: Christoph Hellwig [off-list ref]
    Reviewed-by: Bart Van Assche [off-list ref]
    Link: https://lore.kernel.org/r/20230519044050.107790-6-hch@lst.de (local)
    Signed-off-by: Jens Axboe [off-list ref]

Can you try and pull in:

https://git.kernel.dk/cgit/linux/commit/?h=block-6.5&id=5ff3213a5387e076af2b87f796f94b36965e8c3a

and see if that helps?

<testing....>

Dave

-- 
Jens Axboe

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help