Re: regression: data corruption with ext4 on LUKS on nvme with torvalds master

From: Samuel Mendoza-Jonas <hidden>
Date: 2021-07-09 20:45:26
Also in: dm-devel, linux-block, linux-nvme, lkml

On Fri, May 14, 2021 at 07:26:14PM +0900, Changheun Lee wrote:

quoted

On 5/13/21 7:15 AM, Theodore Ts'o wrote:

quoted

On Thu, May 13, 2021 at 06:42:22PM +0900, Changheun Lee wrote:

quoted

The problem might be caused by memory exhaustion, which in turn would
be caused by a small bio_max_size. It did not reproduce in my VM
environment at first, but I reproduced the same problem when
bio_max_size was forced to 8KB. An 8KB bio_max_size would cause too
many bio allocations.

Hmm... I'm not sure how to align your diagnosis with the symptoms in
the bug report.  If we were limited by memory, that should slow down
the I/O, but we should still be making forward progress, no?  And a
forced reboot should not result in data corruption, unless maybe there

If you use data=writeback, data writes and journal writes are not 
synchronized. So, it may be possible that a journal write made it through, 
a data write didn't - the end result would be a file containing random 
contents that was on the disk.

Changheun - do you use data=writeback? Did the corruption happen only in 
newly created files? Or did it corrupt existing files?

Actually I didn't reproduce the data corruption. I only reproduced a hang
while making an ext4 filesystem. Alex, could you check it?

quoted

was a missing check for a failed memory allocation, causing data to be
written to the wrong location, a missing error check leading to the
block or file system layer not noticing that a write had failed
(although again, memory exhaustion should not lead to failed writes;
it might slow us down, sure, but if writes are being failed, something
is Badly Going Wrong --- things like writes to the swap device or
writes by the page cleaner must succeed, or else Things Would Go Bad
In A Hurry).

Mikulas

I've recently been debugging an issue that isn't this exact issue
(it occurs in 5.10), but looks somewhat similar.
On a host that is:
- Running a kernel version x with 5.4 <= x <= 5.10.47, at least
- Using an EXT4 + LUKS partition
- Running Elasticsearch stress tests

We see that the index files used by the Elasticsearch process become
corrupt after some time, and in each case I've seen so far the content
of the file looks like an EXT4 extent header, whose magic is:
	#define EXT4_EXT_MAGIC          cpu_to_le16(0xf30a)

For example:
$ hexdump -C /hdd1/nodes/0/indices/c6eSGDlCRjaWeIBwdeo9DQ/0/index/_23c.si
00000000  0a f3 04 00 54 01 00 00  00 00 00 00 00 00 00 00  |....T...........|
00000010  00 38 00 00 00 60 46 05  00 38 00 00 00 88 00 00  |.8...`F..8......|
00000020  00 98 46 05 00 40 00 00  00 88 00 00 00 a0 46 05  |..F..@........F.|
00000030  00 48 00 00 00 88 00 00  00 a8 46 05 00 48 00 00  |.H........F..H..|
00000040  00 88 00 00 00 a8 46 05  00 48 00 00 00 88 00 00  |......F..H......|
00000050  00 a8 46 05 00 48 00 00  00 88 00 00 00 a8 46 05  |..F..H........F.|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001a0  00 00                                             |..|
000001a2
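
To make the resemblance concrete, here is a minimal sketch that decodes the
first 64 bytes of the dump above as if they were an extent tree leaf. The
field layout (a 12-byte header followed by 12-byte extent records, all
little-endian) and the EXT_INIT_MAX_LEN threshold are written from memory of
fs/ext4/ext4_extents.h and should be treated as assumptions; the names
parse_leaf and DUMP are hypothetical helpers, not anything from the kernel.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A      # value from the #define quoted above
EXT_INIT_MAX_LEN = 1 << 15   # assumption: ee_len above this marks an unwritten extent

# First 64 bytes of the hexdump above, transcribed by hand.
DUMP = bytes.fromhex(
    "0a f3 04 00 54 01 00 00 00 00 00 00 00 00 00 00 "
    "00 38 00 00 00 60 46 05 00 38 00 00 00 88 00 00 "
    "00 98 46 05 00 40 00 00 00 88 00 00 00 a0 46 05 "
    "00 48 00 00 00 88 00 00 00 a8 46 05 00 48 00 00 "
)


def parse_leaf(buf):
    """Decode buf as an extent header plus leaf extents (assumed layout)."""
    # Assumed header: eh_magic, eh_entries, eh_max, eh_depth (le16 each),
    # then eh_generation (le32).
    magic, entries, max_entries, depth, generation = struct.unpack_from(
        "<HHHHI", buf, 0
    )
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("buffer does not start with the ext4 extent magic")
    if depth != 0:
        raise ValueError("depth != 0: records would be index entries, not extents")

    extents = []
    for i in range(entries):
        # Assumed record: ee_block (le32), ee_len (le16),
        # ee_start_hi (le16), ee_start_lo (le32).
        ee_block, ee_len, start_hi, start_lo = struct.unpack_from(
            "<IHHI", buf, 12 + 12 * i
        )
        unwritten = ee_len > EXT_INIT_MAX_LEN
        length = ee_len - EXT_INIT_MAX_LEN if unwritten else ee_len
        physical = (start_hi << 32) | start_lo
        extents.append((ee_block, length, physical, unwritten))

    header = {
        "entries": entries,
        "max": max_entries,
        "depth": depth,
        "generation": generation,
    }
    return header, extents


header, extents = parse_leaf(DUMP)
print(header)
for logical, length, physical, unwritten in extents:
    print(f"logical {logical:#x} len {length:#x} -> physical {physical:#x}"
          f"{' (unwritten)' if unwritten else ''}")
```

Read this way, the header reports 4 entries, a capacity of 340 and depth 0,
and the four records describe logically and physically contiguous ranges.
A capacity of 340 is what a 4 KiB block would hold at 12 bytes per record,
so the content may be an on-disk leaf block rather than an in-inode root,
but that is only an interpretation of the dump, not a confirmed diagnosis.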


I'm working on tracing exactly when this happens, but I'd be interested
to hear if that sounds familiar or might have a similar underlying cause
beyond the commit that was reverted above.

Cheers,
Sam Mendoza-Jonas
