Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
From: Dan Williams <hidden>
Date: 2021-11-02 16:44:16
Also in: dm-devel, linux-fsdevel, lkml, nvdimm
On Tue, Oct 26, 2021 at 11:50 PM Christoph Hellwig wrote:
> On Fri, Oct 22, 2021 at 08:52:55PM +0000, Jane Chu wrote:
> > Thanks - I try to be honest. As far as I can tell, the argument about the flag is a philosophical argument between two views. One view assumes design based on perfect hardware, and media error belongs to the category of brokenness. Another view sees media error as a built-in hardware component and designs to include dealing with such errors.
>
> No, I don't think so. Bit errors do happen in all media, which is why devices are built to handle them. It is just the Intel-style pmem interface to handle them which is completely broken.

No, any media can report checksum / parity errors. NVMe also seems to do a poor job with multi-bit ECC errors consumed from DRAM. There is nothing "pmem" or "Intel" specific here.

> > errors in mind from start. I guess I'm trying to articulate why it is acceptable to add the RWF_DATA_RECOVERY flag to the existing RWF_ flags: this way, pwritev2 remains fast on the fast path, and its slow path (w/ error clearing) is faster than the other alternative. The other alternative being one system call to clear the poison and another system call to run the fast pwrite for recovery; what happens if something happens in between?
>
> Well, my point is doing recovery from bit errors is by definition not the fast path. Which is why I'd rather keep it away from the pmem read/write fast path, which also happens to be the (much more important) non-pmem read/write path.

I would expect this interface to be useful outside of pmem as a "failfast" or "try harder to recover" flag for reading over media errors.
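
For illustration only, here is a minimal userspace sketch of the "try harder to recover" pattern discussed above: attempt the ordinary write first, and retry with the recovery flag only if the first attempt fails with EIO. The flag value below is a placeholder (the flag exists only in the proposed series, so a stock kernel would be expected to reject it), and the helper name `write_with_recovery` and its injectable `pwritev` parameter are assumptions made so the logic can be exercised without poisoned media.

```python
import errno
import os

# Placeholder value: RWF_RECOVERY_DATA is taken from the proposed patch series.
# It is not defined by Python's os module, and the bit used here is hypothetical.
RWF_RECOVERY_DATA = 0x20


def write_with_recovery(fd, data, offset, pwritev=None):
    """Write `data` at `offset`; retry with the recovery flag only on EIO.

    `pwritev` is injectable for testing; by default os.pwritev is used,
    which wraps pwritev2(2) when flags are supported by the platform.
    """
    if pwritev is None:
        pwritev = os.pwritev
    try:
        # Fast path: ordinary write, no extra flags.
        return pwritev(fd, [data], offset, 0)
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise  # Not a media error: do not mask it.
    # Slow path: one extra call that asks the kernel to clear poison and write.
    return pwritev(fd, [data], offset, RWF_RECOVERY_DATA)
```

On a healthy range only the first call is made, so the common case pays nothing for the recovery option; the flagged call is reached only after a reported media error.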