Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
From: Jeff Layton <hidden>
Date: 2017-05-31 22:01:10
Also in:
linux-block, linux-fsdevel, lkml
On Wed, 2017-05-31 at 14:37 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton [off-list ref] wrote:
> > On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton [off-list ref] wrote:
> > > > This is v5 of the patchset to improve how we're tracking and
> > > > reporting errors that occur during pagecache writeback.
> > >
> > > I'm curious to know how you've been testing this? Is that testing
> > > strong enough for us to be confident that all nature of I/O errors
> > > will be reported to userspace?
> >
> > That's a tall order. This is a difficult thing to test as these sorts
> > of errors are pretty rare by nature.
> >
> > I have an xfstest that I posted just after this set that demonstrates
> > that it works correctly, at least on ext2/3/4 when run by the ext4
> > driver (ext2 legacy driver reports too many errors currently). I had
> > btrfs and xfs working on that test too in an earlier incarnation of
> > this set, so I think we can fix this in them as well without too much
> > difficulty. I'm happy to run other tests if someone wants to suggest
> > them.
> >
> > Now, all that said, I don't think this will make things any worse
> > than they are today as far as reporting errors properly to userland
> > goes. It's rather easy for an incidental synchronous writeback
> > request from an internal caller to clear the AS_* flags today. This
> > will at least ensure that we're reporting errors since a well-defined
> > point in time when you call fsync.
>
> Were you using error injection of some form? If so, how was that all
> set up?

Yes, it uses dm-error for fault injection. The test basically does:

1) set up a dm-error device in a working configuration
2) build a scratch filesystem on it, with the log on a different device
   in some fashion so metadata writeback will still succeed.
3) open the same file several times
4) flip dm-error device to non-working mode
5) write to each fd
6) fsync each fd

...do you get back an error on each fsync? It then does a bit more to
make sure they're cleared afterward as you'd expect.

That works for most block device based filesystems. I also have a
second xfstest that opens a block device and does the same basic thing.
That also works correctly with this patch series.

I still need to come up with a way to simulate errors on other fs'
though. We may need to plumb in some kernel-level fault injection on
some fs' to do that correctly. Suggestions welcome there.

With this series though, the idea is to convert one filesystem at a
time, so I think that should help mitigate some of the risk.

-- 
Jeff Layton [off-list ref]
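
For anyone who wants to poke at this from userspace, here is a rough sketch of just the open/write/fsync part of the test (steps 3, 5 and 6 above). It is not the xfstest itself: the function name write_and_fsync_each() and its parameters are made up for illustration, and the dm-error setup and the flip to non-working mode (steps 1, 2 and 4) are assumed to happen outside this script.

```python
import os
import tempfile


def write_and_fsync_each(path, nfds=3, payload=b"x" * 4096):
    """Open `path` nfds times, write through each fd, then fsync each fd.

    Returns one entry per descriptor: None if fsync() succeeded, otherwise
    the errno that fsync() reported. (Hypothetical helper, not from the
    patch series or the xfstest.)
    """
    # Step 3: several independent descriptors on the same file.
    fds = [os.open(path, os.O_RDWR | os.O_CREAT, 0o600) for _ in range(nfds)]
    results = []
    try:
        # Step 5: dirty the pagecache through every descriptor.
        for fd in fds:
            os.pwrite(fd, payload, 0)
        # Step 6: fsync each descriptor and record what it reports.
        for fd in fds:
            try:
                os.fsync(fd)
                results.append(None)
            except OSError as exc:
                results.append(exc.errno)
    finally:
        for fd in fds:
            os.close(fd)
    return results


if __name__ == "__main__":
    # On a healthy filesystem no descriptor should report an error.
    with tempfile.TemporaryDirectory() as d:
        print(write_and_fsync_each(os.path.join(d, "testfile")))
```

On a healthy filesystem every entry should come back as None. With the file on a filesystem whose dm-error device has been flipped to non-working mode before the writes, the question the test asks is whether every entry is an errno (presumably EIO) rather than only the first one.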