Re: [PATCH v4 14/27] fs: new infrastructure for writeback error handling and reporting
From: Jeff Layton <hidden>
Date: 2017-05-10 12:19:50
Also in:
linux-btrfs, linux-cifs, linux-ext4, linux-f2fs-devel, linux-fsdevel, linux-mm, linux-nfs, linux-xfs, lkml
On Wed, 2017-05-10 at 13:48 +0200, Jan Kara wrote:
> On Tue 09-05-17 11:49:17, Jeff Layton wrote:
> > Most filesystems currently use mapping_set_error and filemap_check_errors for setting and reporting/clearing writeback errors at the mapping level. filemap_check_errors is indirectly called from most of the filemap_fdatawait_* functions and from filemap_write_and_wait*. These functions are called from all sorts of contexts to wait on writeback to finish -- e.g. mostly in fsync, but also in truncate calls, getattr, etc.
> >
> > The non-fsync callers are problematic. We should be reporting writeback errors during fsync, but many places spread over the tree clear out errors before they can be properly reported, or report errors at nonsensical times. If I get -EIO on a stat() call, there is no reason for me to assume that it is because some previous writeback failed. The fact that it also clears out the error such that a subsequent fsync returns 0 is a bug, and a nasty one since that's potentially silent data corruption.
> >
> > This patch adds a small bit of new infrastructure for setting and reporting errors during address_space writeback. While the above was my original impetus for adding this, I think it's also the case that current fsync semantics are just problematic for userland. Most applications that call fsync do so to ensure that the data they wrote has hit the backing store.
> >
> > In the case where there are multiple writers to the file at the same time, this is really hard to determine. The first one to call fsync will see any stored error, and the rest get back 0. The processes with open fds may not be associated with one another in any way. They could even be in different containers, so ensuring coordination between all fsync callers is not really an option.
> >
> > One way to remedy this would be to track what file descriptor was used to dirty the file, but that's rather cumbersome and would likely be slow. However, there is a simpler way to improve the semantics here without incurring too much overhead.
> >
> > This set adds an errseq_t to struct address_space, and a corresponding one is added to struct file. Writeback errors are recorded in the mapping's errseq_t, and the one in struct file is used as the "since" value.
> >
> > This changes the semantics of the Linux fsync implementation such that applications can now use it to determine whether there were any writeback errors since fsync(fd) was last called (or since the file was opened in the case of fsync having never been called).
> >
> > Note that those writeback errors may have occurred when writing data that was dirtied via an entirely different fd, but that's the case now with the current mapping_set_error/filemap_check_errors infrastructure. This will at least prevent you from getting a false report of success.
> >
> > The new behavior is still consistent with the POSIX spec, and is more reliable for application developers. This patch just adds some basic infrastructure for doing this. Later patches will change the existing code to use this new infrastructure.
> >
> > Signed-off-by: Jeff Layton <redacted>
>
> Just one nit below. Otherwise the patch looks good to me. You can add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> > diff --git a/fs/file_table.c b/fs/file_table.c
> > index 954d510b765a..d6138b6411ff 100644
> > --- a/fs/file_table.c
> > +++ b/fs/file_table.c
> > @@ -168,6 +168,7 @@ struct file *alloc_file(const struct path *path, fmode_t mode,
> >  	file->f_path = *path;
> >  	file->f_inode = path->dentry->d_inode;
> >  	file->f_mapping = path->dentry->d_inode->i_mapping;
> > +	file->f_wb_err = filemap_sample_wb_error(file->f_mapping);
>
> Why do you sample here when you also sample in do_dentry_open()? I didn't find any alloc_file() callers that would possibly care about writeback errors...
>
> 								Honza

I basically used the setting of f_mapping as a guideline as to where to sample it for initialization. My thinking was that if f_mapping ever ended up different then you'd probably also want f_wb_err to be resampled anyway.

I can drop this hunk if you think we don't need it.

-- 
Jeff Layton [off-list ref]
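
For readers following the thread, the "since" semantics described in the quoted commit message can be sketched in userspace C. This is only a toy model under stated assumptions, not the kernel's errseq_t: every toy_* name and the two-field struct are invented for illustration, and the sketch ignores atomicity and however the real type packs its state. It shows just one idea: each open file remembers the point at which it last looked, so every fd reports an error recorded after its own last check, instead of the first fsync caller consuming it for everyone.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for an error cursor: last error plus a counter. */
struct toy_errseq {
	int err;       /* most recent negative errno, or 0 if none recorded */
	uint32_t seq;  /* bumped each time an error is recorded */
};

/* Toy analogue of the per-mapping value (assumption, not struct address_space). */
struct toy_mapping {
	struct toy_errseq wb_err;
};

/* Toy analogue of the per-file "since" value (assumption, not struct file). */
struct toy_file {
	struct toy_mapping *mapping;
	struct toy_errseq since;
};

/* Record a writeback failure on the mapping. */
static void toy_set_wb_error(struct toy_mapping *m, int err)
{
	m->wb_err.err = err;
	m->wb_err.seq++;
}

/* Sample the mapping's current state when the file is opened. */
static void toy_open(struct toy_file *f, struct toy_mapping *m)
{
	f->mapping = m;
	f->since = m->wb_err;
}

/*
 * Report an error only if one was recorded since this file last looked,
 * then advance this file's cursor. Other files' cursors are untouched.
 */
static int toy_fsync(struct toy_file *f)
{
	struct toy_errseq cur = f->mapping->wb_err;

	if (cur.seq == f->since.seq)
		return 0;
	f->since = cur;
	return cur.err;
}
```

In this model, two files opened on the same mapping before toy_set_wb_error(&m, -EIO) each get -EIO from their next toy_fsync() and 0 from the one after that, while a file opened after the error starts from the already-advanced sample and gets 0.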