Re: [PATCH] block: Fix S_DAX inode flag locking
From: Dave Chinner <david@fromorbit.com>
Date: 2016-05-18 14:32:12
Also in:
linux-fsdevel
On Wed, May 18, 2016 at 10:20:39AM +0200, Jan Kara wrote:
On Wed 18-05-16 08:58:42, Dave Chinner wrote:quoted
On Tue, May 17, 2016 at 12:34:57PM -0700, Dan Williams wrote:quoted
On Tue, May 17, 2016 at 11:29 AM, Jon Derrick [off-list ref] wrote:quoted
This patch fixes S_DAX bd_inode i_flag locking to conform to suggestedA "fix" implies that its currently broken. I don't see how it is, not until we add an ioctl method or other path that also tries to update the flags outside of blkdev_get() context. So, I don't think this patch stands on its own if you were intending it to be merged separately.quoted
locking rules. It is presumed that S_DAX is the only valid inode flag for a block device which subscribes to direct-access, and will restore any previously set flags if direct-access initialization fails. This reverts to i_flags behavior prior to bbab37ddc20bae4709bca8745c128c4f46fe63c5 by allowing other bd_inode flags when DAX is disabled Signed-off-by: Jon Derrick <redacted> --- fs/block_dev.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-)diff --git a/fs/block_dev.c b/fs/block_dev.c index 20a2c02..d41e37f 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c@@ -1159,6 +1159,20 @@ void bd_set_size(struct block_device *bdev, loff_t size) } EXPORT_SYMBOL(bd_set_size); +static void bd_add_dax(struct inode *inode) +{ + inode_lock(inode); + inode->i_flags |= S_DAX; + inode_unlock(inode); +} + +static void bd_clear_dax(struct inode *inode) +{ + inode_lock(inode); + inode->i_flags &= ~S_DAX; + inode_unlock(inode); +}Since this is inode generic should these helpers be prefixed "i_" rather than "bd_"?Probably not, because in general filesystems are responsible for updating i_flags to reflect on-disk inode configuration and that's typically done under transaction contexts. e.g. through ioctl interfaces to set/clear flags that are stored on disk. As such, inode->i_flags is effectively protected by the filesystem specific locking heirarchy, not the generic inode_lock(). e.g. have a look at XFS storing a persistent "DAX-enabled" flag in the inode, which can be set/cleared on individual inodes dynamically by FS_IOC_FSSETXATTR. The XFS i_flags update function assumes exclusive access to the field as it is called under locked transaction context. Similar code exists in ext4, btrfs, gfs2, etc....So in case of ext4, we actually do use inode_lock() to protect against racing IOC_SETFLAGS calls. i_flags is a strange mix and there are a few (like S_DEAD or S_NOSEC) which are not persistent and those get set / cleared by VFS. Some of those places (e.g. the clearing in __vfs_setxattr_noperm()) are not really controlled by the filesystem AFAICT. So when XFS doesn't use inode_lock() to protect i_flags updates it can race with VFS on i_flags updates.
There are several other filesystems with the same problem. It's not actually clear what they rules for i_flags are. The comment above inode_set_flags() is ambiguous at best, and even points this out: * In the long run, i_mutex is overkill, and we should probably look * at using the i_lock spinlock to protect i_flags, and then make sure * it is so documented in include/linux/fs.h and that all code follows * the locking convention!! Perhaps that should be done before the situation gets worse... Cheers Dave. -- Dave Chinner david@fromorbit.com