Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

[RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 11/30] fs: new API for handling i_version · Jeff Layton <hidden> · 2016-12-21
Re: [RFC PATCH v1 11/30] fs: new API for handling i_version · J. Bruce Fields <hidden> · 2017-03-03
Re: [RFC PATCH v1 11/30] fs: new API for handling i_version · Jeff Layton <hidden> · 2017-03-04
Re: [RFC PATCH v1 11/30] fs: new API for handling i_version · NeilBrown <neil@brown.name> · 2017-03-03
Re: [RFC PATCH v1 11/30] fs: new API for handling i_version · Jeff Layton <hidden> · 2017-03-04
[RFC PATCH v1 18/30] ext4: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t · Jeff Layton <hidden> · 2016-12-21
Re: [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t · Amir Goldstein <amir73il@gmail.com> · 2016-12-22
Re: [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t · Jeff Layton <hidden> · 2016-12-22
Re: [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t · NeilBrown <neil@brown.name> · 2017-03-04
[RFC PATCH v1 01/30] lustre: don't set f_version in ll_readdir · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 02/30] ecryptfs: remove unnecessary i_version bump · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 03/30] ceph: remove the bump of i_version · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 04/30] f2fs: don't bother setting i_version · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 05/30] hpfs: don't bother with the i_version counter · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 06/30] jfs: remove initialization of i_version counter · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 07/30] nilfs2: remove inode->i_version initialization · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 08/30] orangefs: remove initialization of i_version · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 09/30] reiserfs: remove unneeded i_version bump · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 10/30] ntfs: remove i_version handling · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 12/30] fat: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 13/30] affs: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 14/30] afs: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 15/30] btrfs: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 16/30] exofs: switch to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 17/30] ext2: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 19/30] nfs: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 20/30] nfsd: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 21/30] ocfs2: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 22/30] ufs: use new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 23/30] xfs: convert to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 24/30] IMA: switch IMA over to new i_version API · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 25/30] fs: add a "force" parameter to inode_inc_iversion · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 27/30] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 26/30] fs: only set S_VERSION when updating times if it has been queried · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 28/30] btrfs: only dirty the inode in btrfs_update_time if something was changed · Jeff Layton <hidden> · 2016-12-21
[RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag · Jeff Layton <hidden> · 2016-12-21
Re: [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag · NeilBrown <neil@brown.name> · 2017-03-04
Re: [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag · Jeff Layton <hidden> · 2017-03-04
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Christoph Hellwig <hch@infradead.org> · 2016-12-22
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2016-12-22
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-20
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Christoph Hellwig <hch@infradead.org> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jan Kara <jack@suse.cz> · 2017-03-29
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-29
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Dave Chinner <david@fromorbit.com> · 2017-03-29
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-04-04
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jan Kara <jack@suse.cz> · 2017-03-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Boaz Harrosh <hidden> · 2017-03-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-04-04
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · NeilBrown <neil@brown.name> · 2017-04-05
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jan Kara <jack@suse.cz> · 2017-04-05
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-04-05
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-05-11
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · NeilBrown <neil@brown.name> · 2017-05-11
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-05-12
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-10-30
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jan Kara <jack@suse.cz> · 2017-05-12
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-05-12
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-05-12
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-05-12
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · NeilBrown <neil@brown.name> · 2017-04-06
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jan Kara <jack@suse.cz> · 2017-04-06
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-04-05
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Dave Chinner <david@fromorbit.com> · 2017-04-01
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jan Kara <jack@suse.cz> · 2017-04-03
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Dave Chinner <david@fromorbit.com> · 2017-04-04
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-04-04
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · NeilBrown <neil@brown.name> · 2017-04-05
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Dave Chinner <david@fromorbit.com> · 2017-03-21
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-22
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-03
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · Jeff Layton <hidden> · 2017-03-04
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization · J. Bruce Fields <hidden> · 2017-03-08

From: Jeff Layton <hidden>
Date: 2017-03-30 11:24:00
Also in: linux-btrfs, linux-fsdevel, linux-nfs, linux-xfs, lkml

On Thu, 2017-03-30 at 10:41 +1100, Dave Chinner wrote:

On Wed, Mar 29, 2017 at 01:54:31PM -0400, Jeff Layton wrote:

quoted

On Wed, 2017-03-29 at 13:15 +0200, Jan Kara wrote:

quoted

On Tue 21-03-17 14:46:53, Jeff Layton wrote:

quoted

On Tue, 2017-03-21 at 14:30 -0400, J. Bruce Fields wrote:

quoted

On Tue, Mar 21, 2017 at 01:23:24PM -0400, Jeff Layton wrote:

quoted

On Tue, 2017-03-21 at 12:30 -0400, J. Bruce Fields wrote:

quoted

- It's durable; the above comparison still works if there were reboots
  between the two i_version checks.
	- I don't know how realistic this is--we may need to figure out
	  if there's a weaker guarantee that's still useful.  Do
	  filesystems actually make ctime/mtime/i_version changes
	  atomically with the changes that caused them?  What if a
	  change attribute is exposed to an NFS client but doesn't make
	  it to disk, and then that value is reused after reboot?

Yeah, there could be atomicity there. If we bump i_version, we'll mark
the inode dirty and I think that will end up with the new i_version at
least being journalled before __mark_inode_dirty returns.

So you think the filesystem can provide the atomicity?  In more detail:

Sorry, I hit send too quickly. That should have read:

"Yeah, there could be atomicity issues there."

I think providing that level of atomicity may be difficult, though
maybe there's some way to make the querying of i_version block until
the inode update has been journalled?

Just to complement what Dave said from ext4 side - similarly as with XFS
ext4 doesn't guarantee atomicity unless fsync() has completed on the file.
Until that you can see arbitrary combination of data & i_version after the
crash. We do take care to keep data and metadata in sync only when there
are security implications to that (like exposing uninitialized disk blocks)
and if not, we are as lazy as we can to improve performance...

Yeah, I think what we'll have to do here is ensure that those
filesystems do an fsync prior to reporting the i_version getattr
codepath. It's not pretty, but I don't see a real alternative.

I think that's even more problematic. ->getattr currently runs
completely unlocked for performance reasons - it's racy w.r.t. to
ongoing modifications to begin with, so /nothing/ that is returned
to userspace via stat/statx can be guaranteed to be "coherent".
Linus will be very unhappy if you make his git workload (which is
/very/ stat heavy) run slower by adding any sort of locking in this
hot path.

Even if we did put an fsync() into ->getattr() (and dealt with all
the locking issues that entails), by the time the statx syscall
returns to userspace the i_version value may not match the
data/metadata in the inode(*).  IOWs, by the time i_version gets
to userspace, it is out of date and any use of it for data
versioning from userspace is going to be prone to race conditions.

Cheers,

Dave.

(*) fiemap has exactly the same "stale the moment internal fs
locks are released" race conditions, which is why it cannot safely
be used for mapping holes when copying file data....

FWIW, I'm not terribly worried about atomicity for all of the reasons
you describe. My main concern is reusing an i_version value that has
already been handed out when the inode is now in an entirely different
state. To that end, I was only considering doing an fsync iff
STATX_VERSION was requested. If it wasn't we wouldn't need to do one.

A lot of ->getattr implementations already call filemap_write_and_wait,
but you're correct that flushing out the metadata is different matter. 
It'd be nice to avoid that if we can.

-- 
Jeff Layton [off-list ref]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help