Thread (10 messages) 10 messages, 4 authors, 2022-09-12

Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

From: Jeff Layton <jlayton@kernel.org>
Date: 2022-09-12 15:49:46
Also in: ceph-devel, linux-api, linux-btrfs, linux-ext4, linux-fsdevel, linux-nfs, linux-xfs, lkml

Possibly related (same subject, not in this thread)

On Mon, 2022-09-12 at 15:32 +0000, Trond Myklebust wrote:
On Mon, 2022-09-12 at 14:56 +0000, Trond Myklebust wrote:
quoted
On Mon, 2022-09-12 at 10:50 -0400, J. Bruce Fields wrote:
quoted
On Mon, Sep 12, 2022 at 02:15:16PM +0000, Trond Myklebust wrote:
quoted
On Mon, 2022-09-12 at 09:51 -0400, J. Bruce Fields wrote:
quoted
On Mon, Sep 12, 2022 at 08:55:04AM -0400, Jeff Layton wrote:
quoted
Because of the "seen" flag, we have a 63 bit counter to play
with.
Could
we use a similar scheme to the one we use to handle when
"jiffies"
wraps? Assume that we'd never compare two values that were
more
than
2^62 apart? We could add i_version_before/i_version_after
macros to
make
it simple to handle this.
As far as I recall the protocol just assumes it can never
wrap. 
I
guess
you could add a new change_attr_type that works the way you
describe.
But without some new protocol clients aren't going to know what
to do
with a change attribute that wraps.

I think this just needs to be designed so that wrapping is
impossible
in
any realistic scenario.  I feel like that's doable?

If we feel we have to catch that case, the only 100% correct
behavior
would probably be to make the filesystem readonly.
Which protocol? If you're talking about basic NFSv4, it doesn't
assume
anything about the change attribute and wrapping.

The NFSv4.2 protocol did introduce the optional attribute
'change_attr_type' that tries to describe the change attribute
behaviour to the client. It tells you if the behaviour is
monotonically
increasing, but doesn't say anything about the behaviour when the
attribute value overflows.

That said, the Linux NFSv4.2 client, which uses that
change_attr_type
attribute does deal with overflow by assuming standard uint64_t
wrap
around rules. i.e. it assumes bit values > 63 are truncated,
meaning
that the value obtained by incrementing (2^64-1) is 0.
Yeah, it was the MONOTONIC_INCRE case I was thinking of.  That's
interesting, I didn't know the client did that.
If you look at where we compare version numbers, it is always some
variant of the following:

static int nfs_inode_attrs_cmp_monotonic(const struct nfs_fattr
*fattr,
                                         const struct inode *inode)
{
        s64 diff = fattr->change_attr -
inode_peek_iversion_raw(inode);
        if (diff > 0)
                return 1;
        return diff == 0 ? 0 : -1;
}

i.e. we do an unsigned 64-bit subtraction, and then cast it to the
signed 64-bit equivalent in order to figure out which is the more
recent value.
Good! This seems like the reasonable thing to do, given that the spec
doesn't really say that the change attribute has to start at low values.
...and by the way, yes this does mean that if you suddenly add a value
of 2^63 to the change attribute, then you are likely to cause the
client to think that you just handed it an old value.

i.e. you're better off having the crash counter increment the change
attribute by a relatively small value. One that is guaranteed to be
larger than the values that may have been lost, but that is not
excessively large.
Yeah.

Like with jiffies, you need to make sure the samples you're comparing
aren't _too_ far off. That should be doable here -- 62 bits is plenty of
room to store a lot of change values.

My benchmark (maybe wrong, but maybe good enough) is to figure on an
increment per nanosecond for a worst-case scenario. With that, 2^40
nanoseconds is >12 days. Maybe that's overkill.

2^32 ns is about an hour and 20 mins. That's probably a reasonable value
to use. If we can't get a a new value onto disk in that time then
something is probably very wrong.
-- 
Jeff Layton [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help