Thread (8 messages) 8 messages, 6 authors, 2007-07-11

Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

From: Mingming Cao <hidden>
Date: 2007-07-11 02:09:08
Also in: linux-fsdevel, lkml

On Wed, 2007-07-11 at 13:21 +1000, Neil Brown wrote:
On Tuesday July 10, akpm@linux-foundation.org wrote:
quoted
Yes, thanks.  It doesn't actually tell us why we want to implement
this attribute and it doesn't tell us what the implications of failing
to do so are, but I guess we can take that on trust from the NFS guys.
You would like to think so, but remember NFSv4 was designed by a
committee :-)

The 'change' number is used for cache consistency, and as the spec
makes very strong statements about the 'change' number, it is very
hard (or impossible) to implement a server correctly without storing a
change number in stable storage (just one of my grips about V4).
quoted
But I suspect the ext4 implementation doesn't actually do this.  afaict we
won't update i_version for file overwrites (especially if s_time_gran can
indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
would be the implications of this?
The first part sounds like a bug - i_version should really be updated
by every call to ->commit_write (if that is still what it is called).

The MAP_SHARED thing is less obvious.  I guess every time we notice
that the page might have been changed, we need to increment i_version.
quoted
And how does the NFS server know that the filesystem implements i_version? 
Will a zero-value of i_version have special significance, telling the
server to not send this attribute, perhaps?
That is a very important question.  Zero probably makes sense, but
what ever it is needs to be agreed and documented.
And just by-the-way, the server doesn't really have the option of not
sending the attribute.  If i_version isn't defined, it has to fake
something using mtime, and hope that is good enough.

Alternately we could mandate that i_version is always kept up-to-date
and if a filesystem doesn't have anything to load from storage, it
just sets it to the current time in nanoseconds.

That would mean that a client would need to flush it's cache whenever
the inode fell out of cache on the server, but I don't think we can
reliably do better than that.

I think I like that approach.

So my vote is to increment i_version in common code every time any
change is made to the file, 
David Chinneer pointed that we need to journal the version number
updates together with the operations that causes the change of the inode
version number, in order to survive server crashes so clients won't see
the counter go backwards.

So increment i_version in fs code is probably the place to ensure the
inode version changes are stored to disk. It's seems update the ext4
inode version in every ext4_mark_inode_dirty() is the easiest way.
and alloc_inode should initialise it to
current time, which might be changed by the filesystem before it calls
unlock_new_inode. 
... but doesn't lustre want to control its i_version... so maybe not :-(

NeilBrown
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help