Thread: 14 messages, 4 authors, 2009-01-22

Re: [RFC] [PATCH] vfs: Call filesystem callback when backing device caches should be flushed

From: Jamie Lokier <hidden>
Date: 2009-01-21 23:55:31
Also in: linux-fsdevel

Dave Chinner wrote:
> If the inode is dirty and fsync does nothing, then that filesystem
> is *broken*. If writing to the inode doesn't dirty it, then the
> filesystem is broken. Fix the broken filesystem.

*Wrong*.  Very, very wrong.

You do not write totally unchanged inode bytes just for the sake of
causing a NOP transaction to make the disk write the fsync as a
side-effect of a broken paradigm.  That's _three_ pointless I/Os (one
redundant barrier and two writes), and probably 50x slowdown in write
performance due to seeking.  Now whose filesystem is broken?

>> For efficient fdatasync() you _never_ want a transaction if possible,
>> because it forces the disk head to seek between alternating regions of
>> the disk, two seeks per fsync().
>
> If there is dirty metadata that needs to be logged or flushed,
> then fdatasync() needs to do something. If it doesn't do it
> correctly, then that *filesystem is broken*. Fix the broken
> filesystem.

A series of writes over existing data and fdatasync() should *never*
write to the transaction log, unless you mounted something like ext3
data=journal, which isn't usual.

There is no dirty metadata to write.  It is data only.  fdatasync()
*means* "do NOT write metadata that is not needed for data retrieval",
that's its whole point.  A filesystem which keeps seeking to its
inode area _and_ its journal area _and_ the data area on every
fdatasync() is a poor design indeed.
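
To make the case above concrete, here is a minimal userspace sketch of
"a series of writes over existing data and fdatasync()".  The helper
name overwrite_and_datasync() is hypothetical; only open(), fstat(),
pwrite(), fdatasync() and close() are real POSIX calls.  It refuses any
write that would extend the file, because a size change is metadata
that fdatasync() does have to commit.  A pure overwrite still updates
mtime, but mtime is not needed to retrieve the data.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pwrite() and fdatasync() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite len bytes at offset inside an existing
 * file, then ask for the data (not unrelated metadata) to be made stable.
 * Returns 0 on success, -1 on error or if the write would grow the file.
 */
int overwrite_and_datasync(const char *path, off_t offset,
                           const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* No O_CREAT, O_TRUNC or O_APPEND: the file layout must not change. */
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* Only allow writes that stay within the current file size. */
    struct stat st;
    if (fstat(fd, &st) < 0 || offset < 0 ||
        offset + (off_t)len > st.st_size) {
        close(fd);
        return -1;
    }

    /* pwrite() may write fewer bytes than asked, so loop until done. */
    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, offset + (off_t)done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Data-only sync: the call this whole argument is about. */
    if (fdatasync(fd) < 0) {
        perror("fdatasync");
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Whether the bytes have actually reached the platter when fdatasync()
returns depends on the filesystem and on the drive's write cache, which
is exactly the point under dispute in this thread.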

>> So you can't rely on journalling transactions to flush.
>
> The VFS doesn't even know about transactions....

Whoever brought them up said they can be relied on to flush writes
during fsync/fdatasync.  Just saying they can't, is all...

>>>  Finally, I prefer maintainers of the filesystems themselves to
>>>  decide whether their filesystem needs flushing and thus
>>>  knowingly impose this performance penalty on them...
>>
>> I say it should flush by default unless a filesystem hooks an
>> alternative strategy.  Certainly, it's silly to have the same code
>> duplicated in nearly every filesystem
>
> So write a *generic helper* for those filesystems that do the same
> thing and hook it to their ->fsync method. Don't hard code it in the
> VFS so other filesystem devs have to come along afterwards and turn
> it off.

Are there any at the moment which would turn it off?
If so that's a fine idea.
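
For illustration only, the generic-helper arrangement could look
roughly like the toy userspace model below.  Every name in it (toy_dev,
toy_fs_ops, toy_generic_fsync_flush and so on) is made up for this
sketch and is not a kernel API; the custom strategy is a placeholder,
since nobody in the thread has named a filesystem that needs one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_dev {
    int cache_flushes;   /* device write-cache flushes issued */
    int custom_syncs;    /* syncs handled by a filesystem's own strategy */
};

struct toy_file;

struct toy_fs_ops {
    int (*fsync)(struct toy_file *f);
};

struct toy_file {
    const struct toy_fs_ops *ops;
    struct toy_dev *dev;
    int dirty_pages;
};

/* Stand-in for writing back the file's dirty data. */
static int toy_write_dirty_pages(struct toy_file *f)
{
    f->dirty_pages = 0;
    return 0;
}

/* The shared helper: write back data, then flush the device cache. */
int toy_generic_fsync_flush(struct toy_file *f)
{
    int err = toy_write_dirty_pages(f);
    if (err)
        return err;
    f->dev->cache_flushes++;
    return 0;
}

/* Placeholder for whatever alternative strategy a filesystem hooks. */
int toy_custom_fsync(struct toy_file *f)
{
    int err = toy_write_dirty_pages(f);
    if (err)
        return err;
    f->dev->custom_syncs++;
    return 0;
}

/* Filesystems opt in by pointing ->fsync at the helper they want. */
const struct toy_fs_ops toy_simple_fs_ops = { .fsync = toy_generic_fsync_flush };
const struct toy_fs_ops toy_custom_fs_ops = { .fsync = toy_custom_fsync };

/* The "VFS" layer only dispatches; no flush policy is hard coded here. */
int toy_vfs_fsync(struct toy_file *f)
{
    assert(f != NULL && f->ops != NULL);
    return f->ops->fsync ? f->ops->fsync(f) : 0;
}
```

The trade-off is visible in toy_vfs_fsync(): a filesystem that forgets
to hook anything gets no flush at all, which is the argument for
flushing by default.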

-- Jamie