Thread (5 messages), 4 authors, 2018-09-16

Re: metadata operation reordering regards to crash

From: Qu Wenruo <hidden>
Date: 2018-09-16 06:39:33
Also in: linux-btrfs, linux-fsdevel, lkml


On 2018/9/15 2:58 PM, 焦晓冬 wrote:
On Sat, Sep 15, 2018 at 6:23 AM Dave Chinner [off-list ref] wrote:
quoted
On Fri, Sep 14, 2018 at 05:06:44PM +0800, 焦晓冬 wrote:
quoted
Hi, all,

A probably somewhat complex question:
Do today's practical filesystems, e.g., extX, btrfs, preserve metadata
operation order through a crash/power failure?
Yes.

Behaviour is filesystem dependent, but we have tests in fstests that
specifically exercise order preservation across filesystem failures.
quoted
What I know is modern filesystems ensure metadata consistency
after crash/power failure. Journal filesystems like extX do that by
write-ahead logging of metadata operations into transactions. Other
filesystems do that in various ways; btrfs, for example, uses COW.

What is not yet clear to me is whether these filesystems preserve
metadata operation order after a crash.

For example,
op 1.  rename(A, B)
op 2.  rename(C, D)

As mentioned above, metadata consistency is ensured after a crash.
Thus, B is either the original B (or does not exist) or has been replaced by A.
The same goes for D.

Is it possible that, after a crash, D has been replaced by C but B is still
the original file (or does not exist)?
Not for XFS, ext4, btrfs or f2fs. Other filesystems might be
different.
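
To make the question concrete, here is a minimal sketch of what "order
preserved" means for the two renames above. It is only a model of the
namespace as a Python dict, not filesystem code; the file contents and the
helper names (apply_rename, ordered_crash_states) are made up for
illustration.

```python
def apply_rename(namespace, src, dst):
    """Return a new namespace dict with src renamed to dst (dst is replaced)."""
    result = dict(namespace)
    result[dst] = result.pop(src)
    return result


def ordered_crash_states(namespace, ops):
    """States reachable after a crash if operation order is preserved.

    Only prefixes of the operation sequence are possible: nothing applied,
    op 1 only, op 1 and op 2, and so on.
    """
    states = [dict(namespace)]
    for src, dst in ops:
        states.append(apply_rename(states[-1], src, dst))
    return states


# Hypothetical initial namespace: name -> file contents.
initial = {"A": "a-data", "B": "old-b", "C": "c-data", "D": "old-d"}
ops = [("A", "B"), ("C", "D")]

states = ordered_crash_states(initial, ops)

# The state asked about: op 2 applied, op 1 not applied.
reordered = apply_rename(initial, "C", "D")
```

With order preserved, `reordered` is not among `states`: D replaced by C
while B is still the original would require op 2 to become durable before
op 1.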
Thanks, Dave,

I found this archive:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31937.html

It seems the btrfs people think reordering could happen.
It depends.

For default btrfs (using the log tree), it depends on the log replay code
(which is somewhat like a journal, but not completely the same).

Unfortunately I'm not an expert on that part, but the tree log is more a
performance optimization than a vital part of keeping the fs consistent.

But with the notreelog mount option, btrfs won't use the log tree and falls
back to sync() for all fsync() calls due to its metadata organization.

And in that case, there is no reordering at all. It uses metadata CoW to
ensure everything is consistent.
In that case, power loss happens either before or after the superblock
is written back.
The old superblock always points to the old trees, and vice versa for the
new superblock.
So one will only see either the new fs or the old fs, thus making btrfs
atomic for its metadata update.
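
A toy model of that superblock switch, as a rough sketch only: this is not
btrfs code, the class and parameter names (ToyCowFs, crash_before_superblock)
are hypothetical, and it assumes both renames land in the same transaction.

```python
class ToyCowFs:
    """Toy model: trees are never modified in place; the superblock picks one."""

    def __init__(self, names):
        self.trees = [dict(names)]  # every tree version ever written
        self.superblock = 0         # index of the tree the superblock points to

    def transaction(self, renames, crash_before_superblock=False):
        # Copy-on-write: build the new tree next to the old one.
        new_tree = dict(self.trees[self.superblock])
        for src, dst in renames:
            new_tree[dst] = new_tree.pop(src)
        self.trees.append(new_tree)
        if crash_before_superblock:
            return  # power loss: the superblock still points to the old tree
        # A single superblock update makes the whole transaction visible.
        self.superblock = len(self.trees) - 1

    def mount(self):
        """What a mount after the crash would see."""
        return self.trees[self.superblock]


names = {"A": 1, "B": 2, "C": 3, "D": 4}

crashed = ToyCowFs(names)
crashed.transaction([("A", "B"), ("C", "D")], crash_before_superblock=True)

committed = ToyCowFs(names)
committed.transaction([("A", "B"), ("C", "D")])
```

In this model `crashed.mount()` shows the old names untouched and
`committed.mount()` shows both renames applied; there is no point at which
only one of them is visible.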

Thanks,
Qu
It is a relatively old reply. Has the implementation changed? Or is there
some new standard that requires that reordering not happen?
quoted
Cheers,

Dave,
--
Dave Chinner
david@fromorbit.com
  
