Thread: 21 messages, 6 authors, 2012-01-13

Re: [PATCH 0/4] Fix filesystem freezing

From: Jan Kara <jack@suse.cz>
Date: 2012-01-13 11:07:59
Also in: linux-fsdevel, linux-xfs, lkml

On Fri 13-01-12 11:09:32, Dave Chinner wrote:
On Thu, Jan 12, 2012 at 12:30:31PM +0100, Jan Kara wrote:
On Thu 12-01-12 13:48:41, Dave Chinner wrote:
On Thu, Jan 12, 2012 at 02:20:49AM +0100, Jan Kara wrote:
  Hello,

  filesystem freezing is currently racy and thus we can end up with dirty data
on a frozen filesystem (see changelog of the first patch for detailed race
description and proposed fix). This patch series aims at fixing this.

It only fixes the dirty data race (i.e. SB_FREEZE_WRITE). The same
race conditions exist for SB_FREEZE_TRANS on XFS, and so need the
same fix. That race has had one previous attempt at fixing it in
XFS but that's not possible:

b2ce397 Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"
7a249cf xfs: fix filesystsem freeze race in xfs_trans_alloc

It was looking at that problem earlier today that led to the
solution Eric proposed. Essentially the method in these patches
needs to replace the XFS-specific m_active_trans counter and delay
during ->fs_freeze to prevent that race condition....

  OK, I see. I just checked ext4 to make sure and ext4 seems to get this
right. Looking into Christoph's original patch it shouldn't be hard to fix
it. Instead of:
	atomic_inc(&mp->m_active_trans);

	if (wait_for_freeze)
		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);

we just need to do something a bit more elaborate:

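retry:
	/* Wait until no freeze at SB_FREEZE_TRANS or above is in effect. */
	if (wait_for_freeze)
		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
	atomic_inc(&mp->m_active_trans);
	/* A freeze may have started since the wait above: back out and retry. */
	if (wait_for_freeze && mp->m_super->s_frozen >= SB_FREEZE_TRANS) {
		atomic_dec(&mp->m_active_trans);
		goto retry;
	}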

Or does XFS support nested transactions (i.e. a thread already holding a
running transaction can call into xfs_trans_alloc() again)?
That would make things more complicated...
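
[Editorial sketch, not part of the original mail.] The increment-then-recheck
ordering proposed above can be modelled in userspace C11. Everything here is
an assumption made for illustration: struct mount_model, trans_try_start(),
trans_end() and freeze_try() are hypothetical names, the sleeping wait in
xfs_wait_for_freeze() is replaced by returning false, and sequentially
consistent C11 atomics stand in for whatever memory barriers kernel code
would probably need between the increment and the recheck of s_frozen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Freeze levels ordered as in the thread: a higher value is "more frozen". */
enum freeze_level { UNFROZEN = 0, FREEZE_WRITE = 1, FREEZE_TRANS = 2 };

/* Hypothetical stand-in for the two fields the mail talks about. */
struct mount_model {
	atomic_int frozen;       /* models mp->m_super->s_frozen */
	atomic_int active_trans; /* models mp->m_active_trans */
};

void mount_model_init(struct mount_model *mp)
{
	atomic_init(&mp->frozen, UNFROZEN);
	atomic_init(&mp->active_trans, 0);
}

/*
 * One pass of the "retry:" loop. Returns true if the transaction was
 * registered, false if a freeze is in effect and the caller must wait
 * and try again.
 */
bool trans_try_start(struct mount_model *mp)
{
	if (atomic_load(&mp->frozen) >= FREEZE_TRANS)
		return false;
	atomic_fetch_add(&mp->active_trans, 1);
	/* Recheck: the freezer may have set the state after the test above. */
	if (atomic_load(&mp->frozen) >= FREEZE_TRANS) {
		atomic_fetch_sub(&mp->active_trans, 1);
		return false;
	}
	return true;
}

void trans_end(struct mount_model *mp)
{
	int prev = atomic_fetch_sub(&mp->active_trans, 1);
	assert(prev > 0); /* every end must pair with a successful start */
}

/*
 * Freezer side: publish the new state first, then look at the counter.
 * Returns true once no transaction is registered any more.
 */
bool freeze_try(struct mount_model *mp)
{
	atomic_store(&mp->frozen, FREEZE_TRANS);
	return atomic_load(&mp->active_trans) == 0;
}
```

In this model a starter that increments after the freezer has published the
state always sees the freeze on its recheck and backs out, and a starter that
increments before is seen by the freezer's counter check, so the two sides
cannot both proceed.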

You're still missing the point: this isn't an XFS-specific problem,
and the write problem isn't an ext4-specific problem. The
problem is that these are freeze state transition problems -
something that can affect every filesystem because the freeze code
is generic.  Quite frankly, I'm not interested in having a generic
solution for SB_FREEZE_WRITE and a custom, per filesystem solution
for SB_FREEZE_TRANS when the solution is exactly the same.

  I understand that both state transitions are currently racy. It's just that
ext3, ext4, reiserfs, gfs2, and btrfs do not really care about the
SB_FREEZE_TRANS transition because they all grew their own synchronization
mechanisms for that. XFS is the only filesystem I know of which really relies
on this transition. That's why I originally decided to fix up the
SB_FREEZE_TRANS transition only in XFS and not in the VFS. But on second
thought I tend to agree with you that the VFS should provide a way to do a
race-free transition to both states so that filesystems that want to use it
can. So I'll add a second counter for that.
 
Using sb_start_write() instead of m_active_trans won't be that easy because
it can create A-A deadlocks (e.g. we do sb_start_write in
block_page_mkwrite() and then xfs_get_blocks() decides to start a
transaction and calls sb_start_write() again which might block if
filesystem freezing started in the meantime).

So, like Eric said in his first email, it's not a "write start/end"
interface that is needed, the interface has to work with different
freeze levels (e.g. "sb_freeze_ref(sb, level)/sb_freeze_drop(sb,
level)").  Sure, internally it would have to map to two counters and
different level checks, but it solves the same problem for all
levels of freeze for all filesystems.
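
[Editorial sketch, not part of the original mail.] A level-based interface
with one counter per freeze level might look roughly like the userspace C11
model below. The names sb_freeze_ref/sb_freeze_drop come from the paragraph
above; struct sb_model, the non-blocking sb_freeze_tryref() variant and
sb_freeze_advance() are hypothetical, and returning false stands in for
sleeping until the filesystem is thawed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { LVL_WRITE = 1, LVL_TRANS = 2, LVL_COUNT = 2 };

/* Hypothetical model of the per-superblock freeze state. */
struct sb_model {
	atomic_int frozen;          /* 0 = unfrozen, else level reached */
	atomic_int refs[LVL_COUNT]; /* refs[level - 1] = holders at that level */
};

void sb_model_init(struct sb_model *sb)
{
	atomic_init(&sb->frozen, 0);
	for (int i = 0; i < LVL_COUNT; i++)
		atomic_init(&sb->refs[i], 0);
}

/* Take a reference at the given level; false means "frozen, wait and retry". */
bool sb_freeze_tryref(struct sb_model *sb, int level)
{
	assert(level >= LVL_WRITE && level <= LVL_TRANS);
	if (atomic_load(&sb->frozen) >= level)
		return false;
	atomic_fetch_add(&sb->refs[level - 1], 1);
	/* Recheck after the increment, as in the retry loop earlier on. */
	if (atomic_load(&sb->frozen) >= level) {
		atomic_fetch_sub(&sb->refs[level - 1], 1);
		return false;
	}
	return true;
}

void sb_freeze_drop(struct sb_model *sb, int level)
{
	assert(level >= LVL_WRITE && level <= LVL_TRANS);
	int prev = atomic_fetch_sub(&sb->refs[level - 1], 1);
	assert(prev > 0);
}

/* Freezer: move to the given level; true once that level has drained. */
bool sb_freeze_advance(struct sb_model *sb, int level)
{
	assert(level >= LVL_WRITE && level <= LVL_TRANS);
	atomic_store(&sb->frozen, level);
	return atomic_load(&sb->refs[level - 1]) == 0;
}
```

Because each level has its own counter and its own check, a caller that
already holds a write-level reference can still take a transaction-level
reference while the freeze is only at the write level, which is the nesting
case described above as an A-A deadlock for a single start/end pair.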

Let's fix this freeze problem once and for all in the generic code,
and not have to keep coming back to it to add more functionality for
different situations the most recent fix didn't handle for random
filesystem X....

  Yeah. I think ext3/4 could be converted to the generic mechanism
(although it won't be completely trivial since it also uses the internal
mechanism for things other than filesystem freezing).
								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR