Re: [PATCH -v3] ext4: fix unjournaled inode bitmap modification

From: Andreas Dilger <hidden>
Date: 2012-10-29 05:07:17

On 2012-10-28, at 8:30 PM, Theodore Ts'o wrote:

Hence in my version of the patch, I've waited until right before the
call to ext4_lock_group() before calling get_write_access().  Note
that it's safe to call get_write_access() on a bh twice; the second
time the jbd2 layer will notice that the bh is already a part of the
transaction.

Also, leaving out the calls to ext4_handle_release_buffer() makes the
patch easier to understand and reason about.

What do you think of this version?

Looks good.

quoted hunk ↗ jump to hunk

commit 67d725143e9e7ea458a0c1c4a6625657c3dc7ba2
Author: Eric Sandeen [off-list ref]
Date:   Sun Oct 28 22:24:57 2012 -0400

   ext4: fix unjournaled inode bitmap modification

   commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed
   ext4_new_inode() such that the inode bitmap was being modified
   outside a transaction, which could lead to corruption, and was
   discovered when journal_checksum found a bad checksum in the
   journal during log replay.

   Nix ran into this when using the journal_async_commit mount
   option, which enables journal checksumming.  The ensuing
   journal replay failures due to the bad checksums led to
   filesystem corruption reported as the now infamous
   "Apparent serious progressive ext4 data corruption bug"

   [ Changed by tytso to only call ext4_journal_get_write_access() only
     when we're fairly certain that we're going to allocate the inode. ]

   I've tested this by mounting with journal_checksum and
   running fsstress then dropping power; I've also tested by
   hacking DM to create snapshots w/o first quiescing, which
   allows me to test journal replay repeatedly w/o actually
   power-cycling the box.  Without the patch I hit a journal
   checksum error every time.  With this fix it survives
   many iterations.

   Reported-by: Nix [off-list ref]
   Signed-off-by: Eric Sandeen [off-list ref]
   Signed-off-by: "Theodore Ts'o" [off-list ref]
   Cc: stable@vger.kernel.org

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 4facdd2..3a100e7 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c

@@ -725,6 +725,10 @@ repeat_in_this_group:

				   "inode=%lu", ino + 1);
			continue;
		}
+		BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
+		err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
+		if (err)
+			goto fail;
		ext4_lock_group(sb, group);
		ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
		ext4_unlock_group(sb, group);

@@ -738,6 +742,11 @@ repeat_in_this_group:

	goto out;

got:
+	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
+	err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
+	if (err)
+		goto fail;
+
	/* We may have to initialize the block bitmap if it isn't already */
	if (ext4_has_group_desc_csum(sb) &&
	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {

@@ -771,11 +780,6 @@ got:

			goto fail;
	}

-	BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
-	if (err)
-		goto fail;
-
	BUFFER_TRACE(group_desc_bh, "get_write_access");
	err = ext4_journal_get_write_access(handle, group_desc_bh);
	if (err)

@@ -823,11 +827,6 @@ got:

	}
	ext4_unlock_group(sb, group);

-	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
-	err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
-	if (err)
-		goto fail;
-
	BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
	err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
	if (err)


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help