Thread: 19 messages, 5 authors, 2021-06-10

Re: [PATCH v2 2/3] btrfs: zoned: fix compressed writes

From: Qu Wenruo <hidden>
Date: 2021-06-10 07:28:07


On 2021/5/18 11:40 PM, Johannes Thumshirn wrote:
When multiple processes write data to the same block group on a compressed
zoned filesystem, the underlying device could report I/O errors and data
corruption is possible.

This happens because on a zoned file system, compressed data writes were
sent to the device via a REQ_OP_WRITE instead of a REQ_OP_ZONE_APPEND
operation. But with REQ_OP_WRITE and parallel submission it cannot be
guaranteed that the data is always submitted aligned to the underlying
zone's write pointer.
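
The failure mode described above can be illustrated with a small user-space model. This is only a sketch: `toy_zone`, `toy_zone_write()` and `toy_zone_append()` are hypothetical names made up for illustration, not kernel or device APIs. The model assumes a sequential-write-required zone that rejects any write not starting at its write pointer, while an append lets the device pick the position and report it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one sequential-write-required zone (not kernel code). */
struct toy_zone {
	uint64_t start;    /* first byte of the zone */
	uint64_t capacity; /* writable bytes in the zone */
	uint64_t wp;       /* write pointer: next writable byte */
};

/*
 * Models a regular write: the caller picks the position. The write is
 * rejected unless it starts exactly at the write pointer.
 */
static bool toy_zone_write(struct toy_zone *z, uint64_t pos, uint64_t len)
{
	if (pos != z->wp || z->wp + len > z->start + z->capacity)
		return false; /* unaligned or overflowing write: I/O error */
	z->wp += len;
	return true;
}

/*
 * Models a zone append: the caller names only the zone. The device
 * chooses the position and reports it back through *written_at.
 */
static bool toy_zone_append(struct toy_zone *z, uint64_t len,
			    uint64_t *written_at)
{
	assert(written_at != NULL);
	if (z->wp + len > z->start + z->capacity)
		return false; /* zone is full */
	*written_at = z->wp;
	z->wp += len;
	return true;
}
```

In this model, if two writers reserve offsets 0 and 4096 and the second one reaches the device first, `toy_zone_write()` fails for it, whereas two `toy_zone_append()` calls succeed in whichever order they arrive.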

The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a zoned
filesystem is non-intrusive on a regular file system or when submitting to
a conventional zone on a zoned filesystem, as it is guarded by
btrfs_use_zone_append.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag")
Signed-off-by: Johannes Thumshirn <redacted>

While working on compression support for subpage, I noticed some
strange code behavior; I'm not sure whether it is intentional or just a mistake.

Please correct me if I'm wrong.

[...]
  	bio = btrfs_bio_alloc(first_byte);
-	bio->bi_opf = REQ_OP_WRITE | write_flags;
+	bio->bi_opf = bio_op | write_flags;
  	bio->bi_private = cb;
  	bio->bi_end_io = end_compressed_bio_write;

+	if (use_append) {
+		struct extent_map *em;
+		struct map_lookup *map;
+		struct block_device *bdev;
+
+		em = btrfs_get_chunk_map(fs_info, disk_start, PAGE_SIZE);
+		if (IS_ERR(em)) {
+			kfree(cb);
+			bio_put(bio);
+			return BLK_STS_NOTSUPP;
+		}
+
+		map = em->map_lookup;
+		/* We only support single profile for now */
+		ASSERT(map->num_stripes == 1);
+		bdev = map->stripes[0].dev->bdev;
+
+		bio_set_dev(bio, bdev);
+		free_extent_map(em);
+	}
+
Here, for the newly created bio, we call bio_set_dev() on it
(although a later patch refactors this part a little).

So far so good.
  	if (blkcg_css) {
  		bio->bi_opf |= REQ_CGROUP_PUNT;
  		kthread_associate_blkcg(blkcg_css);
@@ -432,6 +458,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
  	bytes_left = compressed_len;
  	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
  		int submit = 0;
+		int len;

  		page = compressed_pages[pg_index];
  		page->mapping = inode->vfs_inode.i_mapping;
@@ -439,9 +466,13 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
  			submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio,
  							  0);

+		if (pg_index == 0 && use_append)
+			len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0);
+		else
+			len = bio_add_page(bio, page, PAGE_SIZE, 0);
+
  		page->mapping = NULL;
-		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
-		    PAGE_SIZE) {
+		if (submit || len < PAGE_SIZE) {
  			/*
  			 * inc the count before we submit the bio so
  			 * we know the end IO handler won't happen before
@@ -465,11 +496,15 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
  			}

  			bio = btrfs_bio_alloc(first_byte);
-			bio->bi_opf = REQ_OP_WRITE | write_flags;
+			bio->bi_opf = bio_op | write_flags;
But here, for the newly allocated bio, bio_set_dev() is never called.

Shouldn't all zoned write bio need that bio_set_dev() call?

I guess that since most compressed extents are pretty small, it's rare
to hit a case where the bio must be split at a stripe boundary,
so the problem is very hard to trigger.

Anyway, since I'm reworking the compression code so that compressed writes
follow the same boundary check as extent_io.c, I can
refactor the bio allocation code to add the calls that zoned mode needs.
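
A rough user-space sketch of the shape such a refactor could take is below. Everything in it is a stand-in invented for illustration: `toy_bio`, `toy_bdev`, the `TOY_OP_*` values and `toy_alloc_compressed_bio()` are hypothetical and are not the kernel's types or functions. The assignment to `bio->bdev` only stands in for the bio_set_dev() call in the patch. The point is that both the first bio and any bio allocated after a split go through one helper, so the device assignment cannot be skipped on one path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-ins; the real kernel structures carry far more state. */
struct toy_bdev {
	int id;
};

struct toy_bio {
	unsigned int opf;      /* operation plus flags */
	struct toy_bdev *bdev; /* NULL until a device is assigned */
};

/* Arbitrary toy values, not the kernel's REQ_OP_* encoding. */
enum { TOY_OP_WRITE = 1, TOY_OP_ZONE_APPEND = 2 };

/*
 * Hypothetical helper: the single place where a compressed-write bio is
 * allocated. When zone append is in use, the target device is always set
 * here, for the first bio and for every bio created after a split.
 */
static struct toy_bio *toy_alloc_compressed_bio(bool use_append,
						struct toy_bdev *zone_dev,
						unsigned int write_flags)
{
	struct toy_bio *bio = calloc(1, sizeof(*bio));

	if (!bio)
		return NULL;

	bio->opf = (use_append ? TOY_OP_ZONE_APPEND : TOY_OP_WRITE) |
		   write_flags;
	if (use_append) {
		assert(zone_dev != NULL);
		bio->bdev = zone_dev; /* stands in for bio_set_dev() */
	}
	return bio;
}
```

With a helper like this, the split path would call the same function as the initial allocation instead of open-coding the setup a second time.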

Thanks,
Qu
  			bio->bi_private = cb;
  			bio->bi_end_io = end_compressed_bio_write;
  			if (blkcg_css)
  				bio->bi_opf |= REQ_CGROUP_PUNT;
+			/*
+			 * Use bio_add_page() to ensure the bio has at least one
+			 * page.
+			 */
  			bio_add_page(bio, page, PAGE_SIZE, 0);
  		}
  		if (bytes_left < PAGE_SIZE) {