Re: [PATCH v11 08/14] btrfs: add BTRFS_IOC_ENCODED_READ
From: Nikolay Borisov <hidden>
Date: 2021-10-15 11:45:50
Also in:
linux-btrfs, linux-fsdevel
On 1.09.21 г. 20:01, Omar Sandoval wrote:
quoted hunk ↗ jump to hunk
From: Omar Sandoval <redacted> There are 4 main cases: 1. Inline extents: we copy the data straight out of the extent buffer. 2. Hole/preallocated extents: we fill in zeroes. 3. Regular, uncompressed extents: we read the sectors we need directly from disk. 4. Regular, compressed extents: we read the entire compressed extent from disk and indicate what subset of the decompressed extent is in the file. This initial implementation simplifies a few things that can be improved in the future: - We hold the inode lock during the operation. - Cases 1, 3, and 4 allocate temporary memory to read into before copying out to userspace. - We don't do read repair, because it turns out that read repair is currently broken for compressed data. Signed-off-by: Omar Sandoval <redacted> --- fs/btrfs/ctree.h | 4 + fs/btrfs/inode.c | 489 +++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ioctl.c | 111 +++++++++++ 3 files changed, 604 insertions(+)diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b95ec5fb68d5..cbd7e07c1c34 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h@@ -3223,6 +3223,10 @@ int btrfs_writepage_cow_fixup(struct page *page); void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, struct page *page, u64 start, u64 end, bool uptodate); +struct btrfs_ioctl_encoded_io_args; +ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, + struct btrfs_ioctl_encoded_io_args *encoded); + extern const struct dentry_operations btrfs_dentry_operations; extern const struct iomap_ops btrfs_dio_iomap_ops; extern const struct iomap_dio_ops btrfs_dio_ops;diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a87a34f56234..1940f22179ba 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c@@ -10500,6 +10500,495 @@ void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end) } }
<snip>
+ +static blk_status_t btrfs_encoded_read_check_bio(struct btrfs_io_bio *io_bio)
nit: The gist of this function is to check the csum so how about renaming it to btrfs_encoded_read_verify_csum <snip>
+
+static void btrfs_encoded_read_endio(struct bio *bio)
+{
+ struct btrfs_encoded_read_private *priv = bio->bi_private;
+ struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+ blk_status_t status;
+
+ status = btrfs_encoded_read_check_bio(io_bio);
+ if (status) {
+ /*
+ * The memory barrier implied by the atomic_dec_return() here
+ * pairs with the memory barrier implied by the
+ * atomic_dec_return() or io_wait_event() innit: I think atomic_dec_return in read_regular_fill_pages is inconsequential, what we want to ensure is that when the caller of io_wait_event is woken up by this thread it will observe the priv->status, which it will, because the atomic-dec_return in this function has paired with the general barrier interpolated by wait_event. So for brevity just leave the text to say "by io_wait_event".
+ * btrfs_encoded_read_regular_fill_pages() to ensure that this + * write is observed before the load of status in + * btrfs_encoded_read_regular_fill_pages(). + */ + WRITE_ONCE(priv->status, status); + } + if (!atomic_dec_return(&priv->pending)) + wake_up(&priv->wait); + btrfs_io_bio_free_csum(io_bio); + bio_put(bio); +}
<snip>
quoted hunk ↗ jump to hunk
@@ -4824,6 +4841,94 @@ static int _btrfs_ioctl_send(struct file *file, void __user *argp, bool compat) return ret; }
<snip>
+ memset((char *)&args + copy_end_kernel, 0, + sizeof(args) - copy_end_kernel);
nit: This memset can be eliminated ( in source) by marking args = {};
and just leaving copy from user above.
+
+ ret = import_iovec(READ, args.iov, args.iovcnt, ARRAY_SIZE(iovstack),
+ &iov, &iter);
+ if (ret < 0)
+ goto out_acct;
+
+ if (iov_iter_count(&iter) == 0) {
+ ret = 0;
+ goto out_iov;
+ }
+ pos = args.offset;
+ ret = rw_verify_area(READ, file, &pos, args.len);
+ if (ret < 0)
+ goto out_iov;
+
+ init_sync_kiocb(&kiocb, file);
+ ret = kiocb_set_rw_flags(&kiocb, 0);This call is a noop due to: if (!flags) return 0; in kiocb_set_rw_flags. <snip>