Re: [PATCH RESEND x3 v9 1/9] iov_iter: add copy_struct_from_iter()
From: Omar Sandoval <osandov@osandov.com>
Date: 2021-06-21 20:55:09
Also in: linux-btrfs, linux-fsdevel

On Mon, Jun 21, 2021 at 01:46:04PM -0700, Omar Sandoval wrote:
> On Mon, Jun 21, 2021 at 12:33:17PM -0700, Linus Torvalds wrote:
> > On Mon, Jun 21, 2021 at 11:46 AM Omar Sandoval [off-list ref] wrote:
> > > How do we get the userspace size with the encoded_iov.size
> > > approach? We'd have to read the size from the iov_iter before
> > > writing to the rest of the iov_iter. Is it okay to mix the
> > > iov_iter as a source and destination like this? From what I can
> > > tell, it's not intended to be used like this.
> >
> > I guess it could work that way, but yes, it's ugly as hell. And I
> > really don't want a readv() system call - that should write to the
> > result buffer - to first have to read from it.
> >
> > So I think the original "just make it be the first iov entry" is
> > the better approach, even if Al hates it.
> >
> > Although I still get the feeling that using an ioctl is the
> > *really* correct way to go. That was my first reaction to the
> > series originally, and I still don't see why we'd have encoded data
> > in a regular read/write path.
> >
> > What was the argument against ioctl's, again?
>
> The suggestion came from Dave Chinner here:
> https://lore.kernel.org/linux-fsdevel/20190905021012.GL7777@dread.disaster.area/
>
> His objection to an ioctl was two-fold:
>
> 1. This interface looks really similar to normal read/write, so we
>    should try to use the normal read/write interface for it. Perhaps
>    this trouble with iov_iter has refuted that.
> 2. The last time we had Btrfs-specific ioctls that eventually became
>    generic (FIDEDUPERANGE and FICLONE{,RANGE}), the generalization
>    was painful. Part of the problem with clone/dedupe was that the
>    Btrfs ioctls were underspecified. I think I've done a better job
>    of documenting all of the semantics and corner cases for the
>    encoded I/O interface (and if not, I can address this). The other
>    part of the problem is that there were various sanity checks in
>    the normal read/write paths that were missed or drifted out of
>    sync in the ioctls. That requires some vigilance going forward.
>    Maybe starting this off as a generic (not Btrfs-specific) ioctl
>    right off the bat will help.
>
> If we do go the ioctl route, then we also have to decide how much of
> preadv2/pwritev2 it should emulate. Should it use the fd offset, or
> should that be an ioctl argument? Some of the RWF_ flags would be
> useful for encoded I/O, too (RWF_DSYNC, RWF_SYNC, RWF_APPEND), should
> it support those? These bring us back to Dave's first point.

Oops, I dropped Dave from the Cc list at some point. Adding him back now.
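
For anyone skimming the thread, here is a rough userspace model of why the "first iov entry" approach sidesteps the size question: the length of iov[0] itself tells us how large userspace thinks the header is, so nothing has to be read out of the destination buffers. This is only a sketch. struct demo_hdr and its fields, demo_copy_struct() and demo_read_hdr() are hypothetical names made up for illustration, not the struct or helpers from this series. The copy rules are modeled on copy_struct_from_user(): zero-fill when userspace passes a shorter struct, and return -E2BIG when a longer struct has nonzero trailing bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical header layout, for illustration only. */
struct demo_hdr {
	uint64_t len;
	uint32_t compression;
	uint32_t reserved;
};

/*
 * Userspace model of the copy_struct_from_user() rules:
 *  - usize < ksize: copy usize bytes and zero-fill the rest of dst
 *  - usize > ksize: copy ksize bytes, but fail with -E2BIG if any of
 *    the extra source bytes is nonzero
 */
static int demo_copy_struct(void *dst, size_t ksize,
			    const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *s = src;
	size_t i;

	/* Only runs when the caller's struct is larger than ours. */
	for (i = ksize; i < usize; i++)
		if (s[i] != 0)
			return -E2BIG;

	memcpy(dst, src, size);
	if (usize < ksize)
		memset((unsigned char *)dst + size, 0, ksize - size);
	return 0;
}

/* The header is iov[0]; its iov_len is the size userspace passed. */
static int demo_read_hdr(struct demo_hdr *hdr,
			 const struct iovec *iov, int iovcnt)
{
	if (iovcnt < 1)
		return -EINVAL;
	return demo_copy_struct(hdr, sizeof(*hdr),
				iov[0].iov_base, iov[0].iov_len);
}
```

In this model the remaining entries, iov[1] onward, would carry the data, so the header is only ever read and the data buffers are only ever written.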