Thread: 44 messages, 6 authors, 2021-06-24

Re: [PATCH RESEND x3 v9 1/9] iov_iter: add copy_struct_from_iter()

From: Omar Sandoval <osandov@osandov.com>
Date: 2021-06-23 17:49:56
Also in: linux-btrfs, linux-fsdevel

On Wed, Jun 23, 2021 at 08:06:39AM +1000, Dave Chinner wrote:
On Mon, Jun 21, 2021 at 01:55:03PM -0700, Omar Sandoval wrote:
On Mon, Jun 21, 2021 at 01:46:04PM -0700, Omar Sandoval wrote:
On Mon, Jun 21, 2021 at 12:33:17PM -0700, Linus Torvalds wrote:
On Mon, Jun 21, 2021 at 11:46 AM Omar Sandoval [off-list ref] wrote:
How do we get the userspace size with the encoded_iov.size approach?
We'd have to read the size from the iov_iter before writing to the rest
of the iov_iter. Is it okay to mix the iov_iter as a source and
destination like this? From what I can tell, it's not intended to be
used like this.
I guess it could work that way, but yes, it's ugly as hell. And I
really don't want a readv() system call - that should write to the
result buffer - to first have to read from it.

So I think the original "just make it be the first iov entry" is the
better approach, even if Al hates it.

Although I still get the feeling that using an ioctl is the *really*
correct way to go. That was my first reaction to the series
originally, and I still don't see why we'd have encoded data in a
regular read/write path.

What was the argument against ioctl's, again?
The suggestion came from Dave Chinner here:
https://lore.kernel.org/linux-fsdevel/20190905021012.GL7777@dread.disaster.area/

His objection to an ioctl was two-fold:

1. This interface looks really similar to normal read/write, so we
   should try to use the normal read/write interface for it. Perhaps
   this trouble with iov_iter has refuted that.
2. The last time we had Btrfs-specific ioctls that eventually became
   generic (FIDEDUPERANGE and FICLONE{,RANGE}), the generalization was
   painful. Part of the problem with clone/dedupe was that the Btrfs
   ioctls were underspecified. I think I've done a better job of
   documenting all of the semantics and corner cases for the encoded I/O
   interface (and if not, I can address this). The other part of the
   problem is that there were various sanity checks in the normal
   read/write paths that were missed or drifted out of sync in the
   ioctls. That requires some vigilance going forward. Maybe starting
   this off as a generic (not Btrfs-specific) ioctl right off the bat
   will help.

If we do go the ioctl route, then we also have to decide how much of
preadv2/pwritev2 it should emulate. Should it use the fd offset, or
should that be an ioctl argument? Some of the RWF_ flags would be useful
for encoded I/O, too (RWF_DSYNC, RWF_SYNC, RWF_APPEND), should it
support those? These bring us back to Dave's first point.
Oops, I dropped Dave from the Cc list at some point. Adding him back
now.
Fair summary. The only other thing that I'd add is this is an IO
interface that requires issuing physical IO. So if someone wants
high throughput for encoded IO, we really need AIO and/or io_uring
support, and we get that for free if we use preadv2/pwritev2
interfaces.

Yes, it could be an ioctl() interface, but I think that this sort of
functionality is exactly what extensible syscalls like
preadv2/pwritev2 should be used for. It's a slight variant on normal
IO, and that's exactly what the RWF_* flags are intended to be used
for - allowing interesting per-IO variant behaviour without having
to completely re-implement the IO path via custom ioctls every time
we want slightly different functionality...
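
For readers less familiar with the per-IO RWF_* flags mentioned above, here is a
minimal userspace sketch. It is written in Python because os.preadv() and
os.pwritev() wrap the preadv2/pwritev2 syscalls on Linux; the helper names and
the temporary file are illustrative assumptions, not part of the series. It
shows only an existing flag (RWF_DSYNC), not the proposed encoded I/O flag.

```python
import os
import tempfile

# RWF_DSYNC needs Linux >= 4.7 and Python >= 3.7; fall back to 0 (no flag)
# elsewhere so the sketch still runs as a plain positional write.
RWF_DSYNC = getattr(os, "RWF_DSYNC", 0)


def write_with_flags(path, chunks, offset=0, flags=0):
    """Gather-write `chunks` at `offset`, applying RWF_* flags to this call only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        return os.pwritev(fd, chunks, offset, flags)
    finally:
        os.close(fd)


def read_into(path, sizes, offset=0, flags=0):
    """Scatter-read from `offset` into one buffer per entry in `sizes`."""
    bufs = [bytearray(n) for n in sizes]
    fd = os.open(path, os.O_RDONLY)
    try:
        nread = os.preadv(fd, bufs, offset, flags)
    finally:
        os.close(fd)
    return nread, bufs


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    # Only this one write gets O_DSYNC-like behaviour; the fd itself was
    # opened without O_DSYNC, and the fd offset is not used at all.
    written = write_with_flags(path, [b"hello ", b"world"], flags=RWF_DSYNC)
    nread, bufs = read_into(path, [6, 5])
```

The point of the sketch is that the variant behaviour travels with the call
(the offset and flags arguments), which is the property an ioctl would have to
re-create by hand.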
Al, Linus, what do you think? Is there a path forward for this series as
is? I'd be happy to have this functionality merged in any form, but I do
think that this approach with preadv2/pwritev2 using iov_len is decent
relative to the alternatives.
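
To make the "first iov entry carries the size" idea concrete, here is a small
Python model of the size-extensible struct copy. It follows the semantics of
the kernel's existing copy_struct_from_user(): a shorter userspace struct is
zero-extended, and a longer one is accepted only if the bytes the kernel does
not know about are all zero, otherwise the call fails with E2BIG. Treating
the first iovec's iov_len as the userspace size is the approach discussed in
this thread; that copy_struct_from_iter() matches this model in every detail
is an assumption, and the function name below is hypothetical.

```python
import errno


def copy_struct_from_first_iov(first_iov: bytes, ksize: int) -> bytes:
    """Model of a size-extensible struct copy.

    `first_iov` stands in for the first iovec's buffer, so len(first_iov)
    plays the role of iov_len (the userspace struct size). `ksize` is the
    struct size the kernel was built with.
    """
    usize = len(first_iov)
    if usize <= ksize:
        # Older userspace: fields it does not know about read as zero.
        return first_iov + bytes(ksize - usize)
    if any(first_iov[ksize:]):
        # Newer userspace set a field this kernel cannot interpret.
        raise OSError(errno.E2BIG, "unknown trailing fields are non-zero")
    # Newer userspace, but the extra fields are unused: safe to truncate.
    return first_iov[:ksize]
```

With this scheme the size is known before any data is written to the remaining
iovecs, so a read never has to read from its own result buffer first.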