Thread: 16 messages, 4 authors, 2014-03-23

Re: [RFC PATCH 0/5] userspace PI passthrough via AIO/DIO

From: Zach Brown <hidden>
Date: 2014-03-22 00:00:42
Also in: linux-fsdevel, linux-scsi

On Fri, Mar 21, 2014 at 03:20:25PM -0700, Darrick J. Wong wrote:
On Fri, Mar 21, 2014 at 11:23:32AM -0700, Zach Brown wrote:
quoted
On Thu, Mar 20, 2014 at 09:30:41PM -0700, Darrick J. Wong wrote:
quoted
This RFC provides a rough implementation of a mechanism to allow
userspace to attach protection information (e.g. T10 DIF) data to a
disk write and to receive the information alongside a disk read.  The
interface is an extension to the AIO interface: two new commands
(IOCB_CMD_P{READ,WRITE}VM) are provided.  The last struct iovec in the
arg list is interpreted to point to a buffer containing a header,
followed by the PI data.
Instead of adding commands that indicate that the final element is a
magical pi buffer, why not expand the iocb?

In the user iocb, a bit in aio_flags could indicate that aio_reserved2
is a pointer to an extension of the iocb.  In that extension could be a
full iov *, nr_segs for PI data.

You'd then translate that into a bigger kernel kiocb with a specific
pointer to PI data rather than having to bubble the tests for this magic
final iovec down through the kernel.

+       if (iocb->ki_flags & KIOCB_USE_PI) {
+               nr_segs--;
+               pi_iov = (struct iovec *)(iov + nr_segs);
+       }

I suggest this because there's already pressure to extend the iocb.
Folks want io priority inputs, completion time outputs, etc.
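
To make the suggestion above concrete, here is a rough userspace sketch of
the idea: a flag bit in aio_flags says that aio_reserved2 points to an
extension, and the extension carries its own iovec array and segment count
for the PI data. Everything named sketch_* or SKETCH_* is hypothetical and
is not part of any kernel ABI. struct sketch_iocb is a cut-down stand-in for
the user iocb that keeps only the fields discussed here, and the flag value
is arbitrary. A real kernel implementation would copy the extension in from
userspace rather than dereference the pointer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Cut-down stand-in for the user-facing iocb (assumption: only the
 * fields discussed in this thread are kept). */
struct sketch_iocb {
	int16_t  aio_reqprio;
	uint32_t aio_flags;
	uint64_t aio_reserved2;
};

/* HYPOTHETICAL flag: aio_reserved2 is a pointer to an extension. */
#define SKETCH_IOCB_FLAG_EXT (1u << 31)

/* HYPOTHETICAL extension: a separate iovec array for the PI data. */
struct sketch_iocb_ext {
	uint64_t pi_iov;     /* pointer to a struct iovec array */
	uint32_t pi_nr_segs; /* number of entries in that array */
};

/*
 * Return the PI iovec array and store its length in *nr_segs, or
 * return NULL (and store 0) when the iocb carries no extension.
 * The data iovecs are never inspected, so no "magic last iovec"
 * test is needed anywhere else.
 */
static const struct iovec *sketch_iocb_pi(const struct sketch_iocb *cb,
					  uint32_t *nr_segs)
{
	const struct sketch_iocb_ext *ext;

	assert(cb != NULL && nr_segs != NULL);
	*nr_segs = 0;
	if (!(cb->aio_flags & SKETCH_IOCB_FLAG_EXT) || cb->aio_reserved2 == 0)
		return NULL;

	/* In the kernel this would be a copy_from_user() of the extension. */
	ext = (const struct sketch_iocb_ext *)(uintptr_t)cb->aio_reserved2;
	*nr_segs = ext->pi_nr_segs;
	return (const struct iovec *)(uintptr_t)ext->pi_iov;
}
```

A caller would fill in a struct sketch_iocb_ext, store its address in
aio_reserved2, and set SKETCH_IOCB_FLAG_EXT. An iocb without the flag
behaves exactly as before, which is the point of guarding the pointer with
a flag.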
I'm curious about the reqprio field -- it seems like it was put there to
request some kind of IO priority change, but the kernel doesn't use it.
The user-facing iocbs were derived from the posix aio interface which
has a reqprio field (aio(7), aio_reqprio).  I don't think anything's
ever been done with it.

I don't know more about what current io prio stuff people might want to
specify..  ioprio_set(2) args instead of having to bounce through
syscalls and current-> for each op?  cgroup bits?  No idea.
If aio_reserved2 becomes a (flag-guarded) pointer to an array of aio
extensions, I'd be tempted to reuse the reqprio to signal the length of the
extension array, and if anyone wants to start using reqprio, they could add it
as an extension.
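
The array variant described above could look roughly like the following
sketch: aio_reserved2 points to an array of typed extensions and
aio_reqprio is reused as the number of entries. All xarr_* and XARR_* names,
the extension type codes, and the flag value are made up for illustration.
struct xarr_iocb is again a cut-down stand-in for the user iocb, not the
real definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the user-facing iocb (assumption). */
struct xarr_iocb {
	int16_t  aio_reqprio;   /* reused here as the extension count */
	uint32_t aio_flags;
	uint64_t aio_reserved2; /* pointer to struct xarr_ext[] */
};

/* HYPOTHETICAL flag guarding the pointer in aio_reserved2. */
#define XARR_IOCB_FLAG_EXT (1u << 31)

/* HYPOTHETICAL extension types. */
enum {
	XARR_EXT_PI      = 1, /* data: pointer to PI description   */
	XARR_EXT_REQPRIO = 2, /* data: the displaced reqprio value */
};

struct xarr_ext {
	uint32_t type;
	uint32_t pad;
	uint64_t data;
};

/*
 * Find the first extension of the given type, or return NULL if the
 * iocb has no extension array or no entry of that type.
 */
static const struct xarr_ext *xarr_find_ext(const struct xarr_iocb *cb,
					    uint32_t type)
{
	const struct xarr_ext *exts;
	int16_t i;

	if (!(cb->aio_flags & XARR_IOCB_FLAG_EXT) || cb->aio_reqprio <= 0)
		return NULL;

	exts = (const struct xarr_ext *)(uintptr_t)cb->aio_reserved2;
	if (exts == NULL)
		return NULL;

	for (i = 0; i < cb->aio_reqprio; i++) {
		if (exts[i].type == type)
			return &exts[i];
	}
	return NULL;
}
```

Because aio_reqprio is a signed 16-bit field, the count tops out at 32767
entries in this sketch, and negative values are treated as "no extensions".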
I'll admit, I'm hesitant to cannibalize reqprio for this.  It's a lame
s16.  But maybe it'll be the least awful alternative.
(More about this in my response to Ben LaHaise.)
(I'll go reply over there too.)
quoted
And heck, on the sync rw syscall side, add variants that have a pointer
to this same extension struct.  There's nothing inherently aio specific
about having lots more per-io inputs and outputs.
I'm curious -- what kinds of extensions do you envision for sync()?
Sorry, that was poorly worded.  By 'sync' I meant the synchronous
classic sys_*write* syscalls.  Maybe we should add another variant with
a "struct io_goo *" pointer, or whatever.

- z

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>