Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmpxchg(), fix bug
From: Kent Overstreet <hidden>
Date: 2012-10-10 00:47:52
Also in:
dm-devel, lkml
On Tue, Oct 09, 2012 at 05:26:34PM -0700, Zach Brown wrote:
> > The AIO ringbuffer stuff just annoys me more than most
>
> Not more than everyone, though, I can personally promise you that :).
>
> > (it wasn't until the other day that I realized it was actually exported to userspace... what led to figuring that out was noticing aio_context_t was a ulong, and got truncated to 32 bits with a 32 bit program running on a 64 bit kernel. I'd been horribly misled by the code comments and the lack of documentation.)
>
> Yeah. It's the userspace address of the mmaped ring. This has annoyed the process migration people who can't recreate the context in a new kernel because there's no userspace interface to specify creation of a context at a specific address.
Yeah I did finally figure that out - and a file descriptor that userspace then mmap()ed would solve that problem...
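For reference, a minimal sketch of what that handle looks like from userspace, assuming a Linux system where the raw io_setup()/io_destroy() syscalls are available (glibc has no wrappers for them, so this goes through syscall()). The function name show_aio_handle is made up for illustration. It prints the handle, which per the discussion above is the address of the mmaped ring, next to what survives if the value is squeezed into 32 bits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

/* Hypothetical helper: create an AIO context, print its handle, destroy it.
 * Returns 0 on success, or -errno if the kernel refuses either syscall. */
int show_aio_handle(aio_context_t *out)
{
    aio_context_t ctx = 0;   /* io_setup() requires a zeroed handle */

    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -errno;

    /* aio_context_t is an unsigned long: 8 bytes on a 64 bit build. */
    printf("handle: %zu bytes, value %#lx\n",
           sizeof(ctx), (unsigned long)ctx);

    /* What a 32 bit wide variable would keep of the same value. */
    printf("low 32 bits only: %#x\n", (unsigned int)(uint32_t)ctx);

    *out = ctx;

    if (syscall(SYS_io_destroy, ctx) < 0)
        return -errno;
    return 0;
}
```

On a 64 bit build the two printed values will typically differ, because the mapping usually sits above 4 GB; that is the truncation described above.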
> > But if we do have an explicit handle, I don't see why it shouldn't be a file descriptor.
>
> Because they're expensive to create and destroy when compared to a single system call. Imagine that we're using waiting for a single completion to implement a cheap one-off sync call. Imagine it's a buffered op which happens to hit the cache and is really quick.
True. But that could be solved with a separate interface that either doesn't use a context to submit a call synchronously, or uses an implicit per thread context.
> (And they're annoying to manage: libraries and O_CLOEXEC, running into fd/file limit tunables, bleh.)
I don't have a _strong_ opinion there, but my intuition is that we shouldn't be creating new types of handles without a good reason. I don't think the annoyances are, for the most part, particular to file descriptors; I think they tend to apply to handles in general, and at least with file descriptors they're known and solved. Also, a file descriptor naturally works with an epoll event loop (eventfd for aio is a hack).
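To make the eventfd point concrete, here is a rough sketch of how a completion reaches an epoll loop today, assuming the Linux AIO ABI from <linux/aio_abi.h> (IOCB_FLAG_RESFD and aio_resfd) and raw syscalls. The function name aio_read_via_eventfd, the scratch file under /tmp, and the one second timeout are all illustrative choices, not anything from the patch series.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

/* Hypothetical helper: read 5 bytes through Linux AIO and learn about the
 * completion via an eventfd watched by epoll.
 * Returns the number of bytes read, or -1 on any failure. */
long aio_read_via_eventfd(void)
{
    char path[] = "/tmp/aio-sketch-XXXXXX";
    char buf[16] = {0};
    struct epoll_event want, got;
    struct io_event done;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    aio_context_t ctx = 0;
    uint64_t count = 0;
    long result = -1;
    int fd, efd = -1, epfd = -1;

    fd = mkstemp(path);                    /* scratch file with known data */
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "hello", 5) != 5)
        goto out;

    efd = eventfd(0, EFD_CLOEXEC);         /* extra fd just for notification */
    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (efd < 0 || epfd < 0)
        goto out;

    memset(&want, 0, sizeof(want));
    want.events = EPOLLIN;
    want.data.fd = efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &want) < 0)
        goto out;

    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        goto out;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = 5;
    cb.aio_offset = 0;
    cb.aio_flags = IOCB_FLAG_RESFD;        /* signal efd on completion */
    cb.aio_resfd = efd;
    if (syscall(SYS_io_submit, ctx, 1, list) != 1)
        goto out_ctx;

    /* The event loop sees the eventfd, not the AIO context itself. */
    if (epoll_wait(epfd, &got, 1, 1000) != 1)
        goto out_ctx;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        goto out_ctx;

    /* A second step is still needed to fetch the actual completion. */
    if (syscall(SYS_io_getevents, ctx, 1, 1, &done, NULL) != 1)
        goto out_ctx;
    result = done.res < 0 ? -1 : (long)done.res;

out_ctx:
    syscall(SYS_io_destroy, ctx);
out:
    if (epfd >= 0)
        close(epfd);
    if (efd >= 0)
        close(efd);
    close(fd);
    return result;
}
```

Note the three separate objects involved (context, eventfd, epoll fd) and the extra read() and io_getevents() round trip per wakeup. If the context were itself a file descriptor, it could be handed to epoll_ctl() directly.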
> If the 'completion context' is no more than a structure in userspace memory then a lot of stuff just works. Tasks can share it amongst themselves as they see fit. A trivial one-off sync call can just dump it on the stack and point to it. It doesn't have to be specifically torn down on task exit.
That would be awesome, though for it to be worthwhile there couldn't be any kernel notion of a context at all and I'm not sure if that's practical. But the idea hadn't occurred to me before and I'm sure you've thought about it more than I have... hrm. Oh hey, that's what acall does :P For completions though you really want the ringbuffer pinned... what do you do about that?
> > > And perhaps obviously, I'd start with the acall stuff :). It was a lot lighter. We could talk about how to make it extensible without going all the way to the generic packed variable size duplicating or not and returning or not or.. attributes :).
> >
> > Link? I haven't heard of acall before.
>
> I linked to it after that giant silly comment earlier in the thread, here it is again: http://lwn.net/Articles/316806/
Oh whoops, hadn't started reading yet - looking at it now :)
> There's a mostly embarrassing video of a jetlagged me giving that talk at LCA kicking around.. ah, here: http://mirror.linux.org.au/pub/linux.conf.au/2009/Thursday/131.ogg
>
> - z