Re: [PATCH 07/13] aio: enabled thread based async fsync


From: Dave Chinner <hidden>
Date: 2016-01-23 22:22:57
Also in: linux-fsdevel, linux-mm, lkml

On Fri, Jan 22, 2016 at 11:50:24PM -0500, Benjamin LaHaise wrote:

On Sat, Jan 23, 2016 at 03:24:49PM +1100, Dave Chinner wrote:

On Wed, Jan 20, 2016 at 04:56:30PM -0500, Benjamin LaHaise wrote:

On Thu, Jan 21, 2016 at 08:45:46AM +1100, Dave Chinner wrote:

Filesystems *must take locks* in the IO path. We have to serialise
against truncate and other operations at some point in the IO path
(e.g. block mapping vs concurrent allocation and/or removal), and
that can only be done sanely with sleeping locks.  There is no way
of knowing in advance if we are going to block, and so either we
always use threads for IO submission or we accept that occasionally
the AIO submission will block.

I never said we don't take locks.  Still, we can be more intelligent 
about when and where we do so.  With the nonblocking pread() and pwrite() 
changes being proposed elsewhere, we can do the part of the I/O that 
doesn't block in the submitter, which is a huge win when possible.
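The idea of doing the non-blocking part of the I/O in the submitter can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the code from the patch series: `struct demo_file`, `demo_submit_write()` and `demo_worker_write()` are hypothetical names, and a pthread mutex stands in for a sleeping kernel lock such as i_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; not a kernel structure. */
struct demo_file {
    pthread_mutex_t lock;   /* plays the role of a sleeping lock such as i_mutex */
    long bytes_written;     /* stands in for the actual I/O work */
};

/* The work that must happen with the lock held. */
static void demo_do_write(struct demo_file *f, long len)
{
    f->bytes_written += len;
}

/*
 * Fast path, run by the submitter. Returns true if the write completed
 * inline, false if the lock was contended and the request has to be
 * handed to a helper thread instead of blocking the submitter.
 */
static bool demo_submit_write(struct demo_file *f, long len)
{
    if (pthread_mutex_trylock(&f->lock) != 0)
        return false;       /* would block: punt to a worker */
    demo_do_write(f, len);
    pthread_mutex_unlock(&f->lock);
    return true;
}

/* Slow path, run from a helper thread, which is allowed to sleep. */
static void demo_worker_write(struct demo_file *f, long len)
{
    pthread_mutex_lock(&f->lock);
    demo_do_write(f, len);
    pthread_mutex_unlock(&f->lock);
}
```

A caller would try `demo_submit_write()` first and queue the request for `demo_worker_write()` only when it returns false. The sketch covers just the first lock; any lock taken deeper in the I/O path would need the same treatment.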

As it stands today, *every* buffered write takes i_mutex immediately 
on entering ->write().  That one issue alone accounts for a nearly 10x 
performance difference between an O_SYNC write and an O_DIRECT write,

Yes, that locking is for correct behaviour, not for performance
reasons.  The i_mutex is providing the required semantics for POSIX
write(2) functionality - writes must serialise against other reads
and writes so that they are completed atomically w.r.t. other IO.
i.e. writes to the same offset must not interleave, nor should reads
be able to see partial data from a write in progress.

No, the locks are not *required* for POSIX semantics, they are a legacy
of how Linux filesystem code has been implemented and how we ensure the
necessary internal consistency needed inside our filesystems is
provided.

That may be the case, but I really don't see how you can provide
such required functionality without some kind of exclusion barrier
in place. No matter how you implement that exclusion, it can be seen
effectively as a lock.

Even if the filesystem doesn't use the i_mutex for exclusion to the
page cache, it has to use some kind of lock as that IO still needs
to be serialised against any truncate, hole punch or other extent
manipulation that is currently in progress on the inode...

There are other ways to achieve the required semantics that
do not involve a single giant lock for the entire file/inode.

Most performant filesystems don't have a "single giant lock"
anymore. The problem is that the VFS expects the i_mutex to be held
for certain operations in the IO path and the VFS lock order
hierarchy makes it impossible to do anything but "get i_mutex
first".  That's the problem that needs to be solved - the VFS
enforces the "one giant lock" model, even when underlying
filesystems do not require it.

i.e. we could quite happily remove the i_mutex completely from the XFS
buffered IO path without breaking anything, but we can't because
that results in the VFS throwing warnings that we don't hold the
i_mutex (e.g. like when removing the SUID bits on write). So there's
lots of VFS functionality that needs to be turned on its head
before the i_mutex can be removed from the IO path.

And no, I
am not saying that doing this is simple or easy to do.

Sure. That's always been the problem. Even when a split IO/metadata
locking strategy like what XFS uses (and other modern filesystems
are moving to internally) is suggested as a model for solving
these problems, the usual response is instant dismissal with
"no way, that's unworkable" and so nothing ever changes...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
