Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION... | linux-mm

Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range

From: Dave Chinner <david@fromorbit.com>
Date: 2012-11-26 21:28:45
Also in: linux-fsdevel

On Mon, Nov 26, 2012 at 03:13:08PM -0500, Christoph Hellwig wrote:

On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:

quoted

Gosh, that's a very sudden new consensus.  The consensus over the past
ten or twenty years has been that the Linux kernel enforce locking for
consistent atomic writes, but skip that overhead on reads - hasn't it?

I'm not sure there was much of a consensus ever.  We XFS people always
ttried to push everyone down the strict rule, but there was enough
pushback that it didn't actually happen.

quoted

Thanks, that's helpful; but I think linux-mm people would want to defer
to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
but a fundamental change to VFS locking philosophy is not mm's call.

I don't see that page locking would have anything to do with it: if we
are going to start guaranteeing reads atomic against concurrent writes,
then surely it's the size requested by the user to be guaranteed,
spanning however many pages and fs-blocks: i_mutex, or a more
efficiently crafted alternative.

What XFS does is simply replace (or rather augment currently) i_mutex
with a rw_semaphore (i_iolock in XFS) which is used the following way:

exclusive:
 - buffer writes
 - pagecache flushing before direct I/O (then downgraded)
 - appending direct I/O writes
 - less than blocksize granularity direct I/O

   - splice write

Also, direct extent manipulations that are outside the IO path such
as:
   - truncate
   - preallocation
   - hole punching

use the XFS_IOLOCK_EXCL to provide exclusion against new IO starting
while such an operation is in progress.

shared:
 - everything else (buffered reads, "normal" direct I/O)

Doing this in the highest levels of the generic_file_ code would be
trivial, and would allow us to get rid of a fair chunk of wrappers in
XFS.

We still need the iolock deep in the guts of the filesystem, though.

I suspect that if we are going to change the VFS locking, then we
should seriously consider allowing the filesystem to provide it's
own locking implementation and the VFS just pass the type of lock
required. Otherwise we are still going to need all the locking
within the filesystem to serialise all the core pieces that the VFS
locking doesn't serialise (e.g. EOF truncation on close/evict,
extent swaps for online defrag, etc).

Note that we've been thinking about replacing this lock with a range
lock, but this will require more research.

I'd say we need a working implementation in a filesystem before even
considering a VFS implementation...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help