Thread: 8 messages, 5 authors, 2014-10-15

Re: BLKZEROOUT + pread should return zeroes, right?

From: Darrick J. Wong <hidden>
Date: 2014-10-14 06:02:42
Also in: linux-fsdevel

On Tue, Oct 14, 2014 at 03:27:11PM +1100, Dave Chinner wrote:
On Mon, Oct 13, 2014 at 08:01:32PM -0700, Darrick J. Wong wrote:
quoted
Hi everyone,

What's the intended behavior if I issue BLKZEROOUT against a range of disk
sectors and immediately re-read the sectors into a buffer?
Should return zeros.

[...]
quoted
I boiled the whole thing down into the attached test program, which can
reproduce the symptoms in a few loop iterations.  If I insert "sleep(1);"
before the pread64, I pread zeroes every time; otherwise, I only pread zeroes
part of the time.  If I call "ioctl(fd, BLKFLSBUF);" before the BLKZEROOUT, the
chance of preading zeroes increases dramatically, but is still not 100%.
Hint #1: buffered IO == data in page cache.
Hint #2: BLKZEROOUT operates at the bio level.
Yeah, I forgot about that little quirk where the page cache is left in the
dark.  Thank you for the sanity check, Dave.
quoted
So, uh, is this a bug?  Or is that just how BLKZEROOUT works?  Or did I fubar
the ioctl call?
Broken usage, IMO. If you are going to use the block layer ioctls to
manipulate data in the block device, you should be using direct IO
for all your data IO to the block device. Otherwise, coherency
problems occur....
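
For illustration, the direct IO approach suggested above might look roughly like the sketch below. This is not the test program attached to the original message; the helper names (all_zero, zeroout_and_check) and the align parameter are hypothetical, and the sketch assumes that start, len and the buffer alignment are all multiples of the device's logical block size.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t for pread */
#include <fcntl.h>
#include <linux/fs.h>           /* BLKZEROOUT */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return 1 if every byte in buf[0..len) is zero, otherwise 0. */
static int all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/*
 * Hypothetical helper: zero [start, start + len) on the block device at
 * `path`, then read the range back through an O_DIRECT descriptor so the
 * read bypasses the page cache.
 * Returns 1 if the range reads back as zeroes, 0 if not, -1 on any error.
 */
static int zeroout_and_check(const char *path, uint64_t start, uint64_t len,
                             size_t align)
{
    uint64_t range[2] = { start, len };   /* BLKZEROOUT takes {offset, length} in bytes */
    void *buf = NULL;
    int ret = -1;

    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer; `align` is assumed suitable. */
    if (posix_memalign(&buf, align, len) != 0) {
        buf = NULL;
        goto out;
    }
    memset(buf, 0xff, len);               /* poison so stale data is obvious */

    if (ioctl(fd, BLKZEROOUT, range) < 0)
        goto out;
    if (pread(fd, buf, len, (off_t)start) != (ssize_t)len)
        goto out;

    ret = all_zero(buf, len);
out:
    free(buf);
    close(fd);
    return ret;
}
```

A caller might invoke zeroout_and_check("/dev/sdX", 0, 4096, 4096) against a scratch device (the device path here is a placeholder). Because both the ioctl and the read bypass the page cache, no sleep() or BLKFLSBUF call should be needed for the read to see zeroes.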
So... if these ioctls require direct IO read and write for any sane use model,
why doesn't the kernel fail the request if the fd isn't in O_DIRECT mode?  Or,
if we do want to allow the ioctls to run on an fd that's opened in buffered IO
mode, can we simply invalidate that part of the page cache after calling
ZEROOUT?

Something idiotic like fsync_bdev() -> blkdev_issue_zeroout -> invalidate_bdev
-> invalidate_inode_pages2 seems to smooth things over, but that's a big dumb
hammer.
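
A rough userspace analogue of that sequence, for anyone stuck with a buffered fd, is sketched below. It is only a sketch: the function name zeroout_coherent is hypothetical, and it assumes that BLKFLSBUF syncs and then invalidates the block device's cached pages (it needs CAP_SYS_ADMIN). Like the kernel-side version, it drops the cache for the whole device rather than just the zeroed range, and it does nothing about pages that another process populates or dirties concurrently.

```c
#include <linux/fs.h>       /* BLKZEROOUT, BLKFLSBUF */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: zero [start, start + len) on a block device opened
 * for buffered IO, then ask the kernel to drop its cached pages so that a
 * later buffered read goes back to the device.
 * Returns 0 on success, -1 on error (errno is left set by the failing call).
 */
static int zeroout_coherent(int fd, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };   /* {offset, length} in bytes */

    /* Write back dirty pages first so they cannot land on top of the zeroes. */
    if (fsync(fd) < 0)
        return -1;

    /* Zero the range at the bio level; the page cache is not updated. */
    if (ioctl(fd, BLKZEROOUT, range) < 0)
        return -1;

    /* Flush and invalidate the device's cached buffers (the big hammer). */
    if (ioctl(fd, BLKFLSBUF) < 0)
        return -1;

    return 0;
}
```

Opening the device with O_DIRECT avoids the problem altogether and is probably the cleaner choice whenever the caller controls how the fd is opened.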

Tired of this for now, going to bed.

--D
Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com