Re: [PATCH v5 1/10] fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

From: Dave Chinner <david@fromorbit.com>
Date: 2014-02-26 01:57:47
Also in: linux-ext4, linux-fsdevel, linux-xfs, lkml

On Tue, Feb 25, 2014 at 03:41:20PM -0800, Hugh Dickins wrote:

On Mon, 24 Feb 2014, Dave Chinner wrote:

quoted

On Sat, Feb 22, 2014 at 09:06:25AM -0500, Theodore Ts'o wrote:

quoted

On Wed, Feb 19, 2014 at 01:37:43AM +0900, Namjae Jeon wrote:

quoted

+	/*
+	 * There is no need to overlap collapse range with EOF, in which case
+	 * it is effectively a truncate operation
+	 */
+	if ((mode & FALLOC_FL_COLLAPSE_RANGE) &&
+	    (offset + len >= i_size_read(inode)))
+		return -EINVAL;
+

I wonder if we should just translate a collapse range that is
equivalent to a truncate operation to, in fact, be a truncate
operation?

Trying to collapse a range that extends beyond EOF, IMO, is likely
to only happen if the DVR/NLE application is buggy. Hence I think
that telling the application it is doing something that is likely to
be wrong is better than silently truncating the file....

I do agree with Ted on this point.  This is not an xfs ioctl added
for one DVR/NLE application, it's a mode of a Linux system call.

We do not usually reject with an error when one system call happens
to ask for something which can already be accomplished another way;
nor nanny our callers.

It seems natural to me that COLLAPSE_RANGE should support beyond EOF;
unless that adds significantly to implementation difficulties?

Yes, it does add to the implementation complexity significantly - it
adds data security issues that don't exist with the current API.

That is, Filesystems can have uninitialised blocks beyond EOF so
if we allow COLLAPSE_RANGE to shift them down within EOF, we now
have to ensure they are properly zeroed or marked as unwritten.

It also makes implementations more difficult. For example, XFS can
also have in-memory delayed allocation extents beyond EOF, and they
can't be brought into the range < EOF without either:

	a) inserting zeroed pages with appropriately set up
	and mapped bufferheads into the page cache for the range
	that sits within EOF; or
	b) truncating the delalloc extents beyond EOF before the
	move

So, really, the moment you go beyond EOF filesystems have to do
quite a bit more validation and IO in the context of the system
call. It no longer becomes a pure extent manipulation offload - it
becomes a data security problem.

And, indeed, the specification that we are working to is that the
applications that want to collapse the range of a file are using
this function instead of read/memcpy/write/truncate, which by
definition means they cannot shift ranges of the file beyond EOF
into the new file.

So IMO the API defines the functionality as required by the
applications that require it and *no more*. If you need some
different behaviour - we can add it via additional flags in future
when you have an application that requires it.

Actually, is it even correct to fail at EOF?  What if fallocation
with FALLOC_FL_KEEP_SIZE was used earlier, to allocate beyond EOF:
shouldn't it be possible to shift that allocation down, along with
the EOF, rather than leave it behind as a stranded island?

It does get shifted down - it just remains beyond EOF, just like it
was before the operation. And that is part of the specification of
COLLAPSE_RANGE - it was done so that preallocation (physical or
speculative delayed allocation) beyond EOF to avoid fragmentation as
the DVR continues to write is not screwed up by chopping out earlier
parts of the file.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help