Thread (16 messages) 16 messages, 5 authors, 2014-02-27

Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

From: Dave Chinner <david@fromorbit.com>
Date: 2014-02-26 03:42:30
Also in: linux-fsdevel, linux-mm, linux-xfs, lkml

On Tue, Feb 25, 2014 at 05:52:16PM -0800, Andrew Morton wrote:
On Wed, 26 Feb 2014 12:34:26 +1100 Dave Chinner [off-list ref] wrote:
quoted
On Tue, Feb 25, 2014 at 03:41:28PM -0800, Andrew Morton wrote:
quoted
On Tue, 25 Feb 2014 15:23:35 -0800 (PST) Hugh Dickins [off-list ref] wrote:
quoted
On Tue, 25 Feb 2014, Dave Chinner wrote:
quoted
On Tue, Feb 25, 2014 at 02:16:01PM +1100, Stephen Rothwell wrote:
quoted
On Mon, 24 Feb 2014 11:57:10 +1100 Dave Chinner [off-list ref] wrote:
FALLOC_FL_COLLAPSE_RANGE: I'm a little sad at the name COLLAPSE,
but probably seven months too late to object.  It surprises me that
you're doing all this work to deflate a part of the file, without
the obvious complementary work to inflate it - presumably all those
advertisers whose ads you're cutting out, will come back to us soon
to ask for inflation, so that they have somewhere to reinsert them ;)
Yes, I was wondering that.  Why not simply "move these blocks from here
to there".
And open a completely unnecessary can of worms to do with
behavioural and implementation corner cases?
But it's general.
Exactly. And because it's general, you can't make arbitrary
decisions about the behaviour.
quoted
Do you allow it to destroy data by default? Or only allow moves into
holes?
Overwrite.
Application dev says: "I don't want it to overwrite data - I only
want to it succeed if it's moving into a hole that I've already
prepared".
quoted
What do you do with range the data is moved out of? Does it just
become a hole? What happens if the range overlaps EOF - does that
change the file size?
Truncate.
A.D. says: "But I need FALLOC_FL_KEEP_SIZE semantics"
quoted
What if you want to move the range beyond EOF?
Extend.
Filesystem developer says: "Ok, so what happens to the range between
the old EOF and destintation offset? What do you do with blocks
beyond EOF that fall within that range? punch, zero, preallocate the
entire range? Do users need to be able to specify this behaviour?
Hell, do we even know of an application that requires this
behaviour?"
quoted
What if the source and destination ranges overlap?
Don't screw it up.
Exactly my point - it's a complex behaviour that is difficult to
verify that it is correct.
quoted
What happens when you move the block at EOF into the middle of a
file - do you end up with zeros padding the block and the file size
having to be adjusted accordingly? Or do we have to *copy* all the
data in high blocks down to fill the hole in the block?
I don't understand that.  Move the block(s) and truncate to the new
length.
So, you are saying this (move from s to d):

     +-----------------------------------------------------+
                                             +sssssssssssss+
              +ddddddddddddd+

should result in:

     +--------+ddddddddddddd+


A.D. says: "That's not what I asked for! What happened to all the
rest of my data in the file between d and s? I didn't ask for them
to be removed. And I want a hole where the source was!"
quoted
What behaviour should we expect if the filesystem can't implement
the entire move atomically and we crash in the middle of the move?
What does collapse_range do now?

If it's a journaled filesystem, it shouldn't screw up.  If it isn't, fsck.
Define "screw up". For journalled filesystems "don't screw up" means
the filesystem will be consistent after a crash, not that a change
made in a syscall is completed atomicly.

Indeed, collapse range isn't implemented atomically in XFS, and I
doubt it is in ext4. Why? Because the extent tree being manipulated
can be *much* larger than the journal and so the changes can't
easily be done atomically from a crash recovery perspective. The
result is that collapse range will end up with a hole somewhere in
the file the size of the range being collapsed. This was pointed out
during review some time in the past 6 months and, IIRC, the response
was "that's fine, just so long as the filesystem is not corrupted".
I have plans to fix this issue in XFS, but it isn't critical to the
correct functioning of devices using collapse range.

This just illustrates my point is that behaviour needs to be
specified so that we can get all filesystems with the same minimum
crash guarantees....
quoted
I can keep going, but I'll stop here - you get the idea.
None of this seems like rocket science.
It's not rocket science, but the devil is in the details. There's no
requirements or specification to work from, let alone an application
that needs such generic functionality. Until these exist and there's
someone willing to put the effort into specifying, implementing and
testing such an interface, it's just not going to happen.
quoted
In comparison, collapse range as a file data manipulation has very
specific requirements and from that we can define a simple, specific
API that allows filesystems to accelerate that operation by extent
manipulation rather than read/memcpy/write that the applications are
currently doing for this operation....  IOWs, collapse range is a
simple operation, "move arbitrary blocks from here to there" is a
nightmare both from the specification and the implementation points
of view.
collapse_range seems weird, arbitrary and half-assed.  "Why didn't they
go all the way and do it properly".
Yup, I can apply exactly the same argument to FALLOC_FL_PREALLOC and
FALLOC_FL_PUNCH_HOLE. We should have done them as part of a generic
block movement API as they are simply degenerate cases of NULL
source/destination targets for the block movement API....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help