Re: [PATCH 1/6] fs: add hole punching to fallocate
From: Dave Chinner <david@fromorbit.com>
Date: 2011-01-12 11:48:58
Also in: linux-btrfs, linux-xfs, lkml
On Tue, Jan 11, 2011 at 04:30:07PM -0500, Ted Ts'o wrote:
On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote:
IOWs, all they want to do is avoid the unwritten extent conversion overhead. Time has shown that a bad security/performance tradeoff decision was made 13 years ago in XFS, so I see little reason to repeat it for ext4 today....

I suspect things may have changed somewhat; both in terms of requirements and nature of cluster file systems, and the performance of various storage systems (including PCIe-attached flash devices).
We can throw 1000x more CPU power and memory at the problem than we could 13 years ago. IOW the system balance hasn't changed (even considering pci-e SSDs) compared to 13 years ago. Hence if it was a bad tradeoff 13 years ago, it's still a bad tradeoff today.
I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead of extent conversion. It's that extent conversion causes more metadata operations than what you'd have otherwise, which means systems that want to use O_DIRECT and make sure the data doesn't go away either have to write O_DIRECT|O_DSYNC or need to call fdatasync().

cluster file system implementor,

One possibility might be to make it an optional feature which is only enabled via a mount option. That way someone would have to explicitly ask for this feature in two ways (via a new flag to fallocate and a mount option).
Proliferation of mount options just to enable feature X of API Y for filesystem Z is not a good idea. Either you enable it via the fallocate API or you don't allow it at all.
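As a rough illustration of the durability point Lawrence raises above, the following Python sketch preallocates space, writes into it, and then calls fdatasync() so that the data and the metadata needed to read it back are flushed. The helper name write_durably is hypothetical, O_DIRECT is left out because it requires aligned buffers and filesystem support, and whether the preallocated range is really backed by unwritten extents depends on the filesystem.

```python
import os
import tempfile


def write_durably(path: str, prealloc_bytes: int, payload: bytes) -> int:
    """Preallocate space, write payload at offset 0, then fdatasync().

    Hypothetical helper for illustration. Returns the file size afterwards.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve blocks up front. On extent-based filesystems such as XFS
        # or ext4 this typically creates unwritten extents that read as zeros.
        os.posix_fallocate(fd, 0, prealloc_bytes)

        # Writing into the preallocated range is the step that can require
        # an unwritten-to-written extent conversion (a metadata update).
        os.pwrite(fd, payload, 0)

        # Flush the data and the metadata required to retrieve it.
        os.fdatasync(fd)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "prealloc.bin")
        print(write_durably(target, 1 << 20, b"hello"))
```

Opening the file with os.O_DSYNC instead would give each write the same guarantee without a separate fdatasync() call, which is the other option mentioned in the quoted text.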
It might not make sense for XFS, but for people who are using ext4 as the local storage file system back-end,
How does this differ from a local filesystem? Are you talking about storage nodes for clustered/cloudy storage? If so, I know of quite a few places that use XFS for this purpose and they all seem to measure storage in petabytes made up of small boxes containing anywhere between 30-100TB each. The only request for additional preallocation functionality I've got from people running such applications recently is for XFS_IOC_ZERO_RANGE. This is quite relevant, because that specifically converts allocated extents to unwritten extents. i.e. they like to be able to efficiently re-initialise allocated space to zeros rather than have it contain stale data.
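For readers unfamiliar with the operation, here is a hedged sketch of zeroing an already-allocated range without rewriting it by hand. It does not call XFS_IOC_ZERO_RANGE itself; it assumes a later Linux kernel (3.15 or newer) where a comparable operation is exposed as the FALLOC_FL_ZERO_RANGE mode of fallocate(), and it falls back to writing zeros where that mode is unsupported. The helper name zero_range is hypothetical, and the ctypes argument types assume a 64-bit off_t.

```python
import ctypes
import errno
import os
import tempfile

FALLOC_FL_ZERO_RANGE = 0x10  # value from <linux/falloc.h>

# Load the C library already linked into the interpreter (Linux only).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = (ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64)
_libc.fallocate.restype = ctypes.c_int


def zero_range(fd: int, offset: int, length: int) -> str:
    """Zero [offset, offset + length) and report which path was used.

    Hypothetical helper: returns "fallocate" if the filesystem handled the
    request, or "write" if it fell back to writing zeros explicitly.
    """
    if _libc.fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0:
        return "fallocate"
    err = ctypes.get_errno()
    if err not in (errno.EOPNOTSUPP, errno.ENOSYS):
        raise OSError(err, os.strerror(err))

    # Fallback: overwrite the range with zeros in bounded chunks.
    chunk = b"\0" * min(length, 1 << 20)
    done = 0
    while done < length:
        done += os.pwrite(fd, chunk[: length - done], offset + done)
    return "write"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()
        print(zero_range(f.fileno(), 4096, 4096))
```

Either path leaves the range reading back as zeros; only the fallocate() path gives the filesystem the chance to do so by converting extents rather than writing data.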
and are doing all sorts of things to get the best performance, including disabling the journal, I suspect it really would make sense.
That's not really a convincing argument for a new interface that needs to be maintained forever.
So it could always be an optional-to-implement flag, that not all file systems should feel obliged to implement.
It could, but it still needs better justification.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com