Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc
From: David Chinner <hidden>
Date: 2007-05-04 07:27:42
Also in:
linux-fsdevel, linux-xfs, lkml
On Thu, May 03, 2007 at 11:28:15PM -0700, Andrew Morton wrote:
On Fri, 4 May 2007 16:07:31 +1000 David Chinner [off-list ref] wrote:quoted
On Thu, May 03, 2007 at 09:29:55PM -0700, Andrew Morton wrote:quoted
On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" [off-list ref] wrote:quoted
This patch implements the fallocate() system call and adds support for i386, x86_64 and powerpc. ... +{ + struct file *file; + struct inode *inode; + long ret = -EINVAL; + + if (len == 0 || offset < 0) + goto out;The posix spec implies that negative `len' is permitted - presumably "allocate ahead of `offset'". How peculiar.I just checked the man page for posix_fallocate() and it says: EINVAL offset or len was less than zero. We should probably follow this lead.Yes, I think so. I'm suspecting that http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html is just buggy. Or I can't read. I mean, if we're going to support negative `len' then is the byte at `offset' inside or outside the segment? Head spins.
I don't think we should care. If we provide a syscall with the semantics of "allocate from offset to offset+len" then glibc's implementation can turn negative length into two separate fallocate syscalls....
quoted
quoted
quoted
+ ret = -ENODEV; + if (!S_ISREG(inode->i_mode)) + goto out_fput;So we return ENODEV against an S_ISBLK fd, as per the posix spec. That seems a bit silly of them.Hmmmm - I thought that the intention of sys_fallocate() was to be generic enough to eventually allow preallocation on directories. If that is the case, then this check will prevent that....The above opengroup page only permits S_ISREG. Preallocating directories sounds quite useful to me, although it's something which would be pretty hard to emulate if the FS doesn't support it. And there's a decent case to be made for emulating it - run-anywhere reasons. Does glibc emulation support directories? Quite unlikely. But yes, sounds like a desirable thing. Would XFS support it easily if the above check was relaxed?
No - right now empty blocks are pruned from the directory immediately so I don't think we really have a concept of empty blocks in the btree structure. dir2 is bloody complex, so adding preallocation is probably not going to be simple to do. It's not high on my list to add, either, because we can typically avoid the worst case directory fragmentation by using larger directory block sizes (e.g. 16k instead of the default 4k on a 4k block size fs). IIRC directory preallocation has been talked about more for ext3/4.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group