Re: XFS fallocate implementation incorrectly reports ENOSPC

From: Chris Dunlop <hidden>
Date: 2021-08-27 02:55:42

On Fri, Aug 27, 2021 at 06:56:35AM +1000, Chris Dunlop wrote:

On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:


On 8/25/21 9:06 PM, Chris Dunlop wrote:


fallocate -l 1GB image.img
mkfs.xfs -f image.img
mkdir mnt
mount -o loop ./image.img mnt
fallocate -o 0 -l 700mb mnt/image.img
fallocate -o 0 -l 700mb mnt/image.img

Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?

Interesting.  Off the top of my head, I assume that xfs is not looking at
current file space usage when deciding how much is needed to satisfy the
fallocate request.  While filesystems can return ENOSPC at any time for
any reason, this does seem a bit suboptimal.
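One way to probe that hypothesis is to compare two numbers between the two fallocate calls: how many bytes the file already has allocated, and how much free space the filesystem reports. The following is a minimal sketch, assuming bash and GNU coreutils `stat` and `df`. The helper names `allocated_bytes` and `free_bytes` are hypothetical, and `df`'s "avail" figure only approximates what XFS checks internally.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the thread) for
# inspecting space between the two fallocate calls in the reproducer.
# Assumes bash and GNU coreutils stat(1) and df(1).

# Bytes actually allocated to FILE: st_blocks times the unit reported by %B.
allocated_bytes() {
    local blocks unit
    read -r blocks unit < <(stat -c '%b %B' -- "$1")
    echo $(( blocks * unit ))
}

# Bytes available to unprivileged users on the filesystem holding PATH.
free_bytes() {
    df -B1 --output=avail -- "$1" | tail -n 1 | tr -d ' '
}

# Usage, run after the first fallocate in the reproducer:
#   allocated_bytes mnt/image.img   # about the 700 MB already requested
#   free_bytes mnt                  # roughly what is left for a new request
# If free_bytes is smaller than the requested length, a request charged in
# full (ignoring what is already allocated) would fail with ENOSPC, whereas
# one charged only for the unallocated part would be a no-op.
```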

Yes, I would have thought the second fallocate should be a noop.

On further reflection, "filesystems can return ENOSPC at any time" is 
certainly something apps need to be prepared for (and in this case, it's 
doing the right thing, by logging the error and aborting), but it's not 
really a "not a bug" excuse for the filesystem in all circumstances (or 
this one?), is it? E.g. a write(fd, buf, 1) returning ENOSPC on a fresh 
filesystem would be considered a bug, no?

...or maybe your "suboptimal" was entirely tongue in cheek?


Background: I'm chasing a mysterious ENOSPC error on an XFS 
filesystem with way more space than the app should be asking for. 
There are no quotas on the fs. Unfortunately it's a third party 
app and I can't tell what sequence is producing the error, but 
this fallocate issue is a possibility.

Presumably you've tried stracing it and looking for ENOSPC returns from
syscalls?

That would be an obvious approach. Unfortunately it's not that easy. 
The problem is associated with one specific client which is out of my 
control so I can't experiment in a controlled environment. The app 
runs for several hours in multiple phases, each with multiple threads, 
and the problem typically occurs in the early hours of the morning 
after several hours of running, so attaching to the correct instance 
is fraught, and the strace output will be voluminous.

I decided to stop being lazy and look into taking the strace option 
further. I can script looking for the right process as it starts up, and 
with judicious use of "-Z" for failed calls only, and filtering out 
commonly failing syscalls (futex, stat etc.), the output volume is reduced 
to just about nothing. This could be the solution, but it'll probably 
take a week or so before the app fails again and I can see whether I've 
caught what's going on.
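The approach described above can be sketched as a few shell helpers: wait for the process to appear, attach strace to all of its threads while logging only failed calls, then search the log for ENOSPC. This is a minimal sketch, not the script used here. It assumes bash, procps `pgrep`, strace 5.2 or later (for `-Z`), permission to ptrace the target (typically root), and x86_64, where `stat` is a valid syscall name. The helper names and the process name `theapp` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the strace approach; the names, polling
# interval and excluded syscall list are illustrative assumptions.
# Assumes strace >= 5.2 (-Z / --failed-only), procps pgrep, ptrace
# permission on the target, and x86_64 syscall names.

# Poll once a second until a process with exact name NAME exists,
# then print the PID of the newest such process.
wait_for_pid() {
    local name=$1 pid
    until pid=$(pgrep -n -x -- "$name"); do
        sleep 1
    done
    echo "$pid"
}

# Attach to PID and all of its threads (-f), log only failed syscalls (-Z)
# with microsecond timestamps (-tt), excluding commonly failing noise.
trace_failed_calls() {
    local pid=$1 out=$2
    strace -f -tt -Z -e trace='!futex,stat' -o "$out" -p "$pid"
}

# Print any ENOSPC failures captured so far, prefixed with line numbers.
find_enospc() {
    grep -n 'ENOSPC' -- "$1"
}

# Example ("theapp" is a placeholder process name):
#   pid=$(wait_for_pid theapp)
#   trace_failed_calls "$pid" /var/tmp/theapp.strace &
#   # ... the next morning ...
#   find_enospc /var/tmp/theapp.strace
```

Because `-Z` drops every successful call, the log stays small enough to leave running overnight. The trade-off is that the successful calls leading up to a failure (for example, an earlier fallocate on the same file) are not recorded.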

Thanks for the inspiration / kick in the pants to get this going.

Strace has grown more options since the last time I looked at the man 
page: "-Z" is fantastic!

Cheers,

Chris
