Re: git regression failures with v6.2-rc NFS client
From: Chuck Lever III <chuck.lever@oracle.com>
Date: 2023-02-03 14:38:39
On Feb 1, 2023, at 10:53 AM, Benjamin Coddington [off-list ref] wrote:
On 1 Feb 2023, at 9:10, Benjamin Coddington wrote:
Working on a fix.... actually, I have no idea how to fix this - if tmpfs is going to modify the position of its dentries, I can't think of a way for the client to loop through getdents() and remove every file reliably. The patch you bisected into just makes this happen on directories with 18 entries instead of 127, which can be verified by changing COUNT in the reproducer.

As Trond pointed out in:
https://lore.kernel.org/all/eb2a551096bb3537a9de7091d203e0cbff8dc6be.camel@hammerspace.com/

POSIX states very explicitly that if you're making changes to the directory after the call to opendir() or rewinddir(), then the behaviour w.r.t. whether that file appears in the readdir() call is unspecified. See
https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html

The issue here is not quite the same though: we unlink the first batch of entries, then do a second getdents(), which returns zero entries even though some still exist. I don't think POSIX talks about this case directly.

I guess the question now is if we need to drop the "ls -l" improvement, because after it we are going to see this behavior on directories with >17 entries instead of >127 entries.
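The failure described above can be illustrated with a toy model. This is only a sketch, not the NFS client or tmpfs code: it assumes the directory is a plain list and that the readdir cookie is just an index into that list. The names `read_batch`, `remove_all_single_pass`, and the batch size of 10 are hypothetical.

```python
# Toy model only: the "server" keeps entries in a list and treats the
# cookie as a plain index into that list (an assumption for illustration).

def read_batch(entries, cookie, batch_size):
    """Return (names, next_cookie), starting at list position `cookie`."""
    names = entries[cookie:cookie + batch_size]
    return names, cookie + len(names)


def remove_all_single_pass(entries, batch_size):
    """Read a batch, unlink it, then continue from the returned cookie."""
    cookie = 0
    while True:
        names, cookie = read_batch(entries, cookie, batch_size)
        if not names:
            break  # looks like end-of-directory to the caller
        for name in names:
            entries.remove(name)  # shifts every later entry down
    return entries  # entries that were never returned to the caller


COUNT = 18   # same role as COUNT in the reproducer
BATCH = 10   # hypothetical batch size, not the real client's value
leftover = remove_all_single_pass([f"file.{i}" for i in range(COUNT)], BATCH)
print(len(leftover), "entries left behind")
```

In this model the second read starts at index 10 of a list that now holds only 8 entries, so it returns nothing while files still exist.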
I don't have any suggestions about how to fix your optimization. Technically I think this counts as a regression; Thorsten seems to agree with that opinion. It's late in the cycle, so it is appropriate to consider reverting 85aa8ddc3818 and trying again in v6.3 or v6.4.
It should be possible to make tmpfs (and friends) generate reliable cookies by doing something like hashing out the cursor->d_child into the cookie space.. (waving hands)
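A rough sketch of why stable cookies would help, under loud assumptions: the `ToyDir` class below is hypothetical, and a monotonic counter stands in for whatever hash of the dentry would actually be used. It only shows that a cookie tied to the entry, rather than to its position, survives removals.

```python
import itertools


class ToyDir:
    """Toy directory whose entries each carry a cookie that never changes."""

    def __init__(self):
        self._next = itertools.count(1)  # stand-in for a real hash
        self._entries = {}               # cookie -> name, in cookie order

    def create(self, name):
        self._entries[next(self._next)] = name

    def unlink(self, name):
        for cookie, existing in list(self._entries.items()):
            if existing == name:
                del self._entries[cookie]
                return
        raise FileNotFoundError(name)

    def read_batch(self, after_cookie, batch_size):
        """Return up to batch_size (cookie, name) pairs past after_cookie."""
        later = [(c, n) for c, n in self._entries.items() if c > after_cookie]
        return later[:batch_size]


def remove_all(directory, batch_size):
    """Unlink every entry batch by batch, resuming from the last cookie."""
    last_cookie = 0
    removed = 0
    while True:
        batch = directory.read_batch(last_cookie, batch_size)
        if not batch:
            return removed
        for cookie, name in batch:
            directory.unlink(name)
            removed += 1
            last_cookie = cookie


d = ToyDir()
for i in range(18):
    d.create(f"file.{i}")
print(remove_all(d, 10), "entries removed")
```

Because the resume point is the last cookie seen and not a list position, unlinking the first batch does not move the remaining entries out of reach.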
Sure, but what if there are non-Linux NFS-exported filesystems that behave this way?

--
Chuck Lever