Thread (26 messages) 26 messages, 6 authors, 2023-02-21

Re: git regression failures with v6.2-rc NFS client

From: Benjamin Coddington <hidden>
Date: 2023-02-03 17:15:14

On 3 Feb 2023, at 10:35, Chuck Lever III wrote:
quoted
On Feb 3, 2023, at 10:13 AM, Benjamin Coddington [off-list ref] wrote:

On 3 Feb 2023, at 9:38, Chuck Lever III wrote:
quoted
quoted
On Feb 1, 2023, at 10:53 AM, Benjamin Coddington [off-list ref] wrote:

On 1 Feb 2023, at 9:10, Benjamin Coddington wrote:
quoted
Working on a fix..
.. actually, I have no idea how to fix this - if tmpfs is going to modify
the position of its dentries, I can't think of a way for the client to loop
through getdents() and remove every file reliably.

The patch you bisected into just makes this happen on directories with 18
entries instead of 127 which can be verified by changing COUNT in the
reproducer.

As Trond pointed out in:
https://lore.kernel.org/all/eb2a551096bb3537a9de7091d203e0cbff8dc6be.camel@hammerspace.com/ (local)

  POSIX states very explicitly that if you're making changes to the
  directory after the call to opendir() or rewinddir(), then the
  behaviour w.r.t. whether that file appears in the readdir() call is
  unspecified. See
  https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html

The issue here is not quite the same though, we unlink the first batch of
entries, then do a second getdents(), which returns zero entries even though
some still exist.  I don't think POSIX talks about this case directly.

I guess the question now is if we need to drop the "ls -l" improvement
because after it we are going to see this behavior on directories with >17
entiries instead of >127 entries.
I don't have any suggestions about how to fix your optimization.
I wasn't trying to fix it.  I was trying to fix your testing setup.
quoted
Technically I think this counts as a regression; Thorsten seems
to agree with that opinion. It's late in the cycle, so it is
appropriate to consider reverting 85aa8ddc3818 and trying again
in v6.3 or v6.4.
Thorsten's bot is just scraping your regression report email, I doubt
they've carefully read this thread.
quoted
quoted
It should be possible to make tmpfs (and friends) generate reliable cookies
by doing something like hashing out the cursor->d_child into the cookie
space.. (waving hands)
Sure, but what if there are non-Linux NFS-exported filesystems
that behave this way?
Then they would display this same behavior, and claiming it is a server bug
might be defensible position.
It's a server bug if we can cite something (perhaps less confusing
and more on-point than the POSIX specification) that says READDIR
cookies aren't supposed to behave this way. I bet the tmpfs folks
are going to want to see that kind of mandate before allowing a
code change.
I don't have other stuff to cite for you.  All I have is POSIX, and the many
discussions we've had on the linux-nfs list in the past.
I'm wondering if you can trigger the same behavior when running
directly on tmpfs?
Not in the same way, because libfs keeps a cursor in the open directory
context, but knfsd has to do a seekdir between replies to READDIR.  So, if
you do what the git test is doing locally, it works locally.

The git test is doing:

opendir
while (getdents)
    unlink(dentries)
close dir
rmdir <- ENOTEMPTY on NFS

If you have NFS in the middle of that, knfsd tries to do a seekdir to a
position (the cookie/offset sent in the /last/ READDIR results).  In that
case, yes - tmpfs would display the same behavior, but locally to tmpfs that
looks like:

opendir
getdents
closedir
unlink,unlink,unlink
opendir
seekdir to next batch
getdents <- no dentries (they all shifted positions)
rmdir <- ENOTEMPTY

The other way to think about this is that on a local system, there's state
saved in the open dir file structure between calls to getdents().  With
knfsd in the middle, you lose that state when you close the directory and
have to do seekdir instead, which means you're not guaranteed to be in the
same place in the directory stream.
quoted
The reality as I understand it is that if the server is going to change the
cookie or offset of the dentries in between calls to READDIR, you're never
going to reliably be able to list the directory completely.  Or maybe we
can, but at least I can't think of any way it can be done.

You can ask Trond/Anna to revert this, but that's only going to fix your
test setup.  The behavior you're claiming is a regression is still there -
just at a later offset.
No-one is complaining about the existing situation, which
suggests this is currently only a latent bug, and harmless in
practice. This is a regression because your optimization exposes
the misbehavior to more common workloads.
Its not a latent bug - the incorrect readdir behavior of tmpfs has been
analyzed and discussed on this list, have a look.
Even if this is a server bug, the guidelines about not
introducing behavior regressions mean we have to stick with
the current client side behavior until the server side part
of the issue has been corrected.

quoted
Or we can modify the server to make tmpfs and friends generate stable
cookies/offsets.

Or we can patch git so that it doesn't assume it can walk a directory
completely while simultaneously modifying it.
I'm guessing that is something that other workloads might
do, so fixing git is not going to solve the issue. And also,
the test works fine on other filesystem types, so it's not
git that is the problem.

quoted
What do you think?
IMO, since the situation is not easy or not possible to fix,
you should revert 85aa8ddc3818 for v6.2 and work on fixing
tmpfs first.

It's going to have to be a backportable fix because your
optimization will break with any Linux server exporting an
unfixed tmpfs.
Exports of tmpfs on linux are already broken and seem to always have been.
I spent a great deal of time making points about it and arguing that the
client should try to work around them, and ultimately was told exactly this:
https://lore.kernel.org/linux-nfs/a9411640329ed06dd7306bbdbdf251097c5e3411.camel@hammerspace.com/ (local)

The optimization you'd like to revert fixes a performance regression on
workloads across exports of all filesystems.  This is a fix we've had many
folks asking for repeatedly.  I hope the maintainers decide not to revert
it, and that we can also find a way to make readdir work reliably on tmpfs.

Ben
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help