Thread: 8 messages, 2 authors, 2024-06-17

Re: reservation errors during fstests on pNFS block

From: Chuck Lever III <hidden>
Date: 2024-06-14 17:46:28

On Jun 14, 2024, at 12:38 PM, Christoph Hellwig [off-list ref] wrote:

On Fri, Jun 14, 2024 at 02:46:49PM +0000, Chuck Lever III wrote:
quoted
I've finally gotten kdevops and pNFS block to the point where
it can run fstests smoothly with an iSCSI target. I'm seeing
error messages on occasion in the system journal. This set is
from generic/069:
Reservation means another node has an active reservation on that LU.
There are only two accessors of the LUN: the NFS server and
the NFS client running the test. That's why these errors are
a little surprising to me.

Either you made a previous attempt that failed and left the
reservation lingering, or something else in the system claimed it.
This is the first fstests run after the systems were provisioned.
kdevops lets me provision from scratch before every run [1].

quoted
But note that generic/069 is recorded as passing without error.
When pNFS layout access fails we fall back to normal access through the
MDS, so this is expected.
Expected, OK. From a usability standpoint, error messages like
this would probably be alarming to administrators. I plan to
convert the printk's and dprintk's in the NFSD layout code into
trace points, but that doesn't help the messages emitted by the
block and SCSI drivers. Ideally this should be less noisy.

Is generic/069 the first test that failed when doing a full xfstests
run?
Yes, it's a full run. generic/069 is the first test where there
are remarkable system journal messages (i.e., PR errors), though
there are a few subsequent tests that are also whinging.

Do you see LAYOUT* ops in /proc/self/mountstats for the previous
tests?
generic/013 is known to generate layout recalls, for example,
so there is layout activity during the test run.
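
For checking per-mount layout activity directly, here is a minimal
sketch that tallies the LAYOUT* operation counts from mountstats text.
It assumes the usual layout of that file: a "device ... mounted on ...
with fstype ..." line per mount, followed by per-op lines whose first
number is the operation count. The function name layout_op_counts is
made up for illustration.

```python
import re
from collections import defaultdict


def layout_op_counts(mountstats_text):
    """Return {mount_point: {op_name: count}} for LAYOUT* ops on NFS mounts.

    Assumes each per-op line looks like "LAYOUTGET: <ops> <trans> ..."
    and that the first number is the count of operations issued.
    """
    counts = defaultdict(dict)
    mount_point = None
    for line in mountstats_text.splitlines():
        m = re.match(r"device \S+ mounted on (\S+) with fstype (\S+)", line)
        if m:
            # Only track NFS mounts; skip every other filesystem type.
            mount_point = m.group(1) if m.group(2).startswith("nfs") else None
            continue
        if mount_point is None:
            continue
        m = re.match(r"\s*(LAYOUT\w*):\s+(\d+)", line)
        if m:
            counts[mount_point][m.group(1)] = int(m.group(2))
    return dict(counts)


if __name__ == "__main__":
    try:
        with open("/proc/self/mountstats") as f:
            text = f.read()
    except OSError:
        text = ""
    for mount, ops in layout_op_counts(text).items():
        print(mount, ops)
```

Running it before and after a single test and comparing the counts
would show whether that test used layouts at all or went through the
MDS for everything.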

I can go back and try reproducing with just generic/069 and
tcpdump as a first step. Is there a way I can tell that the
PR errors are not reporting a possible data corruption? I
guess the PASS report from generic/069 is one way. The pass/fail
log from xfstests for pNFS block looks just like the non-pNFS runs,
so maybe this is much ado about nothing.


--
Chuck Lever

[1] - https://github.com/chucklever/kdevops/tree/pnfs-block-testing