Thread (8 messages) 8 messages, 4 authors, 2023-02-23

Re: Reoccurring 5 second delays during NFS calls

From: Florian Möller <hidden>
Date: 2023-02-22 12:48:18

Am 22.02.23 um 13:22 schrieb Jeff Layton:
On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
quoted
On 22 Feb 2023, at 3:19, Florian Möller wrote:
quoted
Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
quoted
On 21 Feb 2023, at 11:52, Florian Möller wrote:
quoted
Hi Benjamin,

here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.

The command I used was

touch test.txt && sleep 2 && touch test.txt

test.txt did not exist previously. So you have an example of a touch without and with delay.
Thanks!  These are great - I can see from them that the client is indeed
waiting in the stateid update mechanism because the server has returned
NFS4ERR_STALE to the client's first CLOSE.

That is unusual.  The server is signaling that the open file's stateid is old,
so I am interested to see if the first CLOSE is sent with the stateid's
sequence that was returned from the server.  I could probably see this if I
had the server-side tracepoint data.
Hi Benjamin,

the server-side tracepoints

nfsd:nfsd_preprocess
sunrpc:svc_process

were enabled. It seems they did not produce any output.

What I did now was:
- enable all nfsd tracepoints,
- enable all nfs4 tracepoints,
- enable all sunrpc tracepoints.

The command I used was

touch somefile && sleep 2 && touch somefile.

Then I unmounted the NFS share - this also causes a delay.

I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.

In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
see:

nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
This just means that the kernel called into the "cache" infrastructure
to find an export entry, and there wasn't one.


Looking back at the original email here, I'd say this is expected since
the export wasn't set up for krb5i.

Output of exportfs -v:
/export
gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export
gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
I changed the export definitions to

/export 
gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export 
gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export 
gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

such that all three kerberos security types are exported. With this setup the 
delay is gone. Without the krb5i export the delay occurs again.
Mounting the NFS share with different kerberos security types does not change 
this behaviour - without the krb5i export there is a delay.

Best,
Florian


quoted
.. so nfsd's exp_find_key() is having trouble and returns -ENOENT.  Does
this look familiar to anyone?

I am not as familiar with how the server operates here, so my next step
would be to start inserting trace_printk's into the kernel source to figure
out what's going wrong in there.  However, we can also use the function
graph tracer to see where the kernel is going.  That would look like:

  echo exp_find_key > /sys/kernel/tracing/set_graph_function
  echo 7 > /sys/kernel/debug/tracing/max_graph_depth
  echo function_graph > /sys/kernel/debug/tracing/current_tracer
quoted
/sys/kernel/debug/tracing/trace
  .. reproduce

  cat /sys/kernel/debut/tracing/trace

Hopefully someone with more knfsd/sunrpc experience recognizes this.. but it
makes a lot more sense now that krb5 is part of the problem.

Ben
-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help