Re: Reoccurring 5 second delays during NFS calls
From: Benjamin Coddington <hidden>
Date: 2023-02-22 12:46:56
On 22 Feb 2023, at 7:22, Jeff Layton wrote:
On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:quoted
On 22 Feb 2023, at 3:19, Florian Möller wrote:quoted
Am 21.02.23 um 19:58 schrieb Benjamin Coddington:quoted
On 21 Feb 2023, at 11:52, Florian Möller wrote:quoted
Hi Benjamin, here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow. The command I used was touch test.txt && sleep 2 && touch test.txt test.txt did not exist previously. So you have an example of a touch without and with delay.Thanks! These are great - I can see from them that the client is indeed waiting in the stateid update mechanism because the server has returned NFS4ERR_STALE to the client's first CLOSE. That is unusual. The server is signaling that the open file's stateid is old, so I am interested to see if the first CLOSE is sent with the stateid's sequence that was returned from the server. I could probably see this if I had the server-side tracepoint data.Hi Benjamin, the server-side tracepoints nfsd:nfsd_preprocess sunrpc:svc_process were enabled. It seems they did not produce any output. What I did now was: - enable all nfsd tracepoints, - enable all nfs4 tracepoints, - enable all sunrpc tracepoints. The command I used was touch somefile && sleep 2 && touch somefile. Then I unmounted the NFS share - this also causes a delay. I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with NFS4ERR_STALE, so nothing to do with the open stateid sequencing. I also see: nfsd-1693 [000] ..... 1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2 nfsd-1693 [000] ..... 1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2 nfsd-1693 [000] ..... 1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70This just means that the kernel called into the "cache" infrastructure to find an export entry, and there wasn't one. Looking back at the original email here, I'd say this is expected since the export wasn't set up for krb5i. Output of exportfs -v: /export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash) /export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
Jeff and I were chatting on #linux-nfs, and its back to looking like a client issue.. as he's pointed out that the client starts out with krb5, then in Frame 8 (the CLOSE that receives STALE) switches to krb5i, then back to krb5 in Frame 10.. .. this should be reproduceable. Let me dust off my old kdc. Ben