Re: Reoccurring 5 second delays during NFS calls
From: Florian Möller <hidden>
Date: 2023-02-21 16:52:25
Hi Benjamin, here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow. The command I used was touch test.txt && sleep 2 && touch test.txt test.txt did not exist previously. So you have an example of a touch without and with delay. Best regards, Florian Am 21.02.23 um 15:13 schrieb Benjamin Coddington:
On 21 Feb 2023, at 8:33, Florian Möller wrote:quoted
Hi all, unfortunately the patch did not help, neither using -o async nor using -o sync. We did some more tests (which is the reason for the delay of this reply): We used a qemu image housing both the NFS server and the client and did some kernel debugging. OS: Ubuntu 22.04.1 Kernel: 5.15.78 Mount line: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576, namlen=255,hard,proto=tcp,timeo=600,retrans=2, sec=krb5p,clientaddr=10.0.0.1, local_lock=none,addr=10.0.0.1 We touched a file and then touched the file again. This triggers the delay reliably. We set breakpoints on all functions starting with nfs4 and on "update_open_stateid". The attached file "1sttouch.log" contains a gdb log of the first touch. "2ndtouch.log" shows the gdb output of the second touch. The delay occurs in line 116 in update_open_stateid. We then deleted all breakpoints and set a sole breakpoint on update_open_stateid. We touched the file again and used only the "next" command of gdb. The gdb output is in "2ndtouch-next.log", the delay occurs in line 8. Please let us know if you need more information or if you want us to perform further tests. Best regards, Florian MöllerHi Florian, The 5 second value and location of the delay is making me suspect something is wrong with the open stateid sequence processing. The client introduces 5-second delays in order to correctly order stateid updates from the server. Usually this happens because there are multiple processes sending OPEN/CLOSE calls and the server processess them out of order, or the client receives the responses out of order. It would be helpful to have a network capture of the problem, along with the matching output from these tracepoints on the client: nfs4:nfs4_open_stateid_update nfs4:nfs4_open_stateid_update_wait nfs4:nfs4_close_stateid_update_wait sunrpc:xs_stream_read_request sunrpc:rpc_request sunrpc:rpc_task_end And these tracepoints on the server: nfsd:nfsd_preprocess sunrpc:svc_process I'm interested in seeing how the client is processing the sequence numbers of the open stateid, or if perhaps there's a delegation in play. LMK if you need help with the tracepoints -- you can simply append those lines into /sys/kernel/debug/tracing/set_event, then reproduce the problem. The output of those tracepoints will be in /sys/kernel/debug/tracing/trace. Ben
-- Dr. Florian Möller Universität Würzburg Institut für Mathematik Emil-Fischer-Straße 30 97074 Würzburg, Germany Tel. +49 931 3185596
Attachments
- nfs-network.txt [text/plain] 2329 bytes · preview
- nfs-trace.txt [text/plain] 5009 bytes · preview