Thread (6 messages) 6 messages, 3 authors, 2023-02-21

Re: Reoccurring 5 second delays during NFS calls

From: Florian Möller <hidden>
Date: 2023-02-21 16:52:25

Hi Benjamin,

here are the trace and a listing of the corresponding network packages. If the 
listing is not detailed enough, I can send you a full package dump tomorrow.

The command I used was

touch test.txt && sleep 2 && touch test.txt

test.txt did not exist previously. So you have an example of a touch without and 
with delay.

Best regards,
Florian

Am 21.02.23 um 15:13 schrieb Benjamin Coddington:
On 21 Feb 2023, at 8:33, Florian Möller wrote:
quoted
Hi all,

unfortunately the patch did not help, neither using -o async nor using -o sync. We did some more tests (which is the reason for the delay of this reply):

We used a qemu image housing both the NFS server and the client and did some kernel debugging.
OS: Ubuntu 22.04.1
Kernel: 5.15.78
Mount line: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,
	namlen=255,hard,proto=tcp,timeo=600,retrans=2,
	sec=krb5p,clientaddr=10.0.0.1,
	local_lock=none,addr=10.0.0.1

We touched a file and then touched the file again. This triggers the delay reliably.

We set breakpoints on all functions starting with nfs4 and on "update_open_stateid". The attached file "1sttouch.log" contains a gdb log of the first touch.
"2ndtouch.log" shows the gdb output of the second touch. The delay occurs in line 116 in update_open_stateid.

We then deleted all breakpoints and set a sole breakpoint on update_open_stateid. We touched the file again and used only the "next" command of gdb. The gdb output is in "2ndtouch-next.log", the delay occurs in line 8.

Please let us know if you need more information or if you want us to perform further tests.

Best regards,
Florian Möller
Hi Florian,

The 5 second value and location of the delay is making me suspect something
is wrong with the open stateid sequence processing.

The client introduces 5-second delays in order to correctly order stateid
updates from the server.  Usually this happens because there are multiple
processes sending OPEN/CLOSE calls and the server processess them out of
order, or the client receives the responses out of order.

It would be helpful to have a network capture of the problem, along with the
matching output from these tracepoints on the client:

nfs4:nfs4_open_stateid_update
nfs4:nfs4_open_stateid_update_wait
nfs4:nfs4_close_stateid_update_wait
sunrpc:xs_stream_read_request
sunrpc:rpc_request
sunrpc:rpc_task_end

And these tracepoints on the server:

nfsd:nfsd_preprocess
sunrpc:svc_process

I'm interested in seeing how the client is processing the sequence numbers
of the open stateid, or if perhaps there's a delegation in play.

LMK if you need help with the tracepoints -- you can simply append those
lines into /sys/kernel/debug/tracing/set_event, then reproduce the problem.
The output of those tracepoints will be in /sys/kernel/debug/tracing/trace.

Ben
-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help