RE: Trying to reduce NFSv4 timeouts to a few seconds on an established connection

From: Andrew Klaassen <hidden>
Date: 2023-01-26 15:31:39

From: Andrew Klaassen <redacted>
Sent: Monday, January 23, 2023 11:31 AM

Hello,

There's a specific NFSv4 mount on a specific machine which we'd like to
time out and return an error after a few seconds if the server goes away.

I've confirmed the following on two different kernels,
4.18.0-348.12.2.el8_5.x86_64 and 6.1.7-200.fc37.x86_64.

I've been able to get both autofs and the mount command to cooperate, so
that the mount attempt fails after an arbitrary number of seconds.  This
mount command, for example, will fail after 6 seconds, as expected based on
the timeo=20,retrans=2,retry=0 options:

$ time sudo mount -t nfs4 -o \
rw,relatime,sync,vers=4.2,rsize=131072,wsize=131072,namlen=255,\
acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,\
timeo=20,retrans=2,retry=0,sec=sys thor04:/mnt/thorfs04 /mnt/thor04
mount.nfs4: Connection timed out

real    0m6.084s
user    0m0.007s
sys     0m0.015s
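
As a sanity check on the arithmetic, here is a minimal sketch of where the
6 seconds comes from.  It assumes the simple model that a soft mount over
TCP gives up after timeo * (retrans + 1), with timeo in tenths of a second.
That model is an assumption chosen because it matches the measurement above,
not a statement of what the kernel computes internally, and the function
name is made up for illustration.

```python
def expected_major_timeout(timeo_deciseconds: int, retrans: int) -> float:
    """Seconds until a soft mount gives up, under the assumed model.

    Assumption: one initial attempt plus `retrans` retries, each budgeted
    `timeo` tenths of a second.
    """
    timeo_seconds = timeo_deciseconds / 10
    return timeo_seconds * (retrans + 1)


# timeo=20,retrans=2 from the mount options above
print(expected_major_timeout(20, 2))  # 6.0
```

The same model would predict 6 seconds for operations on an established
mount, which is not what happens.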

However, if the share is already mounted and the server goes away, the
timeout is always 2 minutes plus the time I expect based on timeo and
retrans.  In this case, 2 minutes and 6 seconds:

$ time ls /mnt/thor04
ls: cannot access '/mnt/thor04': Connection timed out

real    2m6.025s
user    0m0.003s
sys     0m0.000s

Watching the outgoing packets in the second case, the pattern is always the
same:
 - About 0.2 seconds between each of the first three, then roughly doubling
each time until the two minute mark is exceeded (so the last NFS packet,
which is always the 11th packet, is sent around 1:45 after the first).
 - Then a series of TCP SYN packets from a new source port that start
exactly-ish on the two minute mark, 1 second between the first two, then
doubling each time.  (By this time the NFS command has given up.)

11:10:21.898305 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889483 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:22.105189 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889690 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:22.313290 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889898 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:22.721269 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834890306 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:23.569192 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834891154 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:25.233212 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834892818 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:28.497282 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834896082 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:35.025219 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834902610 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:10:48.337201 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834915922 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:11:14.449303 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834942034 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:12:08.721251 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
14452:14652, ack 18561, win 501, options [nop,nop,TS val 834996306 ecr
1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
11:12:22.545394 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835010130 ecr 0,nop,wscale 7],
length 0
11:12:23.570199 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835011155 ecr 0,nop,wscale 7],
length 0
11:12:25.617284 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835013202 ecr 0,nop,wscale 7],
length 0
11:12:29.649219 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835017234 ecr 0,nop,wscale 7],
length 0
11:12:37.905274 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835025490 ecr 0,nop,wscale 7],
length 0
11:12:54.289212 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835041874 ecr 0,nop,wscale 7],
length 0
11:13:26.545304 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951,
win 64240, options [mss 1460,sackOK,TS val 835074130 ecr 0,nop,wscale 7],
length 0
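
To make the pattern in the capture easier to see, here is a small sketch
that computes the gaps between packets.  The timestamps are copied from the
tcpdump output above; the helper names are just for illustration.

```python
from datetime import datetime

# Timestamps copied from the capture above.
NFS_RETRANSMITS = [
    "11:10:21.898305", "11:10:22.105189", "11:10:22.313290",
    "11:10:22.721269", "11:10:23.569192", "11:10:25.233212",
    "11:10:28.497282", "11:10:35.025219", "11:10:48.337201",
    "11:11:14.449303", "11:12:08.721251",
]
SYN_ATTEMPTS = [
    "11:12:22.545394", "11:12:23.570199", "11:12:25.617284",
    "11:12:29.649219", "11:12:37.905274", "11:12:54.289212",
    "11:13:26.545304",
]


def parse(stamp):
    """Parse a tcpdump HH:MM:SS.ffffff timestamp (same day assumed)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def seconds_between(first, last):
    """Elapsed seconds between two timestamps."""
    return (parse(last) - parse(first)).total_seconds()


def gaps(stamps):
    """Gaps in seconds between consecutive timestamps, rounded to ms."""
    return [round(seconds_between(a, b), 3) for a, b in zip(stamps, stamps[1:])]


print("NFS gaps:", gaps(NFS_RETRANSMITS))
print("SYN gaps:", gaps(SYN_ATTEMPTS))
print("first NFS -> last NFS:",
      round(seconds_between(NFS_RETRANSMITS[0], NFS_RETRANSMITS[-1]), 1))
print("first NFS -> first SYN:",
      round(seconds_between(NFS_RETRANSMITS[0], SYN_ATTEMPTS[0]), 1))
```

On these numbers the first two NFS gaps are both about 0.2 seconds, the
rest roughly double each time, the last NFS packet goes out about 107
seconds after the first, and the first SYN follows about 120.6 seconds
after the first NFS packet.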

I tried changing tcp_retries2 as suggested in another thread from this list:

# echo 3 > /proc/sys/net/ipv4/tcp_retries2

...but it made no difference on either kernel.  The 2 minute timeout also
doesn't seem to match with what I'd calculate from the initial value of
tcp_retries2, which should give a much higher timeout.
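
For reference, here is a rough sketch of the calculation I mean.  It assumes
the retransmission timeout starts at 0.2 seconds (which matches the first
gap in the capture), doubles after every retransmission, and is capped at
120 seconds.  The constants and the function name are assumptions for
illustration; real RTO values depend on the measured round-trip time.

```python
RTO_MIN = 0.2    # seconds; assumed initial RTO
RTO_MAX = 120.0  # seconds; assumed cap on the RTO


def hypothetical_timeout(retries, rto_min=RTO_MIN, rto_max=RTO_MAX):
    """Sum the waits after the original send and after each of `retries`
    retransmissions, doubling the wait each time up to `rto_max`."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


print(round(hypothetical_timeout(15), 1))  # default tcp_retries2
print(round(hypothetical_timeout(3), 1))   # the value I tried
```

Under those assumptions the default of 15 works out to roughly 924.6
seconds and a value of 3 to roughly 3 seconds.  Neither is anywhere near
the 2 minutes observed.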

The only clue I've been able to find is in the retry=n entry in the NFS
manpage:

"For TCP the default is 3 minutes, but system TCP connection timeouts will
sometimes limit the timeout of each retransmission to around 2 minutes."

What I'm not able to make sense of:
 - The retry option says that it applies to mount operations, not read/write
operations.  However, in this case I'm seeing the 2 minute delay on
read/write operations but *not* mount operations.
 - A couple of hours of searching didn't lead me to any kernel settings that
would result in a 2 minute timeout.

Does anyone have any clues about a) what's happening and b) how to get
our desired behaviour of being able to control both mount and read/write
timeouts down to a few seconds?

Thanks.

I thought that changing TCP_RTO_MAX in include/net/tcp.h from 120 seconds
to something smaller and recompiling the kernel would change the 2 minute
timeout, but it had no effect.  I'm going to keep poking through the kernel
code to see if there's a knob I can turn to change the 2 minute timeout, so
that I can at least understand where it's coming from.

Any hints as to where I should be looking?

Andrew
