Thread (3 messages) 3 messages, 2 authors, 2024-06-17

Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice

From: Chris Friesen <hidden>
Date: 2024-06-13 22:44:29

Hi all,

I'm not subscribed to the list so please CC me on replies.  I'm seeing 
some odd behaviour and I'm not sure what's going on.

The short version is that I can mount NFS exports from this server 
(including mounting on the server itself) via TCP but not via UDP.

rpcinfo on the server gives the following:

root@controller-0:/var/home/sysadmin# rpcinfo -s
    program version(s) netid(s)                     service     owner
     100000  2,3,4     local,udp,tcp,udp6,tcp6      portmapper  superuser
     100024  1         tcp6,udp6,tcp,udp            status      116
     100003  4,3       udp6,tcp6,udp,tcp            nfs         superuser
     100227  3         udp6,tcp6,udp,tcp            -           superuser
     100021  4,3,1     tcp6,udp6,tcp,udp            nlockmgr    superuser
     100005  3,2,1     tcp6,udp6,tcp,udp            mountd      superuser

root@controller-0:/var/home/sysadmin# rpcinfo -p
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100024    1   udp  44993  status
     100024    1   tcp  60561  status
     100003    3   tcp   2049  nfs
     100003    4   tcp   2049  nfs
     100227    3   tcp   2049
     100003    3   udp   2049  nfs
     100227    3   udp   2049
     100021    1   udp  36993  nlockmgr
     100021    3   udp  36993  nlockmgr
     100021    4   udp  36993  nlockmgr
     100021    1   tcp  33239  nlockmgr
     100021    3   tcp  33239  nlockmgr
     100021    4   tcp  33239  nlockmgr
     100005    1   udp  43636  mountd
     100005    1   tcp  49357  mountd
     100005    2   udp  39783  mountd
     100005    2   tcp  54527  mountd
     100005    3   udp  40970  mountd
     100005    3   tcp  41761  mountd


So it *looks* okay to me, but if I probe it I get the expected response 
for TCP but not for UDP:

root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
program 100003 version 4 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
program 100003 version 3 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
rpcinfo: RPC: Timed out
program 100003 version 3 is not available


Oddly the last command returns immediately so it's not actually timing 
out, the error message is a bit misleading.   If I sniff the network 
traffic while running that last command, I see this:

root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF], 
proto UDP (17), length 68)
     localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF], 
proto UDP (17), length 28)
     localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!] 
UDP, length 0


Ignoring the checksums, I get this:

root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF], 
proto UDP (17), length 68)
     localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF], 
proto UDP (17), length 28)
     localhost.nfs > localhost.934: UDP, length 0


On a separate system with a 5.10 kernel I get a successful response that 
is longer and is properly decoded:

root@controller-0:/var/home/sysadmin#  tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto 
UDP (17), length 68)
     localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto 
UDP (17), length 52)
     localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24 
null


Anyone have any ideas what might be going on or how to debug?  I'm 
building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives 
anything useful.

Thanks,
Chris
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help