Thread: 2 messages, 2 authors, 2021-08-10

Re: Spurious instability with NFSoRDMA under moderate load

From: Timo Rothenpieler <hidden>
Date: 2021-08-10 21:40:51
Also in: linux-nfs

On 10.08.2021 19:17, Chuck Lever III wrote:
> What I see in this data is that the server is reporting
>
>     SEQ4_STATUS_CB_PATH_DOWN
>
> and the client is attempting to recover (repeatedly) using
> BIND_CONN_TO_SESSION. But apparently the recovery didn't
> actually work, because the server continues to report a
> callback path problem.
>
> [1712389.125641] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001
> [1712389.129264] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200!
>
> [1712389.171953] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001
> [1712389.178361] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200!
>
> [1712389.195606] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001
> [1712389.203891] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200!
>
> I guess it's time to switch to tracing on the server side
> to see if you can nail down why the server's callback
> requests are failing. On your NFS server, run:
>
>   # trace-cmd record -e nfsd -e sunrpc -e rpcgss -e rpcrdma
>
> at roughly the same point during your test that you captured
> the previous client-side trace.

I wonder whether it would be worth reverting 6820bf77864d on the server,
to have an easier way to reproduce this state.

Because it seems like the actual underlying issue is the inability of the
NFS server (and/or client) to reestablish the backchannel if it gets
disconnected for whatever reason?

I have already rebooted the client, and everything is working again, so
otherwise I'll have to wait a potentially long time for this to happen
again.
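
For reference, here is a minimal Python sketch of how the client-side log
lines quoted above could be checked for this recovery loop. It is only an
illustration: the function names (decode_flags, count_failed_recoveries)
are hypothetical, the regular expressions assume the exact message format
shown in this thread, and the "flag seen again after a successful bind"
heuristic is my assumption about what counts as a failed recovery. The bit
values are the sr_status_flags constants from RFC 5661, section 18.46.

```python
import re

# sr_status_flags bits as defined in RFC 5661, section 18.46.
SEQ4_STATUS_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000002: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING",
    0x00000004: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED",
    0x00000008: "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED",
    0x00000010: "SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED",
    0x00000020: "SEQ4_STATUS_ADMIN_STATE_REVOKED",
    0x00000040: "SEQ4_STATUS_RECALLABLE_STATE_REVOKED",
    0x00000080: "SEQ4_STATUS_LEASE_MOVED",
    0x00000100: "SEQ4_STATUS_RESTART_RECLAIM_NEEDED",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
    0x00000800: "SEQ4_STATUS_DEVID_CHANGED",
    0x00001000: "SEQ4_STATUS_DEVID_DELETED",
}

# Patterns assume the message format seen in the log excerpt in this thread.
FLAG_LINE = re.compile(
    r'nfs41_handle_sequence_flag_errors: "(?P<server>[^"]+)" '
    r'\(client ID (?P<clid>[0-9a-f]+)\) flags=0x(?P<flags>[0-9a-fA-F]+)'
)
BIND_OK_LINE = re.compile(
    r'nfs4_bind_conn_to_session: bind_conn_to_session was successful '
    r'for server (?P<server>\S+?)!'
)


def decode_flags(value):
    """Return the names of all SEQ4_STATUS_* bits set in value."""
    return [name for bit, name in SEQ4_STATUS_FLAGS.items() if value & bit]


def count_failed_recoveries(lines):
    """Count CB_PATH_DOWN reports that follow a successful bind (heuristic)."""
    bound = set()  # servers whose most recent event was a successful bind
    failed = 0
    for line in lines:
        m = FLAG_LINE.search(line)
        if m:
            server = m.group("server")
            flags = int(m.group("flags"), 16)
            if flags & 0x00000001 and server in bound:
                failed += 1
            bound.discard(server)
            continue
        m = BIND_OK_LINE.search(line)
        if m:
            bound.add(m.group("server"))
    return failed


# The six log lines quoted in this thread.
SAMPLE = [
    '[1712389.125641] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001',
    '[1712389.129264] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200!',
    '[1712389.171953] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001',
    '[1712389.178361] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200!',
    '[1712389.195606] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001',
    '[1712389.203891] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200!',
]

if __name__ == "__main__":
    print(decode_flags(0x00000001))
    print(count_failed_recoveries(SAMPLE))
```

Feeding it the output of "dmesg" (one line per list element) should give a
rough count of how often the server kept reporting the callback path as
down right after the client believed it had rebound the connection.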

Attachments

  • smime.p7s [application/pkcs7-signature] 4494 bytes