Re: NFSv4.1/4.2 server returns same sessionid after DestroySession/CreateSession
From: Rick Macklem <hidden>
Date: 2022-07-11 20:43:39
From: Chuck Lever III <redacted> wrote:
quoted
On Jul 11, 2022, at 1:33 PM, Chuck Lever III [off-list ref] wrote:quoted
On Jul 11, 2022, at 11:01 AM, Chuck Lever III [off-list ref] wrote:quoted
On Jul 10, 2022, at 6:10 PM, Rick Macklem [off-list ref] wrote: Hi, I have been trying to improve the behaviour of the FreeBSD NFSv4.1/4.2 client when using the "intr" mount option. I have come up with the following scheme: - When RPCs are interrupted, mark the session slot as potentially bad. - When all session slots are marked potentially bad, do a DestroySession (only op in RPC) to destroy the session. - When the server replies NFS4ERR_BAD_SESSION, do a CreateSession (only op in RPC) to acquire a new session and continue on. When testing against a Linux 5.15 server, the CreateSession succeeds, but returns the same sessionid as the old session. Then all subsequent RPCs get the NFS4ERR_BAD_SESSION reply. (The client repeatedly does CreateSession RPCs that reply NFS_OK, but always with the same sessionid as the destroyed one.) Here's what I see in the packet trace: (everything works normally until all session slots are marked potentially bad at packet# 14216) packet# RPC 14216 DestroySession request for sessionid 2725cb62002ed418040...0 14302 DestroySession reply NFS_OK 14304 Getattr request (using above sessionid) 14305 Getattr reply NFS4ERR_BAD_SESSION 14306 CreateSession request *** Now here is where I see a problem... 14307 CreateSession reply NFS_OK with sessionid 2725cb62002ed418040...0 (same as above) 14308 Getattr request (using above sessionid) 14309 Getattr reply NFS4ERR_BAD_SESSION - and then this just repeats... The whole packet trace can be found here, in case you are interested: https://people.freebsd.org/~rmacklem/linux.pcap It seems to me that a successful CreateSession should always return a new unique sessionid?Hi Rick, thanks for the bug report. CREATE_SESSION has a built-in reply cache to thwart replay attacks. It can legitimately return the same sessionid as a previous request. Granted, DESTROY_SESSION is supposed to wipe that reply cache...
Well, I just re-read the RFC and I don't see anything that says DestroySession affects the CreateSession reply cache. I had thought it did, but I was wrong. See below...
quoted
quoted
I'd like to see if there's a test in pynfs that replicates or is close to the series of operations in your trace so that I can reproduce on my lab systems and watch it fail up close.I constructed a pynfs test that does something similar to your reproducer:diff --git a/nfs4.1/server41tests/st_destroy_session.py b/nfs4.1/server41tests/st_destroy_session.py index b8be62582366..014330e7d623 100644 --- a/nfs4.1/server41tests/st_destroy_session.py +++ b/nfs4.1/server41tests/st_destroy_session.py@@ -1,12 +1,33 @@from .st_create_session import create_session from xdrdef.nfs4_const import * -from .environment import check, fail, create_file, open_file +from .environment import check, fail, create_file, open_file, close_file from xdrdef.nfs4_type import open_owner4, openflag4, createhow4, open_claim4 import nfs_ops op = nfs_ops.NFS4ops() import threading import rpc.rpc as rpc +def testDestroyBasic(t, env): + """Ensure operations outside a session fail with BADSESSION + + FLAGS: destroy_session all + CODE: DSESS1 + """ + c = env.c1.new_client(env.testname(t)) + sess1 = c.create_session() + sess1.compound([op.reclaim_complete(FALSE)]) + res = c.c.compound([op.destroy_session(sess1.sessionid)]) + res = create_file(sess1, env.testname(t), + access=OPEN4_SHARE_ACCESS_READ) + check(res, NFS4ERR_BADSESSION) + sess2 = c.create_session() + res = create_file(sess2, env.testname(t), + access=OPEN4_SHARE_ACCESS_READ) + check(res) + fh = res.resarray[-1].object + open_stateid = res.resarray[-2].stateid + close_file(sess2, fh, stateid=open_stateid) + def testDestroy(t, env): """ - create a session I'm not able to reproduce the problem on 5.19-rc5, but that probably means there's something going on that we haven't discovered yet.
Nope, your guess below was correct. It is a FreeBSD client bug.
My guess is that your client is sending CREATE_SESSION operations with the same sequence ID (1) and that is hitting in NFSD's CREATE_SESSION reply cache. So it's treating the client's new requests as replays and returning an old (stale) sessionid.
Yep. For some reason, I thought that a DestroySeseesion
put the sequence# back.
On re-reading the RFC, there is nothing mentioning that.
Now that I've patched the client to increment the sequence#,
it works fine against the Linux server.
Sorry about the noise (hopefully your pynfs test is still useful).
Thanks for the help, rick
ps: Now I have to fix a deficiency in the FreeBSD server. Right now,
it only checks the sequence# for an unconfirmed ClientID, but
that is obviously a FreeBSD bug, too.
--
Chuck Lever