RE: [PATCH v2] SUNRPC: serialize gss upcalls for the same uid
From: SHIMAMOTO HIROSHI (島本 裕志) <hidden>
Date: 2022-12-12 03:33:32
Subject: Re: [PATCH v2] SUNRPC: serialize gss upcalls for the same uid
On Dec 8, 2022, at 19:30, Hiroshi Shimamoto [off-list ref] wrote:

From: minoura <redacted>

Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid but different gss service") introduced `auth` argument to __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL since it (and auth->service) was not (yet) determined.

When multiple upcalls with the same uid and different service are ongoing, it could happen that __gss_find_upcall(), which returns the first match found in the pipe->in_downcall list, could not find the correct gss_msg corresponding to the downcall we are looking for due to two reasons:

- the order of the msgs in pipe->in_downcall and those in pipe->pipe (that is, the order of the upcalls sent to rpc.gssd) might be different because pipe->lock is dropped between adding one to each list.
- rpc.gssd uses threads to write responses, which means we cannot guarantee the order of responses.

We could see mount.nfs process hung in D state with multiple mount.nfs are executed in parallel. The call trace below is of CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/ elrepo kernel-ml-6.0.7-1.el7.

PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc]
 #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this.
Let's say there are two upcalls for services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe. When rpc.gssd reads pipe to get the upcall msg corresponding to service B from pipe->pipe and then writes the response, in gss_pipe_downcall the msg corresponding to service A will be picked because only uid is used to find the msg and it is before the one for B in pipe->in_downcall. And the process waiting for the msg corresponding to service A will be woken up.

Wait a minute… The ‘service’ here is one of krb5, krb5i, or krb5p. What is being pushed down from user space is a RPCSEC_GSS context that can be used for any one of those services. So the ordering of A and B is not supposed to matter. Any one of those requests can take the context and make use of it. However once the context has been used with one of the krb5, krb5i or krb5p services then it cannot be used with any of the others. This is why commit 9130b8dbc6ac that you referenced above separates the services in gss_add_msg().
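For reference, the first-match behaviour in the quoted scenario can be sketched as a small userspace model in C. This is only an illustration under assumptions, not the kernel code: `struct upcall_msg`, `enum svc`, `find_first_by_uid()` and `woken_service_for_first_reply()` are hypothetical names, and the only thing taken from the thread is that the lookup matches on uid alone and returns the first entry in pipe->in_downcall order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct gss_upcall_msg. */
enum svc { SVC_A, SVC_B };

struct upcall_msg {
	unsigned int uid;
	enum svc service;
};

/* Return the first entry whose uid matches, ignoring the service. */
static const struct upcall_msg *
find_first_by_uid(const struct upcall_msg *list, size_t n, unsigned int uid)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].uid == uid)
			return &list[i];
	return NULL;
}

/*
 * Model of the scenario: in_downcall holds A then B, while rpc.gssd
 * answers the upcall for B first. Returns the service of the message
 * that the uid-only lookup selects for that first reply.
 */
static enum svc woken_service_for_first_reply(void)
{
	const struct upcall_msg in_downcall[] = {
		{ 1000, SVC_A },
		{ 1000, SVC_B },
	};
	const struct upcall_msg *picked =
		find_first_by_uid(in_downcall, 2, 1000);

	assert(picked != NULL);
	return picked->service;
}
```

In this model the reply produced for B is matched to the entry for A, because A comes first in the list. Whether that mismatch matters, given that the pushed-down context can be used for any one of the services, is the question raised in the reply above.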
One question: what about simultaneous AUTH_GSS and AUTH_UNIX upcalls? I'm not sure such a case exists, but if it does, an error could be mistaken for the successful case, no?

Thanks,
Hiroshi