Thread (6 messages), 3 authors, 2022-12-12

RE: [PATCH v2] SUNRPC: serialize gss upcalls for the same uid

From: SHIMAMOTO HIROSHI (島本 裕志) <hidden>
Date: 2022-12-12 03:33:32

Subject: Re: [PATCH v2] SUNRPC: serialize gss upcalls for the same uid


On Dec 8, 2022, at 19:30, Hiroshi Shimamoto [off-list ref] wrote:

From: minoura <redacted>

Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced the `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.

When multiple upcalls with the same uid but different services are
ongoing, __gss_find_upcall(), which returns the first match found in
the pipe->in_downcall list, may fail to find the gss_msg corresponding
to the downcall we are looking for, for two reasons:

- the order of the msgs in pipe->in_downcall and those in pipe->pipe
 (that is, the order of the upcalls sent to rpc.gssd) might be
 different because pipe->lock is dropped between adding one to each
 list.
- rpc.gssd uses threads to write responses, which means we cannot
 guarantee the order of responses.

We observed mount.nfs processes hung in D state when multiple
mount.nfs instances are executed in parallel.  The call trace below is
from the CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64, but we observed
the same hang with elrepo kernel-ml-6.0.7-1.el7.

PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
#0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
#1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
#2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
#3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc]
#4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
#5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
#6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
#7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
#8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
#9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.

When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall.  And the process waiting for the msg
corresponding to service A will be woken up.
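
The first-match behaviour described in this scenario can be sketched in
user space. This is only an illustration of the lookup as the patch
description presents it, not kernel code: struct msg, enum svc,
find_upcall() and pick_for_response_b() are hypothetical names, and the
match_service mode is a hypothetical contrast (the downcall path does
not know the service), not what the patch implements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's types. */
enum svc { SVC_A, SVC_B };

struct msg {
    unsigned int uid;
    enum svc service;
};

/*
 * Return the first entry whose uid matches. When match_service is
 * non-zero, the entry's service must match as well.
 */
const struct msg *find_upcall(const struct msg *list, size_t n,
                              unsigned int uid, enum svc service,
                              int match_service)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].uid != uid)
            continue;
        if (match_service && list[i].service != service)
            continue;
        return &list[i];
    }
    return NULL;
}

/*
 * Replay the scenario: the in_downcall list holds A then B, and the
 * response that arrives first is the one for B. Returns the service
 * of the message that the lookup selects.
 */
enum svc pick_for_response_b(int match_service)
{
    const struct msg in_downcall[] = {
        { 1000, SVC_A },   /* queued first */
        { 1000, SVC_B },   /* queued second */
    };
    const struct msg *m = find_upcall(in_downcall, 2, 1000, SVC_B,
                                      match_service);

    assert(m != NULL);
    return m->service;
}
```

With match_service set to 0 (uid only), the lookup for B's response
selects the entry for A, which is the mismatch described above. With
match_service set to 1 it selects the entry for B.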

Wait a minute… The ‘service’ here is one of krb5, krb5i, or krb5p. What is being pushed down from user
space is an RPCSEC_GSS context that can be used for any one of those services. So the ordering of A and B
is not supposed to matter. Any one of those requests can take the context and make use of it.

However, once the context has been used with one of the krb5, krb5i or krb5p services, it cannot be used
with any of the others. This is why commit 9130b8dbc6ac, which you referenced above, separates the services
in gss_add_msg().

One question: what about simultaneous upcalls for AUTH_GSS and AUTH_UNIX?
I'm not sure whether such a case exists, but couldn't an error be mistaken for the successful case?

Thanks,
Hiroshi