Thread (9 messages), 4 authors, 2024-10-11

Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit

From: Jeff Layton <jlayton@kernel.org>
Date: 2024-09-04 14:36:29
Also in: linux-nfs, lkml

On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> > On Sun, 01 Sep 2024, syzbot wrote:
> > > syzbot has found a reproducer for the following issue on:
> >
> > I had a poke around using the provided disk image and kernel for
> > exploring.
> >
> > I think the problem is demonstrated by this stack:
> >
> > [<0>] rpc_wait_bit_killable+0x1b/0x160
> > [<0>] __rpc_execute+0x723/0x1460
> > [<0>] rpc_execute+0x1ec/0x3f0
> > [<0>] rpc_run_task+0x562/0x6c0
> > [<0>] rpc_call_sync+0x197/0x2e0
> > [<0>] rpcb_register+0x36b/0x670
> > [<0>] svc_unregister+0x208/0x730
> > [<0>] svc_bind+0x1bb/0x1e0
> > [<0>] nfsd_create_serv+0x3f0/0x760
> > [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> > [<0>] genl_rcv_msg+0xb16/0xec0
> > [<0>] netlink_rcv_skb+0x1e5/0x430
> >
> > No rpcbind is running on this host so that "svc_unregister" takes a
> > long time.  Maybe not forever but if a few of these get queued up all
> > blocking some other thread, then maybe that pushed it over the limit.
> >
> > The fact that rpcbind is not running might not be relevant as the test
> > messes up the network.  "ping 127.0.0.1" stops working.
> >
> > So this bug comes down to "we try to contact rpcbind while holding a
> > mutex and if that gets no response and no error, then we can hold the
> > mutex for a long time".
> >
> > Are we surprised? Do we want to fix this?  Any suggestions how?
>
> In the past, we've tried to address "hanging upcall" issues where
> the kernel part of an administrative command needs a user space
> service that isn't working or present. (e.g. mount needing a running
> gssd)
>
> If NFSD is using the kernel RPC client for the upcall, then maybe
> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> immediate failure.
>
> IMO this should be addressed.

Looking at rpcb_register_call, it looks like we already set SOFTCONN if
is_set is true. We probably did that assuming that we only call
svc_unregister on shutdown. svc_rpcb_setup does this though:

        /* Remove any stale portmap registrations */
        svc_unregister(serv, net);
        return 0;

What would be the risk in just setting SOFTCONN unconditionally?
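
[Editor's sketch, not part of the original message.] The difference
between the behaviour described above and the proposed change can be
modelled in a few lines of userspace C. Everything here is hypothetical:
the MODEL_* constants and model_flags_*() helpers are made-up names, the
flag values are placeholders rather than the kernel's real RPC_TASK_*
values, and the unset path is simplified to "no soft-connect flag". It
only illustrates the is_set condition mentioned above, not the actual
rpcb_register_call code.

```c
#include <stdbool.h>

/* Placeholder bit for illustration only; the real RPC_TASK_SOFTCONN
 * value is defined by the kernel and is not reproduced here. */
#define MODEL_TASK_SOFTCONN 0x1

/* Behaviour as described in the message: the soft-connect flag is
 * applied only when registering (is_set == true). Whatever the real
 * code does on the unregister path is simplified to "no flag" here,
 * which is an assumption of this model. */
static int model_flags_current(bool is_set)
{
	return is_set ? MODEL_TASK_SOFTCONN : 0;
}

/* Proposed behaviour: apply the soft-connect flag unconditionally, so
 * an unregister call against an unreachable rpcbind would be expected
 * to fail instead of waiting. */
static int model_flags_proposed(bool is_set)
{
	(void)is_set; /* no longer influences the flag choice */
	return MODEL_TASK_SOFTCONN;
}
```

In this model the only case that changes is is_set == false, which is
the svc_unregister path seen in the stack trace.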
-- 
Jeff Layton [off-list ref]