Re: [syzbot] INFO: task hung in vhost_worker_killed (2)
From: Mike Christie <michael.christie@oracle.com>
Date: 2026-01-07 18:07:41
Also in: kvm, lkml, virtualization
On 1/6/26 1:57 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 06, 2026 at 09:46:30AM +0800, Hillf Danton wrote:
>>> taking vq mutex in a kill handler is probably not wise.
>>> we should have a separate lock just for handling worker assignment.
>> Better not before showing us the root cause of the hung to avoid
>> adding a blind lock.
>
> Well I think it's pretty clear but the issue is that just another lock
> is not enough, we have bigger problems with this mutex.
>
> It's held around userspace accesses so if the vhost thread gets into
> uninterruptible sleep holding that, a userspace thread trying to take
> it with mutex_lock will be uninterruptible.
>
> So it propagates the uninterruptible status between vhost and a
> userspace thread.
>
> It's not a new issue but the new(ish) thread management APIs make
> it more visible.
>
> Here it's the kill handler that got hung but it's not really limited
> to that, any ioctl can do that, and I do not want to add another
> lock on data path.

Above, are you saying that the kill handler and an ioctl are trying
to take the virtqueue->mutex in this bug?

I've been trying to replicate this for a while, but I just can't hit what
I'm seeing in the lockdep info from the initial email. We only see the
kill handler trying to take the virtqueue->mutex. Is the theory that the
locking info being reported is not correct? That is, a userspace thread is
doing an ioctl that took the mutex, but it's not reported below?

Originally I was using the vhost_dev->mutex for the locking in
vhost_worker_killed, but I saw we could take that during ioctls which did
a flush, so I added the vhost_worker->mutex for some of the locking.

If the virtqueue->mutex is also an issue I can do a patch.

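The concern in the quoted text, that a thread stalled while holding a mutex makes every waiter on that mutex equally stuck, can be seen in a small userspace analogy. This is only a sketch and not vhost code: `demo_mutex`, `stalled_holder`, and `bounded_lock_attempt` are hypothetical names, and POSIX `pthread_mutex_timedlock` stands in for a waiter that is able to back out, which an unconditional lock call cannot do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical stand-in for vq->mutex; nothing here is vhost code. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready;

/* Holder: takes the mutex, then stalls for *arg milliseconds while holding it. */
static void *stalled_holder(void *arg)
{
    unsigned int hold_ms = *(unsigned int *)arg;
    struct timespec stall = {
        .tv_sec = hold_ms / 1000,
        .tv_nsec = (long)(hold_ms % 1000) * 1000000L,
    };

    pthread_mutex_lock(&demo_mutex);
    sem_post(&holder_ready);        /* tell the waiter the lock is now held */
    nanosleep(&stall, NULL);        /* the "stuck" period, lock still held */
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/*
 * Waiter: tries to take the mutex but gives up after wait_ms.
 * Returns 0 if the lock was obtained, ETIMEDOUT if the holder outlasted it.
 */
static int bounded_lock_attempt(unsigned int hold_ms, unsigned int wait_ms)
{
    pthread_t holder;
    struct timespec deadline;
    int rc;

    rc = sem_init(&holder_ready, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&holder, NULL, stalled_holder, &hold_ms);
    assert(rc == 0);
    sem_wait(&holder_ready);        /* wait until the holder owns the mutex */

    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += wait_ms / 1000;
    deadline.tv_nsec += (long)(wait_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    rc = pthread_mutex_timedlock(&demo_mutex, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(&demo_mutex);

    pthread_join(holder, NULL);
    sem_destroy(&holder_ready);
    return rc;
}
```

Calling `bounded_lock_attempt(300, 50)` models a waiter that gives up while the holder is still stalled, and `bounded_lock_attempt(20, 2000)` models one that outlasts a short stall. With a plain `pthread_mutex_lock` in the waiter there is no such exit: the waiter is blocked for exactly as long as the holder is.
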
Showing all locks held in the system:
1 lock held by khungtaskd/32:
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
2 locks held by getty/5579:
#0: ffff88814e3cb0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc9000332b2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x449/0x1460 drivers/tty/n_tty.c:2211
1 lock held by syz-executor/5978:
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock kernel/rcu/tree_exp.h:311 [inline]
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: synchronize_rcu_expedited+0x2b1/0x6e0 kernel/rcu/tree_exp.h:956
2 locks held by syz.5.259/7601:
3 locks held by vhost-7617/7618:
#0: ffff888054cc68e8 (&vtsk->exit_mutex){+.+.}-{4:4}, at: vhost_task_fn+0x322/0x430 kernel/vhost_task.c:54
#1: ffff888024646a80 (&worker->mutex){+.+.}-{4:4}, at: vhost_worker_killed+0x57/0x390 drivers/vhost/vhost.c:470
#2: ffff8880550c0258 (&vq->mutex){+.+.}-{4:4}, at: vhost_worker_killed+0x12b/0x390 drivers/vhost/vhost.c:476
1 lock held by syz-executor/7850:
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock kernel/rcu/tree_exp.h:343 [inline]
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: synchronize_rcu_expedited+0x36e/0x6e0 kernel/rcu/tree_exp.h:956
1 lock held by syz.2.640/9940:
4 locks held by syz.3.641/9946:
3 locks held by syz.1.642/9954:
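
When checking which tasks in a dump like the one above mention a given lock class, a small scanner can help. The sketch below is a hypothetical helper, not an existing tool: `summarize_lock_dump` and `struct dump_summary` are made-up names, and it assumes the line layout shown in this email (a header line of the form "N lock(s) held by comm/pid:" followed by zero or more lines starting with "#"). It also counts tasks that have a header but no lock lines; as far as I understand lockdep, it prints only the count for a task that is running when the dump is taken, so those entries carry no per-lock information.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct dump_summary {
    int tasks;                 /* "N lock(s) held by" headers seen */
    int tasks_listing_class;   /* tasks with a lock line naming lock_class */
    int tasks_without_detail;  /* tasks with a header but no "#k:" lines */
};

/* Fold the state of the task just finished into the summary. */
static void close_task(struct dump_summary *s, int has_detail, int matched)
{
    if (!has_detail)
        s->tasks_without_detail++;
    if (matched)
        s->tasks_listing_class++;
}

/* Scan a lockdep "Showing all locks held" dump for one lock class name. */
static struct dump_summary summarize_lock_dump(const char *dump,
                                               const char *lock_class)
{
    struct dump_summary s = {0, 0, 0};
    int in_task = 0, has_detail = 0, matched = 0;
    const char *line = dump;

    assert(dump != NULL && lock_class != NULL);

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *next = end ? end + 1 : line + len;
        char buf[512];
        const char *p;

        if (len >= sizeof(buf))
            len = sizeof(buf) - 1;   /* overlong lines are truncated */
        memcpy(buf, line, len);
        buf[len] = '\0';
        p = buf + strspn(buf, " \t");

        if (isdigit((unsigned char)*p) && strstr(p, " held by ")) {
            /* A new task header ends the previous task's entry. */
            if (in_task)
                close_task(&s, has_detail, matched);
            in_task = 1;
            has_detail = 0;
            matched = 0;
            s.tasks++;
        } else if (in_task && *p == '#') {
            has_detail = 1;
            if (strstr(p, lock_class))
                matched = 1;
        }
        line = next;
    }
    if (in_task)
        close_task(&s, has_detail, matched);
    return s;
}
```

Run against the dump above with `"&vq->mutex"` as the class, this should report one task listing that lock (vhost-7617/7618) and four tasks with a count but no lock lines. Matching is a plain substring search, so a class name that is a prefix of another would also match.
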