Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
From: Anirudh Rayabharam <hidden>
Date: 2022-02-21 19:36:45
Also in: kvm, lkml
On Mon, Feb 21, 2022 at 07:26:28PM +0100, Stefano Garzarella wrote:
> On Mon, Feb 21, 2022 at 11:33:11PM +0530, Anirudh Rayabharam wrote:
> > On Mon, Feb 21, 2022 at 05:44:20PM +0100, Stefano Garzarella wrote:
> > > On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:
> > > > On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella [off-list ref] wrote:
> > > > > > vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> > > > > > ownership. It expects current->mm to be valid.
> > > > > >
> > > > > > vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> > > > > > the user has not done close(), so when we are in do_exit(). In this
> > > > > > case current->mm is invalid and we're releasing the device, so we
> > > > > > should clean it anyway.
> > > > > >
> > > > > > Let's check the owner only when vhost_vsock_stop() is called by an
> > > > > > ioctl.
> > > > > >
> > > > > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > ---
> > > > > >  drivers/vhost/vsock.c | 14 ++++++++------
> > > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > >
> > > > > Reported-and-tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
> > > >
> > > > I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
> > > > even though syzbot says so. I am able to reproduce the issue locally
> > > > even with this patch applied.
> > >
> > > Are you using the syzbot reproducer or another test? In that case, can
> > > you share it?
> >
> > I am using the syzbot reproducer.
> >
> > > From the stack trace it seemed to me that the worker accesses a zone
> > > that has been cleaned (iotlb), so it is invalid and fails.
> >
> > Would the thread hang in that case? How?
>
> Looking at this log [1] it seems that the process is blocked on the
> wait_for_completion() in vhost_work_dev_flush().
>
> Since we're not setting the backend to NULL to stop the worker, it's
> likely that the worker will keep running, preventing the flush work
> from completing.

The log shows that the worker thread is stuck in iotlb_access_ok(). How
will setting the backend to NULL stop it? During my debugging I found
that the worker is stuck in this while loop:

	while (len > s) {
		map = vhost_iotlb_itree_first(umem, addr, last);
		if (map == NULL || map->start > addr) {
			vhost_iotlb_miss(vq, addr, access);
			return false;
		} else if (!(map->perm & access)) {
			/* Report the possible access violation by
			 * request another translation from userspace.
			 */
			return false;
		}

		pr_info("iotlb_access_ok: after msize=%llu, mstart=%llu\n",
			map->size, map->start);
		size = map->size - addr + map->start;

		if (orig_addr == addr && size >= len)
			vhost_vq_meta_update(vq, map, type);

		s += size;
		addr += size;
	}

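The loop above only exits once s reaches len, so every pass has to advance s and addr by a non-zero size; if it does not, the worker spins and any flush queued behind it never completes. A minimal user-space model of that arithmetic, under stated assumptions: struct model_map and model_walk() are hypothetical names (not kernel API), the interval tree is replaced by a single mapping, the permission check is dropped, and an iteration cap reports a walk that stops advancing instead of spinning. It assumes a mapping whose size field is 0 while [start, last] still covers addr (for example start = 0 and last = UINT64_MAX, where last - start + 1 wraps to 0 in 64-bit arithmetic). This illustrates one way the loop can stall; it is not a diagnosis of the syzbot report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vhost_iotlb_map: only the fields the loop reads. */
struct model_map {
	uint64_t start; /* first IOVA covered */
	uint64_t last;  /* last IOVA covered (inclusive) */
	uint64_t size;  /* stored length; normally last - start + 1 */
};

/*
 * Model of the while loop in iotlb_access_ok(): one mapping instead of an
 * interval tree, no permission check, and an iteration cap so a walk that
 * never advances is reported instead of hanging.
 * Returns the number of passes, 0 on a miss, or -1 if the cap is hit.
 */
static long model_walk(const struct model_map *m, uint64_t addr,
		       uint64_t len, long cap)
{
	uint64_t s = 0, size;
	long iters = 0;

	assert(cap > 0);
	while (len > s) {
		if (++iters > cap)
			return -1; /* s and addr stopped advancing */
		/* Same outcome as the lookup plus the map->start > addr check. */
		if (addr < m->start || addr > m->last)
			return 0; /* miss */
		size = m->size - addr + m->start; /* same arithmetic as the loop above */
		s += size;
		addr += size;
	}
	return iters;
}
```

For instance, with { .start = 0x1000, .last = 0x1fff, .size = 0x1000 }, model_walk(&m, 0x1000, 0x100, 1000) returns 1, while the zero-size mapping with addr = 0 and len = 64 computes size = 0 on every pass and returns -1 once the cap is reached.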
[1] https://syzkaller.appspot.com/text?tag=CrashLog&x=153f0852700000
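The approach in the quoted patch, checking the owner only when vhost_vsock_stop() is reached from an ioctl, can be sketched in user space as below. Everything here is a hypothetical stand-in (struct model_mm, struct model_dev, current_mm, model_check_owner(), model_vsock_stop()), not the kernel code; the ownership test assumes a plain comparison of the recorded owner mm with current->mm, as vhost_dev_check_owner() does, and current_mm = NULL stands for a task that is already in do_exit().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm_struct: only its identity matters. */
struct model_mm {
	int id;
};

/* Hypothetical stand-in for the device state touched by the stop path. */
struct model_dev {
	const struct model_mm *owner; /* mm recorded when the owner was set */
	bool running;                 /* true while the backends are active */
};

/* Plays the role of current->mm; NULL models a task already in do_exit(). */
static const struct model_mm *current_mm;

/* Ownership test by mm identity, modelled on vhost_dev_check_owner(). */
static int model_check_owner(const struct model_dev *d)
{
	return d->owner == current_mm ? 0 : -EPERM;
}

/*
 * Stop path with the owner check made optional: the ioctl caller passes
 * true, the release path passes false so cleanup still happens on exit.
 */
static int model_vsock_stop(struct model_dev *d, bool check_owner)
{
	assert(d != NULL);
	if (check_owner) {
		int ret = model_check_owner(d);

		if (ret)
			return ret; /* not the owner: leave the device untouched */
	}
	d->running = false;
	return 0;
}
```

With current_mm = NULL, model_vsock_stop(&d, true) returns -EPERM and leaves d.running set, which models the release path skipping cleanup; passing false, as the release path would, stops the device regardless of current->mm.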