Re: [PATCH net-next 3/3] net: WQ_PERCPU added to alloc_workqueue users
From: Bobby Eshleman <hidden>
Date: 2025-09-08 16:44:12
Also in: lkml
On Fri, Sep 05, 2025 at 11:05:05AM +0200, Marco Crivellari wrote:
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated.

This patch adds a new WQ_PERCPU flag in the network subsystem, to explicitly request the use of the per-CPU behavior. Both flags coexist for one release cycle to allow callers to transition their calls.

Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU.

All existing users have been updated accordingly.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <redacted>
[...]
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index f0e48e6911fc..b3e960108e6b 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -916,7 +916,7 @@ static int __init virtio_vsock_init(void)
 {
 	int ret;
 
-	virtio_vsock_workqueue = alloc_workqueue("virtio_vsock", 0, 0);
+	virtio_vsock_workqueue = alloc_workqueue("virtio_vsock", WQ_PERCPU, 0);
 	if (!virtio_vsock_workqueue)
 		return -ENOMEM;
 
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 6e78927a598e..bc2ff918b315 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -139,7 +139,7 @@ static int __init vsock_loopback_init(void)
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int ret;
 
-	vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0);
+	vsock->workqueue = alloc_workqueue("vsock-loopback", WQ_PERCPU, 0);
 	if (!vsock->workqueue)
 		return -ENOMEM;
 
LGTM for the vmw_vsock bits.

Regarding step 2 "Check who really needs to be per-cpu", IIRC a few years ago I did some playing around with per-cpu wq for vsock and I don't think I saw a huge difference in performance, so I'd expect it to be in the "not really needs per-cpu" camp... I might be able to help re-evaluate that when the time comes.

Reviewed-by: Bobby Eshleman <redacted>