Re: [RFC][PATCH 0/3] sched: User Managed Concurrency Groups
From: Peter Oskolkov <hidden>
Date: 2021-12-15 19:50:09
Also in:
linux-api, lkml
On Wed, Dec 15, 2021 at 10:19 AM Peter Zijlstra [off-list ref] wrote:
On Wed, Dec 15, 2021 at 09:56:06AM -0800, Peter Oskolkov wrote:quoted
quoted
Right, so the problem I'm having is that a single idle server ptr like before can trivially miss waking annother idle server.I believe the approach I used in my patchset, suggested by Thierry Delisle, works. In short, there is a single idle server ptr for the kernel to work with. The userspace maintains a list of idle servers. If the ptr is NULL, the list is empty. When the kernel wakes the idle server it sees, the server reaps the runnable worker list and wakes another idle server from the userspace list, if available. This newly woken idle server repoints the ptr to itself, checks the runnable worker list, to avoid missing a woken worker, then goes to sleep. Why do you think this approach is not OK?Suppose at least 4 servers, 2 idle, 2 working. Now, the first of the working servers (lets call it S0) gets a wakeup (say Ta), it finds the idle server (say S3) and consumes it, sticking Ta on S3 and kicking it alive.
TL;DR: our models are different here. In your model a single server can have a bunch of workers interacting with it; in my model only a single RUNNING worker is assigned to a server, which it wakes when it blocks. More details: "Working servers" cannot get wakeups, because a "working server" has a single RUNNING worker attached to it. When a worker blocks, it wakes its attached server and becomes a detached blocked worker (same is true if the worker is "preempted": it blocks and wakes its assigned server). Blocked workers upon wakeup do this, in order: - always add themselves to the runnable worker list (the list is shared among ALL servers, it is NOT per server); - wake a server pointed to by idle_server_ptr, if not NULL; - sleep, waiting for a wakeup from a server; Server S, upon becoming IDLE (no worker to run, or woken on idle server list) does this, in order, in userspace (simplified, see umcg_get_idle_worker() in https://lore.kernel.org/lkml/20211122211327.5931-5-posk@google.com/ (local)): - take a userspace (spin) lock (so the steps below are all within a single critical section): - compare_xchg(idle_server_ptr, NULL, S); - if failed, there is another server in idle_server_ptr, so S adds itself to the userspace idle server list, releases the lock, goes to sleep; - if succeeded: - check the runnable worker list; - if empty, release the lock, sleep; - if not empty: - get the list - xchg(idle_server_ptr, NULL) (either S removes itself, or a worker in the kernel does it first, does not matter); - release the lock; - wake server S1 on idle server list. S1 goes through all of these steps. The protocol above serializes the userspace dealing with the idle server ptr/list. Wakeups in the kernel will be caught if there are idle servers. Yes, the protocol in the userspace is complicated (more complicated than outlined above, as the reaped idle/runnable worker list from the kernel is added to the userspace idle/runnable worker list), but the kernel side is very simple. I've tested this interaction extensively, I'm reasonably sure that no worker wakeups are lost.
Concurrently and loosing the race the other working server (S1) gets a wake-up from Tb, like said, it lost, no idle server, so Tb goes on the queue of S1. So then S3 wakes, finds Ta and they live happily ever after. While S2 and Tb fail to meet one another and both linger in sadness.