Thread (40 messages) 40 messages, 6 authors, 2022-01-20

Re: [RFC][PATCH 0/3] sched: User Managed Concurrency Groups

From: Peter Oskolkov <hidden>
Date: 2021-12-15 19:50:09
Also in: linux-api, lkml

On Wed, Dec 15, 2021 at 10:19 AM Peter Zijlstra [off-list ref] wrote:
On Wed, Dec 15, 2021 at 09:56:06AM -0800, Peter Oskolkov wrote:
quoted
quoted
Right, so the problem I'm having is that a single idle server ptr like
before can trivially miss waking annother idle server.
I believe the approach I used in my patchset, suggested by Thierry
Delisle, works.

In short, there is a single idle server ptr for the kernel to work
with. The userspace maintains a list of idle servers. If the ptr is
NULL, the list is empty. When the kernel wakes the idle server it
sees, the server reaps the runnable worker list and wakes another idle
server from the userspace list, if available. This newly woken idle
server repoints the ptr to itself, checks the runnable worker list, to
avoid missing a woken worker, then goes to sleep.

Why do you think this approach is not OK?
Suppose at least 4 servers, 2 idle, 2 working.

Now, the first of the working servers (lets call it S0) gets a wakeup
(say Ta), it finds the idle server (say S3) and consumes it, sticking Ta
on S3 and kicking it alive.
TL;DR: our models are different here. In your model a single server
can have a bunch of workers interacting with it; in my model only a
single RUNNING worker is assigned to a server, which it wakes when it
blocks.

More details:

"Working servers" cannot get wakeups, because a "working server" has a
single RUNNING worker attached to it. When a worker blocks, it wakes
its attached server and becomes a detached blocked worker (same is
true if the worker is "preempted": it blocks and wakes its assigned
server).

Blocked workers upon wakeup do this, in order:

- always add themselves to the runnable worker list (the list is
shared among ALL servers, it is NOT per server);
- wake a server pointed to by idle_server_ptr, if not NULL;
- sleep, waiting for a wakeup from a server;

Server S, upon becoming IDLE (no worker to run, or woken on idle
server list) does this, in order, in userspace (simplified, see
umcg_get_idle_worker() in
https://lore.kernel.org/lkml/20211122211327.5931-5-posk@google.com/ (local)):
- take a userspace (spin) lock (so the steps below are all within a
single critical section):
- compare_xchg(idle_server_ptr, NULL, S);
  - if failed, there is another server in idle_server_ptr, so S adds
itself to the userspace idle server list, releases the lock, goes to
sleep;
  - if succeeded:
    - check the runnable worker list;
        - if empty, release the lock, sleep;
        - if not empty:
           - get the list
           - xchg(idle_server_ptr, NULL) (either S removes itself, or
a worker in the kernel does it first, does not matter);
           - release the lock;
           - wake server S1 on idle server list. S1 goes through all
of these steps.

The protocol above serializes the userspace dealing with the idle
server ptr/list. Wakeups in the kernel will be caught if there are
idle servers. Yes, the protocol in the userspace is complicated (more
complicated than outlined above, as the reaped idle/runnable worker
list from the kernel is added to the userspace idle/runnable worker
list), but the kernel side is very simple. I've tested this
interaction extensively, I'm reasonably sure that no worker wakeups
are lost.
Concurrently and loosing the race the other working server (S1) gets a
wake-up from Tb, like said, it lost, no idle server, so Tb goes on the
queue of S1.

So then S3 wakes, finds Ta and they live happily ever after.

While S2 and Tb fail to meet one another and both linger in sadness.
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help