Thread: 40 messages, 6 authors, 2022-01-20

Re: [RFC][PATCH 3/3] sched: User Mode Concurrency Groups

From: Peter Oskolkov
Date: 2022-01-19 17:52:53
Also in: linux-mm, lkml

On Wed, Jan 19, 2022 at 1:00 AM Peter Zijlstra wrote:
> On Tue, Jan 18, 2022 at 10:19:21AM -0800, Peter Oskolkov wrote:
> > =========== signals and the general approach
> >
> > My version of the patchset has all of these things working. What it
> > does not have, compared to the new approach we are discussing here, is
> > runqueues per server and proper signal handling (and potential
> > integration with proxy execution).
> >
> > Runqueues per server, in the LAZY mode, are easy to emulate in my
> > patchset: nothing prevents userspace from partitioning workers among
> > servers and having the servers that "own" those workers be pointed at
> > by the workers' idle_server_tid_ptr.
> >
> > The only thing that is missing is proper handling of signals. But my
> > patchset does ensure a single running worker per server, has pagefaults
> > and preemptions sorted out, etc. Basically, everything works except
> > signals. This patchset has issues with pagefaults,
>
> Already fixed pagefaults per:
>
>   YeGvovSckivQnKX8@hirez.programming.kicks-ass.net

Could you please post an updated RFC when you have a chance? Thanks!

> > worker timeouts
>
> I still have no clear answer as to what you actually want there.
>
> > , worker-to-worker context switches (do workers move runqueues when
> > they context switch?), etc.
>
> Not in the kernel; if they need to be migrated, userspace needs to do
> that.
>
> > And my patchset now actually looks smaller and simpler, on the kernel
> > side, than what this patchset is shaping up to be.
> >
> > What if I fix signals in my patchset? I think the way you deal with
> > signals will work equally well in my approach; I'll also use
> > umcg_kick() to preempt workers instead of sending them a signal.
> >
> > What do you think?
>
> I still absolutely hate how long you do page pinning; it *will* wreck
> things like CMA, which are somewhat latency critical for silly things
> like Android camera apps and who knows what else.
>
> You've also forgotten about this:
>
>   YcWutpu7BDeG+dQ2@hirez.programming.kicks-ass.net
>
> That's not optional given how you're using page pinning. Also, I think
> we need at least one direct access to the page after getting the pin in
> order to make it work.
>
> That also very much limits it to Anon pages.

I can use the same mm/page pinning strategy as you do. But then our
patchsets will be quite similar, I guess, the difference being server
wakeups with RUNNING workers vs. the "lazy" idle_server_tid_ptr. So OK,
let's continue with your approach. If you could post a new RFC with the
memory/paging fixes in it, I'll then add worker timeouts, as I outlined
in a separate email ~30 min ago, and continue with my
integration/testing.
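Oskolkov's point that per-server runqueues are easy to emulate in LAZY mode (partition workers among servers and point each worker's idle_server_tid_ptr at the server that owns it) can be illustrated with a small userspace model. This is a minimal sketch under stated assumptions, not the UMCG uapi: struct model_server, struct model_worker, server_init(), partition_workers(), server_go_idle() and claim_idle_server() are hypothetical names, and round-robin is just one possible partitioning. Each server owns a slot that holds its tid while it is idle, and claim_idle_server() stands in for whatever lookup the kernel would perform when a worker needs a server.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, NOT the UMCG uapi: each server owns a
 * slot that holds its tid while it is idle (0 while it is busy), and each
 * worker's idle_server_tid_ptr points at the slot of the server that owns
 * it. Partitioning workers this way gives every server its own "runqueue".
 */
struct model_server {
	uint32_t tid;
	_Atomic uint32_t idle_server_tid;
};

struct model_worker {
	uint32_t tid;
	_Atomic uint32_t *idle_server_tid_ptr;
};

void server_init(struct model_server *s, uint32_t tid)
{
	s->tid = tid;
	atomic_init(&s->idle_server_tid, 0);
}

/* Round-robin partitioning (an assumption): worker i is owned by server i % ns. */
void partition_workers(struct model_worker *w, size_t nw,
		       struct model_server *s, size_t ns)
{
	assert(ns > 0);
	for (size_t i = 0; i < nw; i++)
		w[i].idle_server_tid_ptr = &s[i % ns].idle_server_tid;
}

/* A server with nothing to run advertises itself in its own slot. */
void server_go_idle(struct model_server *s)
{
	atomic_store(&s->idle_server_tid, s->tid);
}

/*
 * Wakeup side: atomically take the tid out of the owning server's slot.
 * Returns 0 if that server is busy; a worker only ever sees the slot of
 * its own server, so it is never handed to a server that does not own it.
 */
uint32_t claim_idle_server(struct model_worker *w)
{
	return atomic_exchange(w->idle_server_tid_ptr, 0);
}
```

With two servers (tids 100 and 200) and four workers, workers 1 and 3 end up pointing at server 100's slot and workers 2 and 4 at server 200's. If only server 200 has gone idle, claiming a server for worker 2 yields 200 while worker 1 gets 0. Because the claim is an atomic exchange, two workers owned by the same server cannot both consume a single idle advertisement.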