Thread: 18 messages, 5 authors, 2011-06-03

Re: [BUG] rebuild_sched_domains considered dangerous

From: Martin Schwidefsky <hidden>
Date: 2011-03-09 13:31:55
Also in: lkml

On Wed, 09 Mar 2011 14:19:29 +0100
Peter Zijlstra [off-list ref] wrote:

> On Wed, 2011-03-09 at 14:15 +0100, Martin Schwidefsky wrote:
> > On Wed, 09 Mar 2011 12:33:49 +0100
> > Peter Zijlstra [off-list ref] wrote:
> >
> > > On Wed, 2011-03-09 at 11:19 +0100, Peter Zijlstra wrote:
> > > > > It appears that this corresponds to one CPU deciding to rebuild the
> > > > > sched domains. There's various reasons why that can happen, the typical
> > > > > one in our case is the new VPNH feature where the hypervisor informs us
> > > > > of a change in node affinity of our virtual processors. s390 has a
> > > > > similar feature and should be affected as well.
> > > >
> > > > Ahh, so that's triggering it :-), just curious, how often does the HV do
> > > > that to you?
> > >
> > > OK, so Ben told me on IRC this can happen quite frequently, to which I
> > > must ask WTF were you guys smoking? Flipping the CPU topology every time
> > > the HV scheduler does something funny is quite insane. And you did that
> > > without ever talking to the scheduler folks, not cool.
> > >
> > > That is of course aside from the fact that we have a real bug there that
> > > needs fixing, but really guys, WTF!
> >
> > Just for info, on s390 the topology change events are rather infrequent.
> > They do happen e.g. after an LPAR has been activated and the LPAR
> > hypervisor needs to reshuffle the CPUs of the different nodes.
>
> But if you don't also update the cpu->node memory mappings (which I
> think it near impossible) what good is it to change the scheduler
> topology?

The memory for the different LPARs is striped over all nodes (or books as we
call them). We heavily rely on the large shared cache between the books to hide
the different memory access latencies.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.