Re: [PATCH] memcg: remove unneeded preempt_disable

From: Peter Zijlstra <peterz@infradead.org>
Date: 2011-08-25 18:40:40
Also in: linux-arch, lkml

On Thu, 2011-08-25 at 11:31 -0500, Christoph Lameter wrote:

On Thu, 25 Aug 2011, James Bottomley wrote:

On Thu, 2011-08-25 at 10:11 -0500, Christoph Lameter wrote:

On Thu, 25 Aug 2011, Peter Zijlstra wrote:

On Thu, 2011-08-18 at 14:40 -0700, Andrew Morton wrote:

I think I'll apply it, as the call frequency is low (correct?) and the
problem will correct itself as other architectures implement their
atomic this_cpu_foo() operations.

Which leads me to wonder, can anything but x86 implement that this_cpu_*
muck? I doubt any of the RISC chips can actually do all this.
Maybe Itanic, but then that seems to be dying fast.

The cpu needs to have an RMW instruction that does something to a
variable relative to a register that points to the per cpu base.

That's generally possible. The problem is how expensive the RMW is going to
be.

RISC systems generally don't have a single instruction for this, that's
correct.  Obviously we can do it as a non atomic sequence: read
variable, compute relative, read, modify, write ... but there's
absolutely no point hand crafting that in asm since the compiler can
usually work it out nicely.  And, of course, to have this atomic, we
have to use locks, which ends up being very expensive.

ARM seems to have these LDREX/STREX instructions for that purpose, which
seem to be used for generating atomic instructions without locks. I guess
other RISC architectures have similar means of doing it?

Even with LL/SC and the CPU base in a register you need to do something
like:

again:
	LL $target-reg, $cpubase-reg + offset
	<foo>
	SC $ret, $target-reg, $cpubase-reg + offset
	if !$ret goto again

It's the +offset that's problematic: it either doesn't exist or is very
limited (a quick look at the MIPS instruction set gives a limit of 64k).
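
For readers who want something compilable, the retry loop above can be approximated in portable C11. This is only a sketch under stated assumptions: `percpu_counter`, `NR_CPUS` and `percpu_add_retry()` are hypothetical names, not kernel interfaces, and the array index stands in for `$cpubase-reg + offset`. On LL/SC architectures compilers typically lower the weak compare-exchange to an LL/SC pair, but that is the compiler's choice, not something this code guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU counters; the index plays the role of cpubase + offset. */
static _Atomic long percpu_counter[NR_CPUS];

/*
 * Add delta to the counter belonging to `cpu` using a
 * load / modify / conditional-store retry loop.
 * Returns the value that was stored.
 */
static long percpu_add_retry(int cpu, long delta)
{
	long old, val;

	assert(cpu >= 0 && cpu < NR_CPUS);

	/* Plays the role of "LL": fetch the current value. */
	old = atomic_load_explicit(&percpu_counter[cpu], memory_order_relaxed);
	do {
		val = old + delta;	/* the <foo> step */
		/*
		 * Plays the role of "SC": the store succeeds only if the
		 * variable still holds `old`. On failure (which may be
		 * spurious for the weak variant) `old` is refreshed and
		 * the loop retries.
		 */
	} while (!atomic_compare_exchange_weak_explicit(
			&percpu_counter[cpu], &old, val,
			memory_order_relaxed, memory_order_relaxed));

	return val;
}
```

Calling `percpu_add_retry(1, 5)` and then `percpu_add_retry(1, 2)` leaves slot 1 at 7 and the other slots untouched. Note that the caller has to pass in `cpu`; the sketch says nothing about how that value stays valid, which is exactly the problem discussed next.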

Without the +offset you need:

again:
	$tmp-reg = $cpubase-reg
	$tmp-reg += offset

	LL $target-reg, $tmp-reg
	<foo>
	SC $ret, $target-reg, $tmp-reg
	if !$ret goto again


Which is wide open to migration races. Also, very often there are
constraints on LL/SC that mandate we use preempt_disable/enable around
its use, which pretty much voids the whole purpose, since if we disable
preemption we might as well just use C (ARM belongs in this class).
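
The migration window can be made visible with a small single-threaded simulation. Everything here is an assumption for illustration: `sim_cpu` stands in for the CPU the task is running on, `slot[]` for the per-CPU variable, and the `migrate_to` argument for the scheduler moving the task between the address computation and the RMW. It shows only that the computed address goes stale; it does not model real concurrency.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

static long slot[NR_CPUS];	/* hypothetical per-CPU variable */
static int sim_cpu;		/* simulated "CPU we are running on" */

/*
 * Increment "this CPU's" slot in two steps, as in the sequence without
 * the +offset addressing mode. If migrate_to >= 0, the task is moved to
 * that CPU between the two steps.
 * Returns 1 if the update landed on a slot that no longer belongs to
 * the CPU we are running on, 0 otherwise.
 */
static int racy_percpu_inc(int migrate_to)
{
	long *p;

	assert(migrate_to < NR_CPUS);

	p = &slot[sim_cpu];		/* $tmp-reg = $cpubase-reg + offset */

	if (migrate_to >= 0)
		sim_cpu = migrate_to;	/* preempted and migrated right here */

	*p += 1;			/* LL / <foo> / SC on the stale address */

	return p != &slot[sim_cpu];
}
```

With `sim_cpu` at 0, `racy_percpu_inc(-1)` returns 0, while `racy_percpu_inc(1)` returns 1: the second increment still hits slot 0 although the task now runs on CPU 1.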

It does look like PowerPC's lwarx/stwcx. is sane enough, although the
instruction reference I found doesn't say what happens if the LL and SC
don't use the same effective address or there are other loads/stores in
between. If it's OK with those and simply fails the SC, it should be good.

Still, creating atomic ops for per-cpu ops might be more expensive than
simply doing the preempt-disable/rmw/enable dance; I dunno, I don't know
these archs that well.
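
As a rough picture of that preempt-disable/rmw/enable dance, here is a userspace simulation. The `_sim` helpers are hypothetical stand-ins, not the kernel's `preempt_disable()`/`preempt_enable()`; the only property modelled is that a migration attempt is refused while the preempt count is non-zero, so the plain C read-modify-write stays on the CPU whose address was computed.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

static long counter[NR_CPUS];	/* hypothetical per-CPU variable */
static int current_cpu;		/* simulated "CPU we are running on" */
static int preempt_count;	/* simulated preemption-disable depth */

static void preempt_disable_sim(void)
{
	preempt_count++;
}

static void preempt_enable_sim(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Simulated scheduler: moves the task only while preemption is enabled.
 * Returns 1 if the task was migrated, 0 if the request was refused. */
static int try_migrate_sim(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (preempt_count > 0)
		return 0;
	current_cpu = cpu;
	return 1;
}

/* Increment this CPU's counter; a migration to `migrate_to` is attempted
 * in the middle of the sequence to show that it cannot take effect. */
static void percpu_inc_dance(int migrate_to)
{
	long *p;

	preempt_disable_sim();
	p = &counter[current_cpu];		/* address computed for this CPU */
	(void)try_migrate_sim(migrate_to);	/* refused: preemption is off */
	*p += 1;				/* plain C read-modify-write */
	preempt_enable_sim();
}
```

Starting on CPU 0, `percpu_inc_dance(1)` increments `counter[0]` and leaves the task on CPU 0; only a `try_migrate_sim()` call made outside the disabled region actually moves it.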
