Thread (32 messages) 32 messages, 3 authors, 2006-09-25

Re: [PATCH 5/7] Use %gs for per-cpu sections in kernel

From: Rusty Russell <hidden>
Date: 2006-09-23 08:55:58
Also in: lkml

On Sat, 2006-09-23 at 10:17 +0200, Andi Kleen wrote:
quoted
Mainly that it makes more sense to use the existing per-cpu concept than
introduce another kind of per-cpu var within a special structure, but
it's also more efficient (see other post).  Hopefully it will spark
What post exactly?  AFAIK it is the same code for common code.

The advantage of the PDA split is that the important variables which are 
in the PDA can be accessed with a single reference, while generic portable
per CPU data is the same as it was before. With your scheme even
the PDA accesses are at least two instructions, right? (I don't
think gcc/ld can resolve the per cpu section offset into a constant,
so it has to load them into a register first) 
No, now normal per-cpu accesses are 2 insn, per-cpu accesses using
arch-specific macros are 1 insn.  ie. it's as if every per-cpu variable
were in the "pda".

Here's the reply to Jeremy's query:

Jeremy says:
Or is the only percpu benefit you're getting from %gs is a slightly 
quicker way of getting the percpu_offset?  Does that help much?
In generic code, that's true (the arch-specific accessors can do it in 1
insn, not two).  But it's still a help.  This is __raw_get_cpu_var(x)
before:

   3:   89 e0                   mov    %esp,%eax
   5:   25 00 e0 ff ff          and    $0xffffe000,%eax
   a:   8b 40 08                mov    0x8(%eax),%eax
   d:   8b 04 85 00 00 00 00    mov    0x0(,%eax,4),%eax
                        10: R_386_32    __per_cpu_offset
  14:   8b 80 00 00 00 00       mov    0x0(%eax),%eax
                        16: R_386_32    per_cpu__x

And this is after:

  1f:   65 a1 00 00 00 00       mov    %gs:0x0,%eax
                        21: R_386_32    per_cpu__this_cpu_off
  25:   8b 80 00 00 00 00       mov    0x0(%eax),%eax
                        27: R_386_32    per_cpu__x

So we go from 5 instructions, 23 bytes, 3 memory references, to 2
instructions, 12 bytes, 2 memory references (although the extra mem ref
will almost certainly have been in cache).
quoted
interest in making dynamic-percpu pointers use the same offset scheme,
now x86 will experience the benefits.

And we might even get a third user of local_t!
I'm not holding my breath. I guess it was a nice idea before preemption
became popular ...
Well, since Xen doesn't support preemption, perhaps we'll convince
distros to turn it off again? 8)

Sorry for the confusion,
Rusty.
-- 
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help