what is __get_cpu_var() ?

From: Dave Hylands <hidden>
Date: 2011-02-23 18:30:15

Hi Murali,

On Wed, Feb 23, 2011 at 11:26 AM, Murali N [off-list ref] wrote:

Hi Dave,

On Wed, Feb 23, 2011 at 11:15 AM, Dave Hylands [off-list ref] wrote:

Hi Murali,

On Wed, Feb 23, 2011 at 10:34 AM, Murali N [off-list ref] wrote:

Hi Dave,
thanks for your reply.

...snip...

get_cpu_var returns this CPU's instance of a per-cpu variable (as an
lvalue), and disables preemption until the matching put_cpu_var.

__get_cpu_var contains the actual machine-dependent implementation,
although it looks like all of the architectures use the one in
asm-generic/percpu.h.

In general, all of the per-cpu data is gathered together into a
section. Multiple copies of that section are allocated (one per CPU).
I think that the address of the variable is really its offset within
the section, and each allocated copy is cache-line aligned. This
offset is then added to the "offset for my cpu" to come up with the
final address of the variable, which is dereferenced like any other
pointer. There are lots of extra doo-dads to get around warnings and
to prevent the linker from producing relocation references for the
variable access (since it looks like an access to a global variable,
but it's really just playing a game with the variable's offset within
the section).

So you could think of it as a very fancy offsetof macro.
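To make the "fancy offsetof" idea concrete, here is a minimal user-space model of the scheme described above. It is a sketch under stated assumptions, not the kernel's code: a struct stands in for the per-cpu section, malloc'd copies stand in for the per-CPU areas (with no cache-line alignment), and an ordinary variable stands in for smp_processor_id(). The NR_CPUS value and the names percpu_section, cpu_area, current_cpu, set_current_cpu, setup_percpu_areas, PER_CPU and THIS_CPU are all hypothetical. It relies on the GCC/Clang __typeof__ extension, as the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* All "per-cpu variables" gathered into one section (here: a struct). */
struct percpu_section {
    unsigned long irq_count;
    int last_pid;
};

/* The original copy of the section; it only supplies the initial
 * values that are copied into each CPU's private area. */
static const struct percpu_section percpu_template = { 0, -1 };

/* Start address of each CPU's private copy of the section. */
static char *cpu_area[NR_CPUS];

/* Hypothetical stand-in for smp_processor_id(). */
static int current_cpu;

/* per_cpu analogue: the CPU's area start plus the variable's offset
 * within the section, cast back to the variable's type. */
#define PER_CPU(field, cpu)                                          \
    (*(__typeof__(((struct percpu_section *)0)->field) *)            \
         (cpu_area[(cpu)] + offsetof(struct percpu_section, field)))

/* __get_cpu_var analogue: the same lookup for the current CPU. */
#define THIS_CPU(field) PER_CPU(field, current_cpu)

/* Allocate one copy of the section per CPU, initialised from the
 * template. Returns 0 on success, -1 if an allocation fails. */
static int setup_percpu_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_area[cpu] = malloc(sizeof(struct percpu_section));
        if (cpu_area[cpu] == NULL)
            return -1;
        memcpy(cpu_area[cpu], &percpu_template, sizeof percpu_template);
    }
    return 0;
}

/* Pretend the running task is now executing on 'cpu'. */
static void set_current_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    current_cpu = cpu;
}
```

For example, after setup_percpu_areas(), calling set_current_cpu(2) and then THIS_CPU(irq_count) += 5 changes only CPU 2's copy, while PER_CPU(irq_count, 0) still reads 0. THIS_CPU plays the role of __get_cpu_var; the real get_cpu_var additionally disables preemption so the task cannot migrate to another CPU between computing the address and using it.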

There are several other macros involved, perhaps you could be a bit
more specific about your request?

Dave Hylands

I have one more basic question.
Why would we need to maintain structures like this? Is there any
advantage we get here?

Primarily for performance reasons. For example, the kernel maintains
lots of stats on threads and processes (I haven't looked to see if
these are actually maintained on a per-cpu basis, but the concept
applies). These stats are updated frequently, but only read
occasionally. If you have a global "database" of stats, then each CPU
needs to lock the data before updating it, which creates lots of
contention. By keeping the data per-cpu, the CPUs don't need to
acquire any locks (or at the very least, per-cpu locks cause far less
contention). This becomes especially important when there are lots of
CPUs.

The query functions can then amalgamate the information and present it
as if it were maintained in a global database.
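A minimal sketch of the stats case, assuming a fixed CPU count and a 64-byte cache line: each CPU updates only its own slot without taking a lock, and the query function sums all slots to present one global value. The names pcpu_stat, pcpu_slot, pcpu_stat_add and pcpu_stat_read are hypothetical; the kernel has its own, more complete facility (struct percpu_counter). This sketch is single-threaded and the caller passes the CPU number explicitly; a genuinely multi-threaded user-space version would need atomic accesses, and in the kernel the slot would be the current CPU's, chosen with preemption disabled.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4     /* hypothetical CPU count for this sketch */
#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* One slot per CPU, each aligned (and therefore padded) to its own
 * cache line so that updates on one CPU do not invalidate the line
 * holding another CPU's counter. */
struct pcpu_slot {
    _Alignas(CACHE_LINE) unsigned long count;
};

struct pcpu_stat {
    struct pcpu_slot slot[NR_CPUS];
};

/* Hot path: touch only this CPU's slot, no lock needed. */
static void pcpu_stat_add(struct pcpu_stat *s, int cpu, unsigned long n)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    s->slot[cpu].count += n;
}

/* Slow path: the query walks every CPU's slot and amalgamates them,
 * as if the stat were kept in one global counter. */
static unsigned long pcpu_stat_read(const struct pcpu_stat *s)
{
    unsigned long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += s->slot[cpu].count;
    return sum;
}
```

Adding 10 and then 1 on CPU 0 and 5 on CPU 3 leaves each slot holding its own partial count, and pcpu_stat_read() returns 16. Padding each slot to a cache line is a deliberate design choice: without it, neighboring CPUs' counters would share a line, and the cache traffic would eat much of the benefit of avoiding the lock.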

So if you have data which is updated frequently and only accessed
occasionally, or updated infrequently and accessed frequently, then
you might have a case for using per-cpu data. Of course you'd still
need to profile it and see if it makes sense.

Also keep in mind that something which doesn't seem to matter much
on, say, a dual-core system could make a considerable difference with
32 cores.

Dave Hylands

So it makes sense to use if I am running on more cores (> 4)?

It really depends on the access patterns of the data. Whether it makes
sense or not is something you'll probably need to profile (i.e. with
and without using per-cpu variables).

Dave Hylands
