Thread: 15 messages, 3 authors, 2004-08-27

Re: [Lhms-devel] [RFC] buddy allocator without bitmap [2/4]

From: Hiroyuki KAMEZAWA <hidden>
Date: 2004-08-27 04:43:26

Hi,
I tested set_bit()/__set_bit(), the atomic and non-atomic ops, on my Xeon.
I think this test is not perfect, but it shows some aspects of the performance of atomic ops.

Program:
the program touches memory in a tight loop, using atomic and non-atomic set_bit().
The memory size is 512k, the L2 cache size.
I attach it to this mail, but it is configured for my Xeon and looks ugly :).
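The attached program is not reproduced here, so the following is only a minimal user-space sketch of the kind of loop described above, not the actual test. It assumes GCC or Clang (for the `__atomic_fetch_or` builtin); the names `nonatomic_set_bit()`, `atomic_set_bit()` and `touch_buffer()` are hypothetical stand-ins for the kernel's `__set_bit()`, `set_bit()` and the measured loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic: plain read-modify-write, in the spirit of __set_bit(). */
static inline void nonatomic_set_bit(size_t nr, unsigned long *addr)
{
    addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Atomic: an atomic OR, in the spirit of set_bit(). On x86 the compiler
 * is expected to emit a lock-prefixed instruction for this. */
static inline void atomic_set_bit(size_t nr, unsigned long *addr)
{
    __atomic_fetch_or(&addr[nr / BITS_PER_LONG],
                      1UL << (nr % BITS_PER_LONG), __ATOMIC_SEQ_CST);
}

/*
 * Walk the buffer in steps of stride_bytes and set one bit at each step.
 * A small buffer with a small stride stays in cache; a large buffer with
 * a cache-line-sized stride tends to miss. Returns the number of bits set.
 */
static size_t touch_buffer(unsigned long *buf, size_t nbytes,
                           size_t stride_bytes, int use_atomic)
{
    size_t touched = 0;

    assert(stride_bytes > 0);
    for (size_t off = 0; off + sizeof(unsigned long) <= nbytes;
         off += stride_bytes) {
        size_t nr = off * CHAR_BIT;   /* first bit of the byte at 'off' */

        if (use_atomic)
            atomic_set_bit(nr, buf);
        else
            nonatomic_set_bit(nr, buf);
        touched++;
    }
    return touched;
}
```

To get numbers comparable to the scores below, one would wrap calls to `touch_buffer()` in a timer (for example a cycle counter) and repeat them many times; that part is left out of the sketch.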


My CPU:
from /proc/cpuinfo
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) MP CPU 1.90GHz
stepping        : 2
cpu MHz         : 1891.582
cache size      : 512 KB
CPU             : Intel Xeon 1.8GHz

Result:
[root@kanex2 atomic]# nice -10 ./test-atomics
score 0 is            64011 note: cache hit, no atomic
score 1 is           543011 note: cache hit, atomic
score 2 is           303901 note: cache hit, mixture
score 3 is           344261 note: cache miss, no atomic
score 4 is          1131085 note: cache miss, atomic
score 5 is           593443 note: cache miss, mixture
score 6 is           118455 note: cache hit, dependency, noatomic
score 7 is           416195 note: cache hit, dependency, mixture

smaller score is better.
scores 0-2 show set_bit/__set_bit performance with a good cache hit rate.
scores 3-5 show set_bit/__set_bit performance with a bad cache hit rate.
scores 6-7 show set_bit/__set_bit performance with a good cache hit rate,
but with a data dependency on each access in the tight loop.

To Dave:
the cost of prefetch() is not included here, because I found it is very sensitive to
what is done in the loop and difficult to measure in this program.
I found the cost of calling prefetch() is a bit high, so I'll measure again whether
prefetch() in the buddy allocator is good or bad.

I think this result shows I should use non-atomic ops when I can.
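As an illustration only, here is a user-space sketch of what paired atomic and non-atomic page-flag helpers could look like. It is not kernel code: `struct page_sketch`, the bit number `PG_PRIVATE_SKETCH` and all function names are hypothetical, and the atomic variants assume the GCC/Clang `__atomic` builtins. The non-atomic variants are only safe when nothing else can modify the same flags word concurrently, for example while a lock serializes access.

```c
#include <assert.h>

/* Hypothetical stand-in for struct page; only the flags word matters here. */
struct page_sketch {
    unsigned long flags;
};

/* Hypothetical bit number, chosen for illustration only. */
#define PG_PRIVATE_SKETCH 11
#define PG_PRIVATE_MASK   (1UL << PG_PRIVATE_SKETCH)

/* Atomic variants: safe against concurrent updates of other flag bits. */
static inline void set_page_private_atomic(struct page_sketch *p)
{
    __atomic_fetch_or(&p->flags, PG_PRIVATE_MASK, __ATOMIC_SEQ_CST);
}

static inline void clear_page_private_atomic(struct page_sketch *p)
{
    __atomic_fetch_and(&p->flags, ~PG_PRIVATE_MASK, __ATOMIC_SEQ_CST);
}

/* Non-atomic variants: plain read-modify-write, cheaper, but the caller
 * must guarantee exclusive access to p->flags. */
static inline void set_page_private_nonatomic(struct page_sketch *p)
{
    p->flags |= PG_PRIVATE_MASK;
}

static inline void clear_page_private_nonatomic(struct page_sketch *p)
{
    p->flags &= ~PG_PRIVATE_MASK;
}

/* Test helper: returns 1 if the bit is set, 0 otherwise. */
static inline int page_private_is_set(const struct page_sketch *p)
{
    return (p->flags & PG_PRIVATE_MASK) != 0;
}
```

Both variants leave the flags word in the same state; the difference is only in cost and in what concurrency they tolerate.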

Thanks.
Kame

Hiroyuki KAMEZAWA wrote:

Okay, I'll do more tests and if I find atomic ops are slow,
I'll add __XXXPagePrivate() macros.

ps. I usually test code on a Xeon 1.8G x 2 server.

-- Kame

Andrew Morton wrote:
Hiroyuki KAMEZAWA [off-list ref] wrote:
In the previous version, I used
SetPagePrivate()/ClearPagePrivate()/PagePrivate().
But these are "atomic" operations and look very slow.
This is why I didn't use these macros in this version.

My previous version, which used set_bit/test_bit/clear_bit, showed
very bad performance on my test, so I replaced it.


That's surprising.  But if you do intend to use non-atomic bitops then
please add __SetPagePrivate() and __ClearPagePrivate().

-- 
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki [off-list ref]
