Re: [RFC] AutoNUMA alpha6
From: Andrea Arcangeli <hidden>
Date: 2012-03-21 12:50:31
Also in:
lkml
Hi Dan, On Tue, Mar 20, 2012 at 09:01:58PM -0700, Dan Smith wrote:
AA> upstream autonuma numasched hard inverse AA> numa02 64 45 66 42 81 AA> numa01 491 328 607 321 623 -D THREAD_ALLOC AA> numa01 305 207 338 196 378 -D NO_BIND_FORCE_SAME_NODE AA> So give me a break... you must have made a real mess in your AA> benchmarking. I'm just running what you posted, dude :)
Apologies if it felt like I was attacking you, that wasn't my intention, I actually appreciate your effort! My exclamation was because I was shocked by the staggering difference in results, nothing else. Here I still get the results I posted above from numasched. In fact even worse, now even -D THREAD_ALLOC wouldn't end (and I disabled lockdep just in case), I'll try to reboot some more time to see if I can get some number out of it again. numa02 at least repeats at 66 sec reproducibly with numasched with or without lockdep.
AA> numasched is always doing worse than upstream here, in fact two
AA> times massively worse. Almost as bad as the inverse binds.
Well, something clearly isn't right, because my numbers don't match
yours at all. This time with THP disabled, and compared to the rest of
the numbers from my previous runs:
autonuma HARD INVERSE NO_BIND_FORCE_SAME_MODE
numa01 366 335 356 377
numa01THP 388 336 353 399
That shows that autonuma is worse than inverse binds here. If I'm
running your stuff incorrectly, please tell me and I'll correct
it. However, I've now compiled the binary exactly as you asked, with THP
disabled, and am seeing surprisingly consistent results.HARD and INVERSE should be the min and max you get. I would ask you before you test AutoNUMA again, or numasched again, to repeat this "HARD" vs "INVERSE" vs "NO_BIND_FORCE_SAME_MODE" benchmark and be sure the above numbers are correct for the above three cases. On my hardware you can see on page 7 of my pdf what I get: http://www.kernel.org/pub/linux/kernel/people/andrea/autonuma/autonuma_bench-20120321.pdf numa01 -DHARD_BIND | -DNO_BIND_FORCE_SAME_NODE | -DINVERSE_BIND 196 305 378 You can do this benchmark on an upstream kernel 3.3-rc, no need of any patch to collect the above three numbers. For me this is always true: HARD_BIND <= NO_BIND_FORCE_SAME_NODE <= INVERSE_BIND. Checking if numa01 HARD_BIND and INVERSE_BIND cases are setting up your hardware topology correctly may be good idea too. If it's not a benchmarking error or a topology error in HARD_BIND/INVERSE_BIND, it may be the hardware you're using is very different. That would be bad news though, I thought you were using the same common 2 socket exacore setup that I'm using and I wouldn't have expected such a staggering difference in results (even for HARD vs INVERSE vs NO_BIND_FORCE_SAME_NODE, even before we put autonuma or numasched into the equation).
AA> Maybe you've more than 16g? I've 16G and that leaves 1G free on both AA> nodes at the peak load with AutoNUMA. That shall be enough for AA> numasched too (Peter complained me I waste 80MB on a 16G system, so AA> he can't possibly be intentionally wasting me 2GB). Yep, 24G here. Do I need to tweak the test?
Well maybe you could try to repeat at 16G if you still see numasched performing great after running it with -DNO_BIND_FORCE_SAME_MODE. What -DNO_BIND_FORCE_SAME_MODE is meant to do, is to start the "NUMA migration" races from the worst possible condition. Imagine it like doing a hiking race consistently always from the _bottom_ of the mountain, and not randomly from the middle like it would happen without -DNO_BIND_FORCE_SAME_MODE.
How do you figure? I didn't post any hard binding numbers. In fact, numasched performed about equal to hard binding...definitely within your stated 2% error interval. That was with THP enabled, tomorrow I'll be glad to run them all again without THP.
Again thanks so much for your effort. I hope others will run more benchmarks too on both solution. And I repeat what I said yesterday clear and stright: if numasched will be shown to have the lead on the vast majority of workloads, I will be happy to "rm -r autonuma" to stop wasting time on an inferior dead project, and work on something else entirely or to contribute to numasched in case they will need help for something. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>