Thread: 16 messages, 4 authors, 2018-02-05

Re: [PATCH v3 1/3] sched/fair: add util_est on top of PELT

From: Patrick Bellasi <hidden>
Date: 2018-02-05 17:49:24
Also in: lkml

On 30-Jan 15:01, Peter Zijlstra wrote:
On Tue, Jan 30, 2018 at 02:04:32PM +0100, Peter Zijlstra wrote:
On Tue, Jan 30, 2018 at 12:46:33PM +0000, Patrick Bellasi wrote:
Aside from that being whitespace challenged, did you also try:

	if ((unsigned)((util_est - util_last) + LIM - 1) < (2 * LIM - 1))
No, since the above code is IMO so much easier "to parse for humans" :)
Heh, true. Although that's fixable by wrapping it in some helper with a
comment.
But mainly because, since the cache alignment update, I cannot see
regressions on hackbench even when testing on a "big" Intel machine.

This is the code I get on my Xeon E5-2690 v2:

       if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
   6ba0:       8b 86 7c 02 00 00       mov    0x27c(%rsi),%eax
   6ba6:       48 29 c8                sub    %rcx,%rax
   6ba9:       48 99                   cqto
   6bab:       48 31 d0                xor    %rdx,%rax
   6bae:       48 29 d0                sub    %rdx,%rax
   6bb1:       48 83 f8 0a             cmp    $0xa,%rax
   6bb5:       7e 1d                   jle    6bd4 <dequeue_task_fair+0x7e4>

Does it look so bad?
It's not terrible, and I think your GCC is far more clever than the one I
To clarify; my GCC at the time generated conditional branches to compute
the absolute value; and in that case the thing I proposed wins hands
down because it's unconditional.

However the above is also unconditional and then the difference is much
less important.
I've finally convinced myself that we can live with the "parsing
complexity" of your proposal... and wrapped into an inline it turned
out to be not so bad.
used at the time. But that's 4 dependent instructions (cqto,xor,sub,cmp)
whereas the one I proposed uses only 2 (add,cmp).
The ARM64 generated code is also simpler.
Now, my proposal is, as you say, somewhat hard to read, and it also
doesn't work right when our values are 'big' (which they will not be in
our case, because util has a very definite bound), and I suspect you're
right that ~2 cycles here will not be measurable.
Indeed, I cannot see noticeable differences, other than a slight
improvement...
So yeah.... whatever ;-)
... I'm going to post a v4 using your proposal ;-)
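
For readers following along, here is a minimal sketch of how the single-compare
check discussed above could be wrapped in an inline helper with a comment. The
names within_margin(), within_margin_abs() and count_mismatches() are
illustrative assumptions, not taken from this thread, and the code is
plain userspace C rather than the actual patch:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper name; the thread only says the check was
 * "wrapped into an inline".
 *
 * Returns true when -margin < value < margin, i.e. abs(value) < margin.
 * Adding (margin - 1) shifts the accepted range to [0, 2 * margin - 2];
 * the unsigned cast turns anything below that range into a huge number,
 * so one compare covers both bounds without computing the sign.
 * Assumes margin > 0 and that value + margin - 1 does not overflow int.
 */
static inline bool within_margin(int value, int margin)
{
	return (unsigned int)(value + margin - 1) < (unsigned int)(2 * margin - 1);
}

/* Reference form using abs(), for comparison only. */
static inline bool within_margin_abs(int value, int margin)
{
	return abs(value) < margin;
}

/* Count the values in [-range, range] where the two forms disagree. */
static int count_mismatches(int margin, int range)
{
	int mismatches = 0;

	for (int value = -range; value <= range; value++)
		if (within_margin(value, margin) != within_margin_abs(value, margin))
			mismatches++;

	return mismatches;
}
```

Note the strict bound: this form tests abs(value) < margin, whereas the abs()
line quoted earlier uses <=, so a caller wanting the inclusive bound would
pass margin + 1. The "values are not 'big'" caveat from the thread shows up
here as the no-overflow assumption in the comment.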

Thanks Patrick

-- 
#include <best/regards.h>

Patrick Bellasi