Re: [PATCH v7 01/15] sched/core: uclamp: Add CPU's clamp buckets refcounting
From: Patrick Bellasi <hidden>
Date: 2019-03-14 12:13:23
Also in:
linux-pm, lkml
On 13-Mar 20:48, Peter Zijlstra wrote:
> On Wed, Mar 13, 2019 at 04:12:29PM +0000, Patrick Bellasi wrote:
> > On 13-Mar 14:40, Peter Zijlstra wrote:
> > > On Fri, Feb 08, 2019 at 10:05:40AM +0000, Patrick Bellasi wrote:
> > > > +static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
> > > > +{
> > > > +	return clamp_value / UCLAMP_BUCKET_DELTA;
> > > > +}
> > > > +
> > > > +static inline unsigned int uclamp_bucket_value(unsigned int clamp_value)
> > > > +{
> > > > +	return UCLAMP_BUCKET_DELTA * uclamp_bucket_id(clamp_value);
> > >
> > > 	return clamp_value - (clamp_value % UCLAMP_BUCKET_DELTA);
> > >
> > > might generate better code; just a single division, instead of a div and
> > > mult.
> >
> > Wondering if compilers cannot do these optimizations... but yes, looks cool
> > and will do it in v8, thanks.
>
> I'd be most impressed if they pull this off. Check the generated code
> and see I suppose :-)

On x86 the code generated looks exactly the same:

   https://godbolt.org/z/PjmA7k

While on arm64 it seems the difference boils down to:
 - one single "mul" instruction
vs
 - two instructions: "sub" _plus_ one "multiply subtract"

   https://godbolt.org/z/0shU0S

So, if I didn't get something wrong... perhaps the original version is
even better, isn't it?

Test code:

---8<---
#define UCLAMP_BUCKET_DELTA 52

static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
{
    return clamp_value / UCLAMP_BUCKET_DELTA;
}

static inline unsigned int uclamp_bucket_value1(unsigned int clamp_value)
{
    return UCLAMP_BUCKET_DELTA * uclamp_bucket_id(clamp_value);
}

static inline unsigned int uclamp_bucket_value2(unsigned int clamp_value)
{
    return clamp_value - (clamp_value % UCLAMP_BUCKET_DELTA);
}

int test1(int argc, char *argv[])
{
    return uclamp_bucket_value1(argc);
}

int test2(int argc, char *argv[])
{
    return uclamp_bucket_value2(argc);
}

int test3(int argc, char *argv[])
{
    return uclamp_bucket_value1(argc) - uclamp_bucket_value2(argc);
}
---8<---

which gives on arm64:

---8<---
test1:
        mov     w1, 60495
        movk    w1, 0x4ec4, lsl 16
        umull   x0, w0, w1
        lsr     x0, x0, 36
        mov     w1, 52
        mul     w0, w0, w1
        ret
test2:
        mov     w1, 60495
        movk    w1, 0x4ec4, lsl 16
        umull   x1, w0, w1
        lsr     x1, x1, 36
        mov     w2, 52
        msub    w1, w1, w2, w0
        sub     w0, w0, w1
        ret
test3:
        mov     w0, 0
        ret
---8<---

-- 
#include <best/regards.h>

Patrick Bellasi
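
For anyone reading the arm64 listing: neither variant contains an actual
divide. The mov/movk pair builds 0x4ec4ec4f, which is 2^36 / 52 rounded up,
and umull + lsr #36 then yields the quotient. Below is a minimal user-space
sketch of that, assuming the same UCLAMP_BUCKET_DELTA of 52 as the test code.
The helper names (bucket_value_mul, bucket_value_mod, div52_reciprocal,
count_mismatches, selfcheck) are made up for illustration and are not part
of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as in the test code above. */
#define UCLAMP_BUCKET_DELTA 52u

/* Original form from the patch: one division, one multiplication. */
static inline uint32_t bucket_value_mul(uint32_t v)
{
	return UCLAMP_BUCKET_DELTA * (v / UCLAMP_BUCKET_DELTA);
}

/* Suggested form: subtract the remainder. */
static inline uint32_t bucket_value_mod(uint32_t v)
{
	return v - (v % UCLAMP_BUCKET_DELTA);
}

/*
 * Division by 52 the way the arm64 listing does it:
 * 60495 + (0x4ec4 << 16) == 0x4ec4ec4f == ceil(2^36 / 52), so the
 * 64-bit product shifted right by 36 is the quotient (umull + lsr).
 */
static inline uint32_t div52_reciprocal(uint32_t v)
{
	return (uint32_t)(((uint64_t)v * 0x4ec4ec4fULL) >> 36);
}

/* Count the inputs in [0, limit] where the computations disagree. */
static inline uint32_t count_mismatches(uint32_t limit)
{
	uint64_t v;	/* 64-bit counter so limit == UINT32_MAX terminates */
	uint32_t bad = 0;

	for (v = 0; v <= limit; v++) {
		uint32_t x = (uint32_t)v;

		if (bucket_value_mul(x) != bucket_value_mod(x) ||
		    div52_reciprocal(x) != x / UCLAMP_BUCKET_DELTA)
			bad++;
	}
	return bad;
}

/* Spot check on the clamp range used by the scheduler (0..1024). */
static inline void selfcheck(void)
{
	assert(count_mismatches(1024) == 0);
}
```

For unsigned operands C guarantees (v / d) * d + v % d == v, so the two
bucket_value forms always agree. That is why test3 folds to "mov w0, 0",
and the remaining difference is only in how the compiler schedules the
multiply.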