Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's... | linux-pm

[PATCH v4 00/16] Add utilization clamping support · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping · Juri Lelli <juri.lelli@redhat.com> · 2018-09-05
[PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Juri Lelli <juri.lelli@redhat.com> · 2018-09-05
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-06
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Juri Lelli <juri.lelli@redhat.com> · 2018-09-06
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Juri Lelli <juri.lelli@redhat.com> · 2018-09-06
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-06
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Suren Baghdasaryan <surenb@google.com> · 2018-09-08
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Peter Zijlstra <peterz@infradead.org> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Peter Zijlstra <peterz@infradead.org> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Peter Zijlstra <peterz@infradead.org> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Peter Zijlstra <peterz@infradead.org> · 2018-09-13
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-14
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Peter Zijlstra <peterz@infradead.org> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Peter Zijlstra <peterz@infradead.org> · 2018-09-13
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-14
[PATCH v4 15/16] sched/core: uclamp: add clamp group discretization support · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 13/16] sched/core: uclamp: use percentage clamp values · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 07/16] sched/core: uclamp: extend cpu's cgroup controller · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 07/16] sched/core: uclamp: extend cpu's cgroup controller · Randy Dunlap <hidden> · 2018-08-28
Re: [PATCH v4 07/16] sched/core: uclamp: extend cpu's cgroup controller · Patrick Bellasi <hidden> · 2018-08-29
[PATCH v4 16/16] sched/cpufreq: uclamp: add utilization clamping for RT tasks · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 12/16] sched/core: uclamp: update CPU's refcount on TG's clamp changes · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Juri Lelli <juri.lelli@redhat.com> · 2018-09-04
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-06
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Juri Lelli <juri.lelli@redhat.com> · 2018-09-06
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-06
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Peter Zijlstra <peterz@infradead.org> · 2018-09-14
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-14
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Peter Zijlstra <peterz@infradead.org> · 2018-09-14
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-17
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Peter Zijlstra <peterz@infradead.org> · 2018-09-21
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-24
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Peter Zijlstra <peterz@infradead.org> · 2018-09-24
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-24
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Peter Zijlstra <peterz@infradead.org> · 2018-09-24
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-24
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Peter Zijlstra <peterz@infradead.org> · 2018-09-25
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-26
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Quentin Perret <hidden> · 2018-09-27
Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default · Patrick Bellasi <hidden> · 2018-09-26
[PATCH v4 10/16] sched/core: uclamp: use TG's clamps to restrict Task's clamps · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 05/16] sched/core: uclamp: enforce last task UCLAMP_MAX · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Suren Baghdasaryan <surenb@google.com> · 2018-09-09
Re: [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Suren Baghdasaryan <surenb@google.com> · 2018-09-12
Re: [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Tejun Heo <tj@kernel.org> · 2018-09-11
Re: [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Patrick Bellasi <hidden> · 2018-09-11
Re: [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps · Tejun Heo <tj@kernel.org> · 2018-09-11
[PATCH v4 11/16] sched/core: uclamp: add system default clamps · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 11/16] sched/core: uclamp: add system default clamps · Suren Baghdasaryan <surenb@google.com> · 2018-09-10
Re: [PATCH v4 11/16] sched/core: uclamp: add system default clamps · Patrick Bellasi <hidden> · 2018-09-11
Re: [PATCH v4 11/16] sched/core: uclamp: add system default clamps · Suren Baghdasaryan <surenb@google.com> · 2018-09-11
[PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks · Peter Zijlstra <peterz@infradead.org> · 2018-09-14
Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks · Patrick Bellasi <hidden> · 2018-09-14
Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks · Peter Zijlstra <peterz@infradead.org> · 2018-09-14
Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks · Patrick Bellasi <hidden> · 2018-09-14
Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks · Quentin Perret <hidden> · 2018-09-27
[PATCH v4 09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups · Suren Baghdasaryan <surenb@google.com> · 2018-09-09
Re: [PATCH v4 09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups · Suren Baghdasaryan <surenb@google.com> · 2018-09-12
[PATCH v4 04/16] sched/core: uclamp: update CPU's refcount on clamp changes · Patrick Bellasi <hidden> · 2018-08-28
[PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Patrick Bellasi <hidden> · 2018-08-28
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Peter Zijlstra <peterz@infradead.org> · 2018-09-12
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Patrick Bellasi <hidden> · 2018-09-12
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Peter Zijlstra <peterz@infradead.org> · 2018-09-13
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Patrick Bellasi <hidden> · 2018-09-14
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Peter Zijlstra <peterz@infradead.org> · 2018-09-14
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting · Patrick Bellasi <hidden> · 2018-09-14

Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups

From: Peter Zijlstra <peterz@infradead.org>
Date: 2018-09-12 16:12:34
Also in: lkml

On Wed, Sep 12, 2018 at 04:56:19PM +0100, Patrick Bellasi wrote:

On 12-Sep 15:49, Peter Zijlstra wrote:

quoted

On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:

quoted

+/**
+ * uclamp_map: reference counts a utilization "clamp value"
+ * @value:    the utilization "clamp value" required
+ * @se_count: the number of scheduling entities requiring the "clamp value"
+ * @se_lock:  serialize reference count updates by protecting se_count

Why do you have a spinlock to serialize a single value? Don't we have
atomics for that?

There are some code paths where it's used to protect clamp groups
mapping and initialization, e.g.

      uclamp_group_get()
          spin_lock()
              // initialize clamp group (if required) and then...
              se_count += 1
          spin_unlock()

Almost all these paths are triggered from user-space and protected
by a global uclamp_mutex, but fork/exit paths.

To serialize these paths I'm using the spinlock above, does it make
sense ? Can we use the global uclamp_mutex on forks/exit too ?

OK, then your comment is misleading; it serializes both fields.

One additional observations is that, if in the future we want to add a
kernel space API, (e.g. driver asking for a new clamp value), maybe we
will need to have a serialized non-sleeping uclamp_group_get() API ?

No idea; but if you want to go all fancy you can replace he whole
uclamp_map thing with something like:

struct uclamp_map {
	union {
		struct {
			unsigned long v : 10;
			unsigned long c : BITS_PER_LONG - 10;
		};
		atomic_long_t s;
	};
};

And use uclamp_map::c == 0 as unused (as per normal refcount
semantics) and atomic_long_cmpxchg() the whole thing using
uclamp_map::s.

quoted

+ *                          uclamp_maps is a matrix of
+ *          +------- UCLAMP_CNT by CONFIG_UCLAMP_GROUPS_COUNT+1 entries
+ *          |                                |
+ *          |                /---------------+---------------\
+ *          |               +------------+       +------------+
+ *          |  / UCLAMP_MIN | value      |       | value      |
+ *          |  |            | se_count   |...... | se_count   |
+ *          |  |            +------------+       +------------+
+ *          +--+            +------------+       +------------+
+ *             |            | value      |       | value      |
+ *             \ UCLAMP_MAX | se_count   |...... | se_count   |
+ *                          +-----^------+       +----^-------+
+ *                                |                   |
+ *                      uc_map =  +                   |
+ *                     &uclamp_maps[clamp_id][0]      +
+ *                                                clamp_value =
+ *                                       uc_map[group_id].value
+ */
+static struct uclamp_map uclamp_maps[UCLAMP_CNT]
+				    [CONFIG_UCLAMP_GROUPS_COUNT + 1]
+				    ____cacheline_aligned_in_smp;
+

I'm still completely confused by all this.

sizeof(uclamp_map) = 12

that array is 2*6=12 of those, so the whole thing is 144 bytes. which is
more than 2 (64 byte) cachelines.

This data structure is *not* used in the hot-path, that's why I did not
care about fitting it exactly into few cache lines.

It's used to map a user-space "clamp value" into a kernel-space "clamp
group" when user-space:
 - changes a task specific clamp value
 - changes a cgroup clamp value
 - a task forks/exits

I assume we can consider all those as "slow" code paths, is that correct ?

Yep.

quoted

What's the purpose of that cacheline align statement?

In uclamp_maps, we still need to scan the array when a clamp value is
changed from user-space, i.e. the cases reported above. Thus, that
alignment is just to ensure that we minimize the number of cache lines
used. Does that make sense ?

Maybe that alignment implicitly generated by the compiler ?

It is not, but if it really is a slow path, we shouldn't care about
alignment.

quoted

Note that without that apparently superfluous lock, it would be 8*12 =
96 bytes, which is 1.5 lines and would indeed suggest you default to
GROUP_COUNT=7 by default to fill 2 lines.

Yes, will check better if we can count on just the uclamp_mutex

Well, if we don't care about performance (slow path) then keeping he
lock is fine, just the comment and alignment are misleading.

quoted

Why are the min and max things torn up like that?  I'm fairly sure I
asked some of that last time; but the above comments only try to explain
what, not why.

We use that organization to speedup scanning for clamp values of the
same clamp_id. That's more important in the hot-path than above, where
we need to scan struct uclamp_cpu when a new aggregated clamp value
has to be computed. This is done in:

   [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting
   https://lore.kernel.org/lkml/20180828135324.21976-4-patrick.bellasi@arm.com/ (local)

Specifically:

   dequeue_task()
     uclamp_cpu_put()
       uclamp_cpu_put_id(clamp_id)
         uclamp_cpu_update(clamp_id)
            // Here we have an array scan by clamp_id

With the given data layout I reported above, when we update the
min_clamp value (boost) we have all the data required in a single
cache line.

If that makes sense, I can certainly improve the comment above to
justify its layout.

OK, let me read on.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help