Thread (5 messages) 5 messages, 3 authors, 2021-11-30

Re: [PATCH v2] perf metric: Reduce multiplexing with duration_time

From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: 2021-11-30 19:24:40
Also in: lkml

Em Tue, Nov 30, 2021 at 07:11:36PM +0100, Jiri Olsa escreveu:
On Mon, Nov 29, 2021 at 09:46:31AM -0800, Ian Rogers wrote:
quoted
On Sun, Nov 28, 2021 at 8:23 AM Jiri Olsa [off-list ref] wrote:
quoted
On Tue, Nov 23, 2021 at 05:52:26PM -0800, Ian Rogers wrote:
quoted
It is common to use the same counters with and without
duration_time. The ID sharing code treats duration_time as if it
were a hardware event placed in the same group. This causes
unnecessary multiplexing such as in the following example where
l3_cache_access isn't shared:

$ perf stat -M l3 -a sleep 1

 Performance counter stats for 'system wide':

         3,117,007      l3_cache_miss             #    199.5 MB/s  l3_rd_bw
                                                  #     43.6 %  l3_hits
                                                  #     56.4 %  l3_miss               (50.00%)
         5,526,447      l3_cache_access                                               (50.00%)
         5,392,435      l3_cache_access           # 5389191.2 access/s  l3_access_rate  (50.00%)
     1,000,601,901 ns   duration_time

       1.000601901 seconds time elapsed

Fix this by placing duration_time in all groups unless metric
sharing has been disabled on the command line:

$ perf stat -M l3 -a sleep 1

 Performance counter stats for 'system wide':

         3,597,972      l3_cache_miss             #    230.3 MB/s  l3_rd_bw
                                                  #     48.0 %  l3_hits
                                                  #     52.0 %  l3_miss
         6,914,459      l3_cache_access           # 6909935.9 access/s  l3_access_rate
     1,000,654,579 ns   duration_time

       1.000654579 seconds time elapsed

$ perf stat --metric-no-merge -M l3 -a sleep 1

 Performance counter stats for 'system wide':

         3,501,834      l3_cache_miss             #     53.5 %  l3_miss               (24.99%)
         6,548,173      l3_cache_access                                               (24.99%)
         3,417,622      l3_cache_miss             #     45.7 %  l3_hits               (25.04%)
         6,294,062      l3_cache_access                                               (25.04%)
         5,923,238      l3_cache_access           # 5919688.1 access/s  l3_access_rate  (24.99%)
     1,000,599,683 ns   duration_time
         3,607,486      l3_cache_miss             #    230.9 MB/s  l3_rd_bw           (49.97%)

       1.000599683 seconds time elapsed

v2. Doesn't count duration_time in the metric_list_cmp function that
    sorts larger metrics first. Without this a metric with duration_time
    and an event is sorted the same as a metric with two events,
    possibly not allowing the first metric to share with the second.
hum, isn't the change about adding duration_time in every metric?
or you could still end up with metric without duration_time
It is about adding duration_time to all metrics. Sorting of the
metrics by number of IDs happens before we insert duration_time which
happens just prior to parsing. duration_time needn't be inserted if
--metric-no-merge is passed.
I see, so that sorting takes place before it's added, makes sense then

Acked-by: Jiri Olsa <redacted>
Thanks, applied.

- Arnaldo
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help