Thread (18 messages) 18 messages, 5 authors, 2018-05-30

[RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets

From: Mike Leach <hidden>
Date: 2018-05-30 15:04:38
Also in: linux-doc, lkml

Hi Leo,



On 30 May 2018 at 10:45, Robert Walker [off-list ref] wrote:

On 28/05/18 23:13, Mathieu Poirier wrote:
quoted
Leo and/or Robert,

On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
quoted
Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces") reworks the samples generation flow from CoreSight trace to
match the correct format so Perf report tool can display the samples
properly.

But the change has side effect for branch packet handling, it only
generate branch samples by checking previous packet flag
'last_instr_taken_branch' is true, this results in below three kinds
packets are missed to generate branch samples:

- The start tracing packet at the beginning of tracing data;
- The exception handling packet;
- If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
   for branch samples.  CS_ETM_TRACE_ON packet itself can give the info
   that there have a discontinuity in the trace, on the other hand we
   also miss to generate proper branch sample for packets before and
   after CS_ETM_TRACE_ON packet.

This patch is to add branch sample handling for up three kinds packets:

- In function cs_etm__sample(), check if 'prev_packet->sample_type' is
   zero and in this case it generates branch sample for the start tracing
   packet; furthermore, we also need to handle the condition for
   prev_packet::end_addr is zero in the cs_etm__last_executed_instr();

- In function cs_etm__sample(), check if 'prev_packet->exc' is true and
   generate branch sample for exception handling packet;

- If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
   branch sample in the function cs_etm__flush(), this can save complete
   info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
   packet.  We also generate branch sample for the new CS_ETM_RANGE
   packet after CS_ETM_TRACE_ON packet, this have two purposes, the
   first one purpose is to save the info for the new CS_ETM_RANGE packet,
   the second purpose is to save CS_ETM_TRACE_ON packet info so we can
   have hint for a discontinuity in the trace.

   For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
   'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
   the decoder layer as dummy value.  This patch is to convert these
   values to zeros for more readable; this is accomplished by functions
   cs_etm__last_executed_instr() and cs_etm__first_executed_instr().  The
   later one is a new function introduced by this patch.

Reviewed-by: Robert Walker <redacted>
Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces")
Signed-off-by: Leo Yan <redacted>
---
  tools/perf/util/cs-etm.c | 93
+++++++++++++++++++++++++++++++++++++-----------
  1 file changed, 73 insertions(+), 20 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..8418173 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,20 @@ static inline void
cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
  static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet
*packet)
  {
        /*
+        * The packet is the start tracing packet if the end_addr is
zero,
+        * returns 0 for this case.
+        */
+       if (!packet->end_addr)
+               return 0;

What is considered to be the "start tracing packet"?  Right now the only
two
kind of packets inserted in the decoder packet buffer queue are INST_RANGE
and
TRACE_ON.  How can we hit a condition where packet->end-addr == 0?

quoted
+
+       /*
+        * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
+        * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
+        */
+       if (packet->end_addr == 0xdeadbeefdeadbeefUL)
+               return 0;

As it is with the above, I find triggering on addresses to be brittle and
hard
to maintain on the long run.  Packets all have a sample_type field that
should
be used in cases like this one.  That way we know exactly the condition
that is
targeted.

While working on this set, please spin-off another patch that defines
CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
numeral is used.  That way we stop using the hard coded value.
quoted
+
+       /*
         * The packet records the execution range with an exclusive end
address
         *
         * A64 instructions are constant size, so the last executed
@@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct
cs_etm_packet *packet)
        return packet->end_addr - A64_INSTR_SIZE;
  }
  +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet
*packet)
+{
+       /*
+        * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
+        * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
+        */
+       if (packet->start_addr == 0xdeadbeefdeadbeefUL)
+               return 0;

Same comment as above.
quoted
+
+       return packet->start_addr;
+}
+
  static inline u64 cs_etm__instr_count(const struct cs_etm_packet
*packet)
  {
        /*
@@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct
cs_etm_queue *etmq)
        be       = &bs->entries[etmq->last_branch_pos];
        be->from = cs_etm__last_executed_instr(etmq->prev_packet);
-       be->to   = etmq->packet->start_addr;
+       be->to   = cs_etm__first_executed_instr(etmq->packet);
        /* No support for mispredict */
        be->flags.mispred = 0;
        be->flags.predicted = 1;
@@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct
cs_etm_queue *etmq)
        sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
        sample.pid = etmq->pid;
        sample.tid = etmq->tid;
-       sample.addr = etmq->packet->start_addr;
+       sample.addr = cs_etm__first_executed_instr(etmq->packet);
        sample.id = etmq->etm->branches_id;
        sample.stream_id = etmq->etm->branches_id;
        sample.period = 1;
@@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue
*etmq)
                etmq->period_instructions = instrs_over;
        }
  -     if (etm->sample_branches &&
-           etmq->prev_packet &&
-           etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-           etmq->prev_packet->last_instr_taken_branch) {
-               ret = cs_etm__synth_branch_sample(etmq);
-               if (ret)
-                       return ret;
+       if (etm->sample_branches && etmq->prev_packet) {
+               bool generate_sample = false;
+
+               /* Generate sample for start tracing packet */
+               if (etmq->prev_packet->sample_type == 0 ||

What kind of packet is sample_type == 0 ?
quoted
+                   etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+                       generate_sample = true;
+
+               /* Generate sample for exception packet */
+               if (etmq->prev_packet->exc == true)
+                       generate_sample = true;

Please don't do that.  Exception packets have a type of their own and can
be
added to the decoder packet queue the same way INST_RANGE and TRACE_ON
packets
are.  Moreover exception packet containt an address that, if I'm reading
the
documenation properly, can be used to keep track of instructions that were
executed between the last address of the previous range packet and the
address
executed just before the exception occurred.  Mike and Rob will have to
confirm
this as the decoder may be doing all that hard work for us.
clarification on the exception packets....

The Opencsd output exception packet gives you the exception number,
and optionally the preferred return address. If this address is
present does depend a lot on the underlying protocol - will normally
be there with ETMv4.
Exceptions are marked differently in the underlying protocol - the
OCSD packets abstract away these differences.

consider the code:

0x1000: <some instructions>
0x1100: BR 0x2000
....
0x2000: <some instructions>
0x2020  BZ r4

Without an exception this would result in the packets

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider an exception occurring before the BR 0x2000

this will result in:-
OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
OCSD_EXECEPTION(IRQ, ret-addr 0x1100)
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_EXCEPTION_RETURN()  // present if exception returns are
explicitly marked in underlying trace - may not always be depending on
circumstances.
OCSD_RANGE(0x1100,0x1104, Last=BR, taken) // continue on with short
range - just the branch
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider the exception occurring after the BR, but before any
other instructions are executed.

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_EXECEPTION(IRQ, ret-addr 0x2000)     // here the preferred return
address is actually the target of the branch.
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

So in general it is possible to arrive in the IRQ_START range with the
previous packet having been either a taken branch, a not taken branch,
or not a branch.
Care must be taken - whether AutoFDO or normal trace disassembly not
to assume that having the last range packet as a taken branch means
that the next range packet is the target, if there is an intervening
exception packet.

Regards

Mike
quoted
quoted
+
+               /* Generate sample for normal branch packet */
+               if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+                   etmq->prev_packet->last_instr_taken_branch)
+                       generate_sample = true;
+
+               if (generate_sample) {
+                       ret = cs_etm__synth_branch_sample(etmq);
+                       if (ret)
+                               return ret;
+               }
        }
        if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue
*etmq)
  static int cs_etm__flush(struct cs_etm_queue *etmq)
  {
        int err = 0;
+       struct cs_etm_auxtrace *etm = etmq->etm;
        struct cs_etm_packet *tmp;
  -     if (etmq->etm->synth_opts.last_branch &&
-           etmq->prev_packet &&
-           etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+       if (!etmq->prev_packet)
+               return 0;
+
+       if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
+               return 0;
+
+       if (etmq->etm->synth_opts.last_branch) {

If you add:

         if (!etmq->etm->synth_opts.last_branch)
                 return 0;

You can avoid indenting the whole block.
quoted
                /*
                 * Generate a last branch event for the branches left in
the
                 * circular buffer at the end of the trace.
@@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
                err = cs_etm__synth_instruction_sample(
                        etmq, addr,
                        etmq->period_instructions);
+               if (err)
+                       return err;
                etmq->period_instructions = 0;
+       }
  -             /*
-                * Swap PACKET with PREV_PACKET: PACKET becomes
PREV_PACKET for
-                * the next incoming packet.
-                */
-               tmp = etmq->packet;
-               etmq->packet = etmq->prev_packet;
-               etmq->prev_packet = tmp;
+       if (etm->sample_branches) {
+               err = cs_etm__synth_branch_sample(etmq);
+               if (err)
+                       return err;
        }
  -     return err;
+       /*
+        * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+        * the next incoming packet.
+        */
+       tmp = etmq->packet;
+       etmq->packet = etmq->prev_packet;
+       etmq->prev_packet = tmp;

Robert, I remember noticing that when you first submitted the code but
forgot to
go back to it.  What is the point of swapping the packets?  I understand

etmq->prev_packet = etmq->packet;

But not

etmq->packet = tmp;

After all etmq->packet will be clobbered as soon as
cs_etm_decoder__get_packet()
is called, which is alwasy right after either cs_etm__sample() or
cs_etm__flush().
This is code I inherited from the original versions of these patches, but it
works because:
- etmq->packet and etmq->prev_packet are pointers to struct cs_etm_packet
allocated by zalloc() in cs_etm__alloc_queue()
- cs_etm_decoder__get_packet() takes a pointer to struct cs_etm_packet and
copies the contents of the first packet from the queue into the passed
location with:
   *packet = decoder->packet_buffer[decoder->head]

So the swap code is only swapping the pointers over, not the contents of the
packets.

Regards

Rob


quoted
Thanks,
Mathieu


quoted
+       return 0;
  }
    static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
--
2.7.4


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help