Re: [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples
From: Mike Leach <hidden>
Date: 2020-02-05 16:09:16
Also in:
lkml
Hi Leo, There are a couple of typos in the comments below, but I also believe that the sample loop could be considerably simplified On Mon, 3 Feb 2020 at 01:52, Leo Yan [off-list ref] wrote:
quoted hunk ↗ jump to hunk
When 'etm->instructions_sample_period' is less than 'tidq->period_instructions', the function cs_etm__sample() cannot handle this case properly with its logic. Let's see below flow as an example: - If we set itrace option '--itrace=i4', then function cs_etm__sample() has variables with initialized values: tidq->period_instructions = 0 etm->instructions_sample_period = 4 - When the first packet is coming: packet->instr_count = 10; the number of instructions executed in this packet is 10, thus update period_instructions as below: tidq->period_instructions = 0 + 10 = 10 instrs_over = 10 - 4 = 6 offset = 10 - 6 - 1 = 3 tidq->period_instructions = instrs_over = 6 - When the second packet is coming: packet->instr_count = 10; in the second pass, assume 10 instructions in the trace sample again: tidq->period_instructions = 6 + 10 = 16 instrs_over = 16 - 4 = 12 offset = 10 - 12 - 1 = -3 -> the negative value tidq->period_instructions = instrs_over = 12 So after handle these two packets, there have below issues: The first issue is that cs_etm__instr_addr() returns the address within the current trace sample of the instruction related to offset, so the offset is supposed to be always unsigned value. But in fact, function cs_etm__sample() might calculate a negative offset value (in handling the second packet, the offset is -3) and pass to cs_etm__instr_addr() with u64 type with a big positive integer. The second issue is it only synthesizes 2 samples for sample period = 4. In theory, every packet has 10 instructions so the two packets have total 20 instructions, 20 instructions should generate 5 samples (4 x 5 = 20). This is because cs_etm__sample() only calls once cs_etm__synth_instruction_sample() to generate instruction sample per range packet. This patch fixes the logic in function cs_etm__sample(); the basic idea is to divide into three parts for handling coming packet: - The first part is for synthesizing the first instruction sample, it combines the instructions from the tail of previous packet and the instructions from the head of the new packet; - The second part is to simply generate samples with sample period aligned; - The third part is the tail of new packet, the rest instructions will be left for the sequential sample handling. Suggested-by: Mike Leach <redacted> Signed-off-by: Leo Yan <redacted> --- tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++----- 1 file changed, 92 insertions(+), 13 deletions(-)diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 3e28462609e7..c5a05f728eac 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c@@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq, * TODO: allow period to be defined in cycles and clock time */ - /* Get number of instructions executed after the sample point */ - u64 instrs_over = tidq->period_instructions - - etm->instructions_sample_period; + /* + * Below diagram demonstrates the instruction samples + * generation flows: + * + * Instrs Instrs Instrs Instrs + * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3) + * | | | | + * V V V V + * -------------------------------------------------- + * ^ ^ + * | | + * Period Period + * instructions(Pi) instructions(Pi') + * + * | | + * \---------------- -----------------/ + * V + * instrs_executed + * + * Period instructions (Pi) contains the the number of + * instructions executed after the sample point(n). When a new + * instruction packet is coming and generate for the next sample + * (n+1), it combines with two parts instructions, one is the + * tail of the old packet and another is the head of the new + * coming packet. So 'head' variable is used to cauclate the
typo : s/cauclate/calculate
+ * instruction numbers in the new packet for sample(n+1). + * + * Sample(n+2) and sample(n+3) consume the instructions with + * sample period, so directly generate samples based on the + * sampe period. + *
typo: s/sampe/sample
+ * After sample(n+3), the rest instructions will be used by
+ * later packet; so use 'instrs_over' to track the rest
+ * instruction number and it is assigned to
+ * 'tidq->period_instructions' for next round calculation.
+ */
+ u64 head, offset = 0;
+ u64 addr;
/*
- * Calculate the address of the sampled instruction (-1 as
- * sample is reported as though instruction has just been
- * executed, but PC has not advanced to next instruction)
+ * 'instrs_over' is the number of instructions executed after
+ * sample points, initialise it to 'instrs_executed' and will
+ * decrease it for consumed instructions in every synthesized
+ * instruction sample.
*/
- u64 offset = (instrs_executed - instrs_over - 1);
- u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
- tidq->packet, offset);
+ u64 instrs_over = instrs_executed;
- ret = cs_etm__synth_instruction_sample(
- etmq, tidq, addr, etm->instructions_sample_period);
- if (ret)
- return ret;
+ /*
+ * 'head' is the instructions number of the head in the new
+ * packet, it combines with the tail of previous packet to
+ * generate a sample. So 'head' uses the sample period to
+ * decrease the instruction number introduced by the previous
+ * packet.
+ */
+ head = etm->instructions_sample_period -
+ (tidq->period_instructions - instrs_executed);
+
+ if (head) {
+ offset = head;
+
+ /*
+ * Calculate the address of the sampled instruction (-1
+ * as sample is reported as though instruction has just
+ * been executed, but PC has not advanced to next
+ * instruction)
+ */
+ addr = cs_etm__instr_addr(etmq, trace_chan_id,
+ tidq->packet, offset - 1);
+ ret = cs_etm__synth_instruction_sample(
+ etmq, tidq, addr,
+ etm->instructions_sample_period);
+ if (ret)
+ return ret;
+
+ instrs_over -= head;
+ }
+
+ while (instrs_over >= etm->instructions_sample_period) {
+ offset += etm->instructions_sample_period;
+
+ /*
+ * Calculate the address of the sampled instruction (-1
+ * as sample is reported as though instruction has just
+ * been executed, but PC has not advanced to next
+ * instruction)
+ */
+ addr = cs_etm__instr_addr(etmq, trace_chan_id,
+ tidq->packet, offset - 1);
+ ret = cs_etm__synth_instruction_sample(
+ etmq, tidq, addr,
+ etm->instructions_sample_period);
+ if (ret)
+ return ret;
+
+ instrs_over -= etm->instructions_sample_period;
+ }
/* Carry remaining instructions into next sample period */
tidq->period_instructions = instrs_over;
--
2.17.1
I believe the following change would work and make for easier reading...
.... at the start of the function remove instrs_executed and replace ....
/* get instructions remainder from previous packet */
u64 instrs_prev = tidq->period_instructions;
/* set available instructions to previous packet remainder + the
current packet count */
tidq->period_instructions += tidq->packet->instr_count;
.... within the if(etm->sample_instructions && ...) statement I would
be more explicit what the elements of the diagram are ....
/*
* Below diagram demonstrates the instruction samples
* generation flows:
*
* Instrs Instrs Instrs Instrs
* Sample(n) Sample(n+1) Sample(n+2) Sample(n+3)
* | | | |
* V V V V
* --------------------------------------------------
* ^ ^
* | |
* Period Period
* instructions(Pi) instructions(Pi')
*
* | |
* \---------------- -----------------/
* V
* tidq->packet->instr_count;
*
* Instrs Sample(n...) are the synthesised samples occuring every
etm->instructions_sample_period
* instructions - as defined on the perf command line. Sample(n) being
the last sample before the
* current etm packet, n+1 to n+3 samples generated from the current etm packet.
*
* tidq->packet->instr_count represents the number of instructions in
the current etm packet.
*
* Period instructions (Pi) contains the the number of instructions
executed after the sample point(n)
* from the previous etm packet. This will always be less than
etm->instructions_sample_period.
*
.... continue with explanation here ....
.... then we can simplify the loop code removing some of the temporary
variables ....
/* get the initial offset into the current packet instructions
(entry conditions ensure that instrs_prev < etm->instructions_sample_period)
*/
u64 offset = etm->instructions_sample_period - instrs_prev;
u64 addr;
/* Prepare last branches for instruction sample */
if (etm->synth_opts.last_branch)
cs_etm__copy_last_branch_rb(etmq, tidq);
while (tidq->period_instructions >= etm->instructions_sample_period) {
/*
* Calculate the address of the sampled instruction (-1
* as sample is reported as though instruction has just
* been executed, but PC has not advanced to next
* instruction)
*/
addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet, offset - 1);
ret = cs_etm__synth_instruction_sample( etmq, tidq, addr,
etm->instructions_sample_period);
if (ret)
return ret;
offset += etm->instructions_sample_period;
tidq->period_instructions -= etm->instructions_sample_period;
}
.....
I believe the above should work, but cannot claim to have tried it
out. What do you think?
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel