Thread: 25 messages, 2 authors, 2011-08-15

Re: [PATCH 3/6] perf: add reference time event

From: David Ahern <hidden>
Date: 2011-08-04 15:10:48
Also in: lkml

On 07/12/2011 08:30 AM, Frederic Weisbecker wrote:
On Sun, Jul 10, 2011 at 10:20:29PM -0600, David Ahern wrote:
On 06/17/2011 08:17 AM, Frederic Weisbecker wrote:
On Fri, Jun 17, 2011 at 08:04:59AM -0600, David Ahern wrote:

On 06/17/2011 07:32 AM, Frederic Weisbecker wrote:
On Tue, Jun 07, 2011 at 05:55:46PM -0600, David Ahern wrote:
For initial perf_clock to time-of-day correlation.

Signed-off-by: David Ahern <redacted>
---
 tools/perf/util/event.c   |    1 +
 tools/perf/util/event.h   |    8 ++++++++
 tools/perf/util/session.c |    4 ++++
 tools/perf/util/session.h |    3 ++-
 4 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 3c1b8a6..1a89a04 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -24,6 +24,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
 	[PERF_RECORD_HEADER_BUILD_ID]		= "BUILD_ID",
 	[PERF_RECORD_FINISHED_ROUND]		= "FINISHED_ROUND",
+	[PERF_RECORD_REFTIME]			= "REF_TIME",
 };
 
 const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 1d7f664..f481f90 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -98,6 +98,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_HEADER_TRACING_DATA		= 66,
 	PERF_RECORD_HEADER_BUILD_ID		= 67,
 	PERF_RECORD_FINISHED_ROUND		= 68,
+	PERF_RECORD_REFTIME			= 69,
We would like to avoid adding more custom events like these. They were very convenient
but they steal the kernel event type space. They are slated for removal in the long term.

Another idea to achieve what you want would be to create a new perf event header feature,
like HEADER_TRACE_INFO or HEADER_BUILD_ID are. Then use that to create a space in the perf
file to save that couple of clocks initial values.
you mean like this:
https://lkml.org/lkml/2010/12/7/813

David
Exactly, why did you change?
Finally getting back to this.

The answer to the 'why' is that putting a reference timestamp in the
header field does not work for file appends across reboots, i.e., the case:
perf record --tod ...
reboot
perf record -A --tod ...
Damn append mode. I doubt that thing is really used. And it just complicates
everything. It might be wise to get rid of it?

Ingo, Peter, Arnaldo?
 
perf_clock timestamps change across reboots so the reference time
created by the first invocation is not valid for the append case. The
discussion then drifted towards having a kernel side event which per
past patch sets has its own issues.

So to summarize the options proposed to date and issues with the proposals:
1. reference timestamp in header
   - does not work for appends across reboots

2. synthesized events
   - preference against them

3. kernel side event
   - cannot generate an initial sample (with counter value and
     perf_clock timestamp) on demand, e.g., at start of session; a
     proposal to use an ioctl to add one to the event stream was shot down

At this point the only idea that comes to mind is to use a combination
of 2 and 3: add the kernel side clock event
(https://lkml.org/lkml/2011/2/18/11), read the realtime clock counter,
read the monotonic clock timestamp (i.e., perf_clock value), and
synthesize a perf sample that is written to the file. The append case
(with mismatch in --tod options between record invocations) would be
handled by having the kernel side clock event in the event list
(perf_evlist__equal would fail if --tod was not used for all invocations).
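
The userspace half of that idea, reading the two clocks back to back, could look roughly like the sketch below. It uses CLOCK_MONOTONIC as a stand-in for perf_clock, as the text above does; whether the two really match depends on the kernel. The struct and helper names are hypothetical, and writing the pair out as a synthesized sample is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical container for one monotonic/realtime reading pair. */
struct clock_pair {
	uint64_t mono_ns;	/* CLOCK_MONOTONIC, stand-in for perf_clock */
	uint64_t real_ns;	/* CLOCK_REALTIME, i.e. time of day */
};

static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* Read both clocks as close together as possible.
 * Returns 0 on success, -1 if either clock read fails. */
static int read_clock_pair(struct clock_pair *out)
{
	struct timespec mono, real;

	assert(out != NULL);
	if (clock_gettime(CLOCK_MONOTONIC, &mono) != 0 ||
	    clock_gettime(CLOCK_REALTIME, &real) != 0)
		return -1;

	out->mono_ns = ts_to_ns(&mono);
	out->real_ns = ts_to_ns(&real);
	return 0;
}
```

The two reads are not atomic, so the pair carries a small skew; a caller that cares could read the monotonic clock again afterwards and bound the error by the difference.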
Actually you first have to face a deeper problem. Events are not stored
in order in the stream; they are sorted in perf_session__process_events().

The bunch of sorted events is flushed periodically and sent to the consumer.

See flush_sample_queue().

And this sorting is made on top of the sample->time timestamps. So events
are first sorted on sample->time and only afterward you have access to your
gtod tracepoint samples. But if that gtod sample has been taken after a reboot
then its sample->time is not consistent with the rest. It is not sorted
correctly and thus the reftime won't be updated at the right moment.

So the problem is that the reftime update already depends on a consistent cpu
timestamp.
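
A small sketch of that ordering problem follows. It is not the perf session code: it simply sorts a flat array with qsort(), and the event struct, its boot field and the function name are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum ev_kind { EV_SAMPLE, EV_REFTIME };

/* Hypothetical, simplified event record. */
struct ev {
	uint64_t time;		/* sample->time: per-boot clock, ns */
	enum ev_kind kind;
	int boot;		/* which record invocation (boot) produced it */
};

static int cmp_time(const void *a, const void *b)
{
	const struct ev *x = a, *y = b;

	return (x->time > y->time) - (x->time < y->time);
}

/* Sort on time alone, then walk the events the way a consumer would:
 * each REFTIME event replaces the current reference. Count the samples
 * that arrive while the active reference belongs to a different boot. */
static int count_misattributed(struct ev *evs, size_t n)
{
	int cur_boot = -1, bad = 0;

	assert(evs != NULL);
	qsort(evs, n, sizeof(*evs), cmp_time);

	for (size_t i = 0; i < n; i++) {
		if (evs[i].kind == EV_REFTIME)
			cur_boot = evs[i].boot;
		else if (evs[i].boot != cur_boot)
			bad++;
	}
	return bad;
}
```

In file order every sample follows the reference event of its own boot. Once the events are sorted on time alone, the post-reboot reference (small timestamp) moves to the front and samples from the two boots interleave, so some samples get converted against the wrong reference.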

I can't think of a sane way to work around that. Sorting on gtod + cpu timestamp
is not a solution because gtod can change.

I'd rather propose to refuse append mode as long as we have any timestamp. That includes
gtod but also sample timestamps. They are buggy if we reboot.
Arnaldo's sending patches, so I take it he's dug out from backlog. ;-)

Any objections to not allowing append mode for perf-record if samples
contain timestamps?

David