Thread: 50 messages, 4 authors, 2021-06-15

Re: [PATCH V3 9/9] tracing: Add timerlat tracer

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2021-06-11 20:11:35
Also in: lkml

On Fri, 11 Jun 2021 14:59:13 +0200
Daniel Bristot de Oliveira [off-list ref] wrote:
------------------ %< -----------------------------
It is worth mentioning that the *duration* values reported
by the osnoise: events are *net* values. For example, the
thread_noise does not include the duration of the overhead caused
by the IRQ execution (which indeed accounted for 12736 ns). But
the values reported by the timerlat tracer (timerlat_latency)
are *gross* values.
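
To make the net/gross distinction concrete, here is a minimal sketch of the
arithmetic, using the 9 us dev irq and the 10 us thread_noise from the timeline
in this message. The 19 us gross window is an assumption (net thread time plus
the dev irq that preempted it), and net_duration() is a hypothetical helper,
not anything from the tracer itself:

```python
# Illustrative only: durations in microseconds, taken from the timeline art.
def net_duration(gross_us, nested_irqs_us):
    """Return a gross window minus the time spent in IRQs that preempted it."""
    return gross_us - sum(nested_irqs_us)

dev_irq_us = 9               # irq_noise reported for the dev irq
thread_window_gross_us = 19  # assumption: 10 us of thread time + 9 us dev irq

# osnoise-style (net) value: the preempting IRQ is reported separately
thread_noise_us = net_duration(thread_window_gross_us, [dev_irq_us])
print(thread_noise_us)
```

A timerlat-style (gross) value would instead report the whole window,
including whatever interrupted it.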

The art below illustrates a CPU timeline and how the timerlat tracer
observes it at the top and the osnoise: events at the bottom. Each "-"
in the timelines means 1 us, and the time moves ==>:

     External          context irq                  context thread
      clock           timer_latency                 timer_latency
      event              18 us                          48 us 
        |                  ^                             ^
        v                  |                             |
        |------------------|                             |       <-- timerlat irq timeline
        |------------------+-----------------------------|       <-- timerlat thread timeline
                           ^                             ^
 ===================== CPU timeline ======================================
                   [timerlat/ irq]  [ dev irq ]                          
 [another thread...^             v..^         v........][timerlat/ thread]  
 ===================== CPU timeline ======================================
                   |-------------|  |---------|                  <-- irq_noise timeline
                                 |--^         v--------|         <-- thread_noise timeline
                                 |            |        |
                                 |            |        + thread_noise: 10 us
                                 |            +-> irq_noise: 9 us
                                 +-> irq_noise: 13 us

 --------------- >% --------------------------------  
That's really busy, and honestly, I can't tell what is what.

The "context irq timer_latency" is a confusing name. Could we just have
that be "timer irq latency"? And "context thread timer_latency" just be
"thread latency". Adding too much text to the name actually makes it harder
to understand. We want to simplify it, not make people have to think harder
to see it.

I think we can get rid of the "<-- .* timeline" to the right.  I don't
think they are necessary. Again, the more you add to the diagram, the
busier it looks, and the harder it is to read.

Could we switch "[timerlat/ irq]" to just "[timer irq]" and explain how
that "context irq timer_latency"/"timer irq latency" is related?

Should probably state that the "dev irq" is an unrelated device interrupt
that happened.

What's with the two CPU timeline lines? Now there I think it would be
better to have the arrow text by itself.

And finally, not sure if you plan on doing this, but please include an output
of the trace that would show the above.

Thus, here's what I would expect to see:

      External         
       clock         timer irq latency                 thread latency
       event              18 us                          48 us 
         |                  ^                             ^
         v                  |                             |
         |------------------|                             |
         |------------------+-----------------------------|       
                            ^                             ^
  =========================================================================
                    [timerlat/ irq]  [ dev irq ]                             
  [another thread...^             v..^         v........][timerlat/ thread]  <-- CPU task timeline
  =========================================================================
                    |-------------|  |---------|
                                  |--^         v--------|
                                  |            |        |
                                  |            |        + thread_noise: 10 us
                                  |            +-> irq_noise: 9 us
                                  +-> irq_noise: 13 us
 
 The "[ dev irq ]" above is an interrupt from some device on the system that
 causes extra noise to the timerlat task.

I think the above may be easier to understand, especially if the trace
output that represents it is below.

Also, I have to ask, shouldn't the "thread noise" really start at the
"External clock event"?

-- Steve