Thread (34 messages) 34 messages, 10 authors, 2016-10-18

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

From: Koehrer Mathias (ETAS/ESW5) <hidden>
Date: 2016-10-05 07:02:25

Hi Julia,
quoted
In the meanwhile I have detected another finding which might be relevant:

With the 3.18 kernel the igb driver comes with two interrupts per NIC
(e.g. eth2 and eth2-TxRx0) with the 4.6. kernel the igb driver comes with 9 (!)
interrupts per NIC:
quoted
eth2, and eth2-TxRx-0, eth2-TxRx-1, ... , eth2-TxRx-7.

As I have used initially the same kernel configuration from 3.18 also
for the 4.6. kernel I wonder where this comes from and if there is any
kernel option I may use to disable these many interrupts and to reduce
it to 2 again.
If it's all of these interrupts that are firing and being handled at the same time, that
can account for the latencies you were seeing.  As I suggested before, having a
trace with the sched_wakeup event enabled can help confirm that it's these
interrupts causing problems.

If it is true, then the question is: why is the device triggering all of these interrupts all
at once?  Is that expected?  These are questions for netdev folks, I think.

   Julia
OK - I ran again the cyclictest. This time I used the -C option:
# cyclictest -a -i 100 -m -n -p 80 -t 1 -b 21 -C

And the last output lines of the trace are:
cyclicte-5887    0d...2.. 1504647266us!: sched_switch: prev_comm=cyclictest prev_pid=5887 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1504647372us : sched_wakeup: comm=cyclictest pid=5887 prio=19 target_cpu=000
kworker/-5       0dN.h3.. 1504647374us : sched_wakeup: comm=irq/54-eth2-TxR pid=5883 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647375us : sched_wakeup: comm=irq/53-eth2-TxR pid=5882 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647377us : sched_wakeup: comm=irq/52-eth2-TxR pid=5881 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647378us : sched_wakeup: comm=irq/51-eth2-TxR pid=5880 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647380us : sched_wakeup: comm=irq/50-eth2-TxR pid=5879 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647381us : sched_wakeup: comm=irq/49-eth2-TxR pid=5878 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647382us : sched_wakeup: comm=irq/48-eth2-TxR pid=5877 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647383us : sched_wakeup: comm=irq/47-eth2-TxR pid=5876 prio=49 target_cpu=000
kworker/-5       0d...2.. 1504647384us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=5887 next_prio=19
cyclicte-5887    0.....11 1504647389us : tracing_mark_write: hit latency threshold (28 > 21)

The attached trace-extract.gz shows some more lines.
It actually looks to me as if the the many irq threads from igb are causing the issue.

Regards

Mathias

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help