Thread: 34 messages, 10 authors, 2016-10-18

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

From: Koehrer Mathias (ETAS/ESW5) <hidden>
Date: 2016-10-04 14:33:12

Hi Julia,
Which looks to me to be the normal "forced primary" interrupt handling path, which
simply wakes the created irqthread.

However, what isn't clear from the data is _which_ irqthread(s) is being woken up.
Presumably, due to the prior igb traces, it's one of the igb interrupts, but that would
be nice to confirm using the sched_wakeup event or other means.
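One way to check which irqthreads are being woken, sketched under assumptions: with the sched_wakeup event enabled in the trace, the wakeups can be counted per thread name. The line format in SAMPLE_TRACE is an assumption modelled on typical ftrace text output, and the thread names, pids and timestamps are hypothetical, not taken from the traces in this thread. The helper count_irq_thread_wakeups is an illustrative name only.

```python
import re
from collections import Counter

# Matches the sched_wakeup event payload. The exact field layout is an
# assumption; only "comm=" followed by "pid=" is relied upon here.
WAKEUP_RE = re.compile(r"sched_wakeup:\s+comm=(?P<comm>.+?)\s+pid=(?P<pid>\d+)")


def count_irq_thread_wakeups(trace_text):
    """Count sched_wakeup events whose target is an IRQ thread (comm starts with 'irq/')."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = WAKEUP_RE.search(line)
        if match and match.group("comm").startswith("irq/"):
            counts[match.group("comm")] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE_TRACE = """\
  <idle>-0     [001] d.h.  100.000010: sched_wakeup: comm=irq/47-eth2-TxR pid=812 prio=49 target_cpu=001
  <idle>-0     [001] d.h.  100.000012: sched_wakeup: comm=irq/46-eth2 pid=811 prio=49 target_cpu=001
  <idle>-0     [001] d.h.  100.000015: sched_wakeup: comm=cyclictest pid=2001 prio=0 target_cpu=001
  <idle>-0     [001] d.h.  100.000020: sched_wakeup: comm=irq/47-eth2-TxR pid=812 prio=49 target_cpu=001
"""

if __name__ == "__main__":
    for comm, n in count_irq_thread_wakeups(SAMPLE_TRACE).most_common():
        print(comm, n)
```

If the burst of wakeups is dominated by igb threads, that would support the assumption above; if other irqthreads show up, the picture is less clear.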

Similar to the PCI write-buffering cases, we've also observed that when the PCI
interconnect is bogged down with traffic from other masters, it's possible that a read
from the CPU can be stalled (in some cases, for quite a while, but it depends on the
PCI topology, switches used, their configurations, etc).

So, one plausible narrative here: it's conceivable that the
rd32(E1000_RXSTMPH) in igb_ptp_rx_hang() is "stuck" in the bus somewhere
presumably due to load from other masters (the trace seems to indicate it might be
as much as 20us), with CPU execution stalled awaiting its completion.  Meanwhile,
the CPU is encountering interrupts from other sources (local APIC, etc).  Once the
read "completes", the CPU is finally able to service all of the interrupts that have
piled up, which is why we see in the traces these 9 wakeups happening in a row.

The question is: how can we confirm/refute this, or are there other, more plausible
scenarios it's possible to run into?
Thanks for the proposal. Unfortunately I have no idea on this.

In the meantime I have noticed something else that might be relevant:
With the 3.18 kernel the igb driver registers two interrupts per NIC (e.g. eth2 and eth2-TxRx0).
With the 4.6 kernel the igb driver registers 9 (!) interrupts per NIC:
eth2, and eth2-TxRx-0, eth2-TxRx-1, ... , eth2-TxRx-7.
As I initially used the same kernel configuration from 3.18 for the 4.6 kernel as well, I wonder
where this comes from, and whether there is any kernel option I can use to disable these additional
interrupts and reduce the count to 2 again.
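For comparing the two kernels, the interrupt count per NIC can be read from /proc/interrupts. The following is a minimal sketch that groups the interrupt names by interface. The sample text is hypothetical (IRQ numbers, counters and the "PCI-MSI" columns are made up for illustration), the helper name vectors_per_interface is an assumption, and the parser assumes that the interrupt name is the last field on each line and that interface names start with "eth".

```python
from collections import defaultdict


def vectors_per_interface(interrupts_text, prefix="eth"):
    """Group interrupt names from /proc/interrupts-style text by network interface."""
    vectors = defaultdict(list)
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Data lines start with "<irq>:"; the CPU header line does not.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        name = fields[-1]
        if not name.startswith(prefix):
            continue
        # "eth2-TxRx-0" belongs to interface "eth2".
        iface = name.split("-", 1)[0]
        vectors[iface].append(name)
    return dict(vectors)


# Hypothetical excerpt, for illustration only.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
  0:         42          0   IO-APIC   2-edge      timer
 46:          1          0   PCI-MSI 1048576-edge      eth2
 47:       1200          0   PCI-MSI 1048577-edge      eth2-TxRx-0
 48:        900          0   PCI-MSI 1048578-edge      eth2-TxRx-1
 49:        850          0   PCI-MSI 1048579-edge      eth2-TxRx-2
 50:        800          0   PCI-MSI 1048580-edge      eth2-TxRx-3
 51:        780          0   PCI-MSI 1048581-edge      eth2-TxRx-4
 52:        760          0   PCI-MSI 1048582-edge      eth2-TxRx-5
 53:        740          0   PCI-MSI 1048583-edge      eth2-TxRx-6
 54:        720          0   PCI-MSI 1048584-edge      eth2-TxRx-7
"""

if __name__ == "__main__":
    for iface, names in vectors_per_interface(SAMPLE_INTERRUPTS).items():
        print(iface, len(names))
```

On a real system the text would come from reading /proc/interrupts on each kernel, so the per-NIC counts can be compared directly.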

Any idea on this is welcome.

Regards

Mathias