Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
From: Alexander Stein <hidden>
Date: 2016-09-26 11:48:51
On Friday 23 September 2016 11:40:46, Koehrer Mathias wrote:
> Hi Sebastian,
> thanks for the feedback.
>
> > > I run the cyclictest with the following options:
> > > # cyclictest -a -i 100 -d 10 -m -n -t -p 80
> >
> > There is -S. And then 100 might be a little tight.
> >
> > > Of course the 2 minutes run-time of cyclictest is only a rough first estimate.
> >
> > and with no load…
> >
> > > Once I configure one of the i350 ports
> > > # ifconfig eth2 up 192.168.100.100
> > > the cyclictest shows directly and reproducibly significantly larger max latency values (40 microseconds, using the same conditions).
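What cyclictest reports as max latency is, roughly, the worst delay between a timer deadline and the moment the measuring thread actually runs. The C sketch below shows that loop in simplified form, assuming Linux with POSIX clock_nanosleep() and TIMER_ABSTIME (what -n selects). measure_max_latency_ns() and its helpers are hypothetical names, not cyclictest's code, and the sketch leaves out the real-time priority (-p 80), mlockall() (-m) and the per-CPU threads with affinity (-t, -a) that a meaningful measurement needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute timestamp by ns (< 1 s), keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Return a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/*
 * Hypothetical helper, not cyclictest's code: wake up `loops` times every
 * `interval_ns` on CLOCK_MONOTONIC and return the worst wake-up latency,
 * i.e. how far past the programmed absolute deadline the thread ran.
 * A real measurement also needs SCHED_FIFO priority and mlockall().
 */
static int64_t measure_max_latency_ns(long interval_ns, long loops)
{
    struct timespec next, now;
    int64_t max_ns = 0;

    assert(interval_ns > 0 && interval_ns < NSEC_PER_SEC);
    assert(loops > 0);

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long i = 0; i < loops; i++) {
        int rc;

        timespec_add_ns(&next, interval_ns);
        /* Absolute sleep (like cyclictest -n); restart if a signal interrupts it. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);

        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t lat = timespec_diff_ns(&now, &next);
        if (lat > max_ns)
            max_ns = lat;
    }
    return max_ns;
}
```

Called as measure_max_latency_ns(100000, 1200000) from a SCHED_FIFO thread, it would cover the same 100 µs interval and roughly the 2-minute duration discussed above (120 s / 100 µs = 1,200,000 wake-ups). Comparing the result with the i350 port down and after "ifconfig eth2 up" would mirror the experiment in miniature; for real numbers, cyclictest itself remains the tool.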
> > > I did the very same test with kernel version 3.18.27-rt27. With that version I did not see anything like that. Also, only the igb driver seems to cause the trouble. I also have an e1000e-based NIC in this PC, and using that driver does not add any significant latency.
> > > Any idea on this?
> >
> > Does this also happen if you have the NIC up and you plug in / out the cable? There are two things that come to mind:
> > https://lkml.kernel.org/r/1445465268-10347-1-git-send-email-jonathan.david@ni.com
> > https://lkml.kernel.org/r/1445886895-3692-1-git-send-email-joshc@ni.com
>
> This happens even if I have done "ifconfig up" on the NIC without having a cable plugged in.
>
> Also, it happens if I have a cable plugged in and the link is up but no traffic is running via this NIC port.
>
> It looks as if solely the configured NIC port is causing the additional latency, no matter whether traffic is flowing via this NIC or not and no matter whether the link is up or not.
>
> I did the same test with the kernel/rt_preempt patch versions 4.1.33-rt37 and 4.4.19-rt27; they show the very same behavior. In contrast, version 3.18.27-rt27 is stable!
> As mentioned before, the "igb" driver is causing the issue. The "e1000e" driver works fine.
>
> I did some further analysis. The code that is causing the long latencies seems to be the function "igb_watchdog_task" within igb_main.c (Line: 4386). This function is called periodically. When I put a return at the beginning of this function, the additional latency is not seen.
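The early-return experiment narrows the problem to igb_watchdog_task as a whole. A possible next step is to time each suspect inside it separately (for example the igb_has_link() call and the locked igb_update_stats() call mentioned in this thread) and keep the worst case rather than an average, since a latency spike comes from the slowest single run. The C sketch below shows only that bookkeeping, in user space; struct section_stat, time_section() and busy_wait_ns() are hypothetical, with busy_wait_ns() standing in for a slow step. Inside the driver, the equivalent would be taking ktime_get() timestamps around the calls or using the ftrace function_graph tracer, neither of which is shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Worst-case bookkeeping for one instrumented section of a periodic task. */
struct section_stat {
    const char *name;   /* hypothetical label, e.g. "link check" */
    uint64_t calls;
    int64_t max_ns;
};

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run fn(arg) once and fold its wall-clock duration into st. */
static void time_section(struct section_stat *st, void (*fn)(void *), void *arg)
{
    assert(st != NULL && fn != NULL);

    int64_t start = now_ns();
    fn(arg);
    int64_t dur = now_ns() - start;

    st->calls++;
    if (dur > st->max_ns)
        st->max_ns = dur;
}

/* Hypothetical stand-in for a slow step (e.g. many device register reads):
 * spins until *(const int64_t *)arg nanoseconds have elapsed. */
static void busy_wait_ns(void *arg)
{
    int64_t target = *(const int64_t *)arg;
    int64_t start = now_ns();

    while (now_ns() - start < target)
        ;
}
```

A section with a large max_ns is only a candidate: a long-running step in a lower-priority worker matters to cyclictest only if something in it keeps the higher-priority measuring thread from running.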
> In particular, that function calls "igb_has_link", which seems to be one candidate that is causing additional latency.
> Do you have any clue how this code can be executed properly without causing the additional latencies?

IMHO something in igb_watchdog_task causes the latency independently of the actual link. At first glance I would suspect igb_update_stats (called with a spinlock held), as it seems to do a lot of reads. Maybe this stalls somehow. Does the latency still occur if you comment out that spinlock and the call to igb_update_stats?

Best regards,
Alexander