Thread (20 messages) 20 messages, 5 authors, 2018-02-02

Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.

From: Alexander Duyck <hidden>
Date: 2018-01-19 16:23:04
Also in: intel-wired-lan, lkml

On Fri, Jan 19, 2018 at 5:36 AM, Benjamin Poirier
[off-list ref] wrote:
On 2018/01/19 17:59, Benjamin Poirier wrote:
quoted
On 2018/01/18 07:51, Alexander Duyck wrote:
quoted
On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier [off-list ref] wrote:
quoted
It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my
Yes. The numeric value is correct. I made a mistake when writing down
the flag names.
quoted
patch on was that the VMware code was sending _OTHER instead of _LSC
to trigger LSC events. As such in my version of the workaround I just
It's not so deterministic, sadly. In my tests, upon entry into
e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch
I've seen:
icr=0x0
icr=0x3d
      Reserved RXDMT0 Reserved LSC TXDW
icr=0x46
      RXO LSC TXQE
and someone else reported:
icr=0x3c
      Reserved RXDMT0 Reserved LSC
quoted
went through and did the testing if the _RXO bit was set, otherwise I
assumed that whatever event was received must have been meant to
trigger an _LSC type event since that worked in the past.
                               ^...

Re-reading that part, my thoughts are that it worked in the past, before
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1),
because (presumably) Other was not set in EIAC. It worked after
16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun
interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the
value of icr. If it had, it would've found a bogus value, which is
what's happening after 4aea7a5c5e94. I wonder if we're not just getting
some uninitialized value from the emulation code... which makes me kind
of uneasy about your approach of trying to make sense of the value.
Maybe Shrikrishna can clarify where the icr value is coming from when
Other is set in EIAC.
For now I still say we let my current patch go as-is in order to
address the immediate issue. We can follow-up and do a more refined
version of things once we get the final word from VMware on how this
actually works. If nothing else the current patch appears to resolve
the currently reported issue and is already submitted.

I'm of the mind that we need to cut down on the code thrash.  This
driver is supposed to have been in a "maintenance" mode for the last
year or so as there aren't being any new parts added is my
understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
accepted in the first place. I don't see any notes about it fixing any
bug or addressing any issue and it seems like that is the start of all
the issues we have been having recently with RXO triggering more
interrupts, various link issues, and this most recent VMware issue.

- Alex
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help