Re: [PATCH 2/3] Revert "e1000e: Separate signaling for link check/link up"
From: Alexander Duyck <hidden>
Date: 2018-01-29 15:48:30
Also in:
intel-wired-lan, lkml
On Sun, Jan 28, 2018 at 11:21 PM, Benjamin Poirier [off-list ref] wrote:
On 2018/01/26 09:03, Alexander Duyck wrote:quoted
On Fri, Jan 26, 2018 at 1:12 AM, Benjamin Poirier [off-list ref] wrote:quoted
This reverts commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013. This reverts commit 4110e02eb45ea447ec6f5459c9934de0a273fb91. ... because they cause an extra 2s delay for the link to come up when autoneg is off. After reverting, the race condition described in the log of commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up") is reintroduced. It may still be triggered by LSC events but this should not result in link flap. It may no longer be triggered by RXO events because commit 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") restored reading icr in the Other handler.With the RXO events removed the only cause for us to transition the bit should be LSC. I'm not sure if the race condition in that state is a valid concern or not as the LSC should only get triggered if the link state toggled, even briefly. The bigger concern I would have would be the opposite of the original race that was pointed out: \ e1000_watchdog_task \ e1000e_has_link \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link /* link is up */ mac->get_link_status = false; /* interrupt */ \ e1000_msix_other hw->mac.get_link_status = true; link_active = !hw->mac.get_link_status /* link_active is false, wrongly */
\ e1000_watchdog_task
\ e1000e_has_link
\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
/* interrupt */
\ e1000_msix_other
hw->mac.get_link_status = true;
/* link is up */
mac->get_link_status = false;
link_active = !hw->mac.get_link_status
/* link_active is true, wrongly */
So basically the idea would be that the interrupt fires after we check
for link, but before we have updated get_link_status. It should be a
pretty narrow window and I don't know if it will be much of an issue.
quoted
So the question I would have is what if we see the LSC for a link down just after the check_for_copper_link call completes? It may not beCan you write out exactly what that race would be, in a format similar to the above?
I just did the copy/paste/edit above if that works. Hopefully that makes it clearer.
quoted
anything seen in the real world since I don't know if we have any link flapping issues on e1000e or not without this patch. It is something to keep in mind for the future though.quoted
As discussed, the driver should be in "maintenance mode". In the interest of stability, revert to the original code as much as possible instead of a half-baked solution.If nothing else we may want to do a follow-up on this patch as we probably shouldn't be returning the error values to trigger link up. There are definitely issues to be found here. If nothing else we may want to explore just returning 1 if auto-neg is disabled instead of returning an error code.quoted
Link: https://www.spinics.net/lists/netdev/msg479923.html Signed-off-by: Benjamin Poirier <redacted>[...]