Thread (17 messages) 17 messages, 2 authors, 2018-02-28

Re: [PATCH 2/3] Revert "e1000e: Separate signaling for link check/link up"

From: Alexander Duyck <hidden>
Date: 2018-01-29 15:48:30
Also in: intel-wired-lan, lkml

On Sun, Jan 28, 2018 at 11:21 PM, Benjamin Poirier [off-list ref] wrote:
On 2018/01/26 09:03, Alexander Duyck wrote:
quoted
On Fri, Jan 26, 2018 at 1:12 AM, Benjamin Poirier [off-list ref] wrote:
quoted
This reverts commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013.
This reverts commit 4110e02eb45ea447ec6f5459c9934de0a273fb91.

... because they cause an extra 2s delay for the link to come up when
autoneg is off.

After reverting, the race condition described in the log of commit
19110cfbb34d ("e1000e: Separate signaling for link check/link up") is
reintroduced. It may still be triggered by LSC events but this should not
result in link flap. It may no longer be triggered by RXO events because
commit 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
restored reading icr in the Other handler.
With the RXO events removed the only cause for us to transition the
bit should be LSC. I'm not sure if the race condition in that state is
a valid concern or not as the LSC should only get triggered if the
link state toggled, even briefly.

The bigger concern I would have would be the opposite of the original
race that was pointed out:
    \ e1000_watchdog_task
        \ e1000e_has_link
            \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
                /* link is up */
                mac->get_link_status = false;

                                /* interrupt */
                                \ e1000_msix_other
                                    hw->mac.get_link_status = true;

            link_active = !hw->mac.get_link_status
            /* link_active is false, wrongly */
     \ e1000_watchdog_task
         \ e1000e_has_link
             \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
                                 /* interrupt */
                                 \ e1000_msix_other
                                     hw->mac.get_link_status = true;

                /* link is up */
                 mac->get_link_status = false;

             link_active = !hw->mac.get_link_status
             /* link_active is true, wrongly */

So basically the idea would be that the interrupt fires after we check
for link, but before we have updated get_link_status. It should be a
pretty narrow window and I don't know if it will be much of an issue.
quoted
So the question I would have is what if we see the LSC for a link down
just after the check_for_copper_link call completes? It may not be
Can you write out exactly what that race would be, in a format similar to the
above?
I just did the copy/paste/edit above if that works. Hopefully that
makes it clearer.
quoted
anything seen in the real world since I don't know if we have any link
flapping issues on e1000e or not without this patch. It is something
to keep in mind for the future though.

quoted
As discussed, the driver should be in "maintenance mode". In the interest
of stability, revert to the original code as much as possible instead of a
half-baked solution.
If nothing else we may want to do a follow-up on this patch as we
probably shouldn't be returning the error values to trigger link up.
There are definitely issues to be found here. If nothing else we may
want to explore just returning 1 if auto-neg is disabled instead of
returning an error code.
quoted
Link: https://www.spinics.net/lists/netdev/msg479923.html
Signed-off-by: Benjamin Poirier <redacted>
[...]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help