Thread (61 messages) 61 messages, 8 authors, 2021-07-16

Re: Realtek 8139 problem on 486.

From: Arnd Bergmann <arnd@kernel.org>
Date: 2021-06-20 10:22:14

On Sun, Jun 20, 2021 at 2:37 AM Thomas Gleixner [off-list ref] wrote:
On Fri, Jun 04 2021 at 09:36, Arnd Bergmann wrote:
quoted
There is a realistic chance that other combinations of broken hardware
and drivers rely on the x86 PCI code doing exactly what it does today.
Legacy PCI interrupts are specified as being level triggered because
legacy PCI requires to share interrupts and you cannot share edge
triggered interrupts reliably. PCI got a lot of things wrong, but this
part is completely correct.

From experience I know that 8139 depends on that and I debugged that
myself many years ago on an ARM system which had the trigger type
configuration wrong.

Let me clarify why this breaks first. The interesting things to look at
are:

  - the device internal interrupt condition which in this case are
    package received or transmitted and status change

  - the interrupt line

Let's look at the timing diagram of simple RX events:

           ___________________               _____...
RX     ___|                   |_____________|

            ___________________               ____...
INTA   ____|                   |_____________|

              _ INT handler _____ RETI _        __...
Kernel ______|                          |______|

Trivial case which works with both edge and level. Now let me add TX
complete for the level triggered case:

           ___________________
RX     ___|                   |_____________

                             _______________
TX     _____________________|

            ________________________________
INTA   ____|

              _ INT handle RX ___ RETI_   __ INT handle TX
Kernel ______|                         |_|

Works as expected. Now the same with edge mode on the i8259 side:

           ___________________
RX     ___|                   |_____________

                             _______________
TX     _____________________|

            ________________________________
INTA   ____|

              _ INT handle RX ___ RETI_
Kernel ______|                         |_____ <- FAIL


Edge mode loses the TX interrupt when it arrives before the RX condition
is cleared because INTA will not go low.
Right, that is what I concluded as well.
The driver cannot do anything about it reliably because all of this is
asynchronous. The loop approach is a bandaid and kinda works, but can it
work reliably? No. Aside of that it's curing the symptom which is the
wrong approach.

The only reasonable solution to this is to enforce level even if that
BIOS switch says otherwise.
I think the hack that Nikolai implemented based on my suggestion
should work around this reliably though: All events (RX, TX, and others)
are now masked in the hardirq handler but processed and acked inside
the napi poll function. After it has processed all events, the device unmasks
all events, so if a new event is already pending at this time, the IRQ line
gets raised and will correctly be processed in both edge and level
mode.

Am I missing something here?

Obviously it's just a hack and it's still going to fail if the edge interrupt
is actually shared with another device, but I don't see any downside
in changing the driver that way.
quoted
If overriding the BIOS setting breaks something that works today, nothing
is gained, because the next person running into an i486 PCI specific bug
is unlikely to be as persistent and competent as Nikolai in tracking down
the root cause.
Yes, sigh. The reason why this BIOS switch exists, which should have
never existed, is that during the transition from EISA which was edge
triggered to PCI some card manufacturers just changed the bus interface
of their cards but completely missed the edge -> level change in
hardware either by stupidity or intentionally so they did not have to
make any changes to the rest of the hardware and to drivers.

The predominant OS'es at that time of course did not have an easy way to
add a quirk to fix this, so the way out was to add the chicken bit
option to the BIOS which then either tells the OS the trigger mode for
INTA..INTD and just sets up ELCR before booting the OS. I have faint
memories that I had to use such a BIOS switch long time ago to get some
add-on PCI card working.

So the right thing to do is to leave 8139 as is and add a PCI quirk
which enforces level trigger type for 8139 in legacy PCI interrupt mode.
So you would prefer a quirk specific to this PCI device on misconfigured
machines? That would at least completely avoid the risk of breaking setups
that rely on an intentional misconfiguration, since it only changes setups
that are definitely not working.

On the other hand, it wouldn't help with non-x86 machines that might
have the same issue, or on x86 machines with a different PCI card
in a misconfigured slot, unless there we maintain a list of PCI
IDs that are known to not work with a broken BIOS configuration.
Alternatively we can just emit a noisy warning when a legacy PCI
interrupt is configured as edge by the BIOS and tell people to toggle
that switch if stuff does not work. Though that might be futile because
not all BIOSes have these toggle options.
That sounds like a good idea regardless of any other outcome.

       Arnd
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help