Re: Realtek 8139 problem on 486.
From: Nikolai Zhubr <hidden>
Date: 2021-06-07 22:58:42
Hi Arnd,

02.06.2021 12:12, Arnd Bergmann:
[...]
> I think the easiest workaround to address this reliably would be to move all the irq processing into the poll function. This way the interrupt is completely masked in the device until the poll handler finishes, and unmasking it while there are pending events would reliably trigger a new irq regardless of level or edge mode. Something like the untested change at https://pastebin.com/MhBJDt6Z . I don't know of other drivers that do it like this though, so I'm not sure if this causes a different set of problems.
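To make the masking argument in the quoted paragraph concrete, here is a small user-space toy model. It is only a sketch and not code from 8139too or from the pastebin patch: the struct, the event bits and the function names are all hypothetical, and it assumes that the device drives its irq line high whenever (status & mask) is non-zero and that an edge-triggered controller only notices low-to-high transitions.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, nothing is taken
 * from the real 8139too driver or the RTL8139 register layout. */
#define EV_RX 0x1u
#define EV_TX 0x2u

struct toy_nic {
    unsigned status; /* pending event bits */
    unsigned mask;   /* event bits allowed to assert the irq line */
    bool line;       /* current level of the irq line */
    unsigned edges;  /* rising edges an edge-triggered controller would see */
};

/* Assumption: the line is high whenever an unmasked event is pending. */
static void update_line(struct toy_nic *n)
{
    bool level = (n->status & n->mask) != 0;

    if (level && !n->line)
        n->edges++; /* low -> high transition */
    n->line = level;
}

static void raise_event(struct toy_nic *n, unsigned ev)
{
    n->status |= ev;
    update_line(n);
}

static void ack_event(struct toy_nic *n, unsigned ev)
{
    n->status &= ~ev;
    update_line(n);
}

static void set_mask(struct toy_nic *n, unsigned mask)
{
    n->mask = mask;
    update_line(n);
}

/* Events are acked while the device stays unmasked: a TX event that
 * arrives before RX is acked keeps the line high, so no second edge. */
unsigned edges_without_masking(void)
{
    struct toy_nic n = { 0 };

    set_mask(&n, EV_RX | EV_TX);
    raise_event(&n, EV_RX); /* first edge, handler starts */
    raise_event(&n, EV_TX); /* line is already high */
    ack_event(&n, EV_RX);   /* TX still pending, line never drops */
    return n.edges;
}

/* The irq handler masks everything and defers the work to poll; when
 * poll unmasks with TX still pending, the line rises again. */
unsigned edges_with_masking(void)
{
    struct toy_nic n = { 0 };

    set_mask(&n, EV_RX | EV_TX);
    raise_event(&n, EV_RX);      /* first edge */
    set_mask(&n, 0);             /* handler masks the device */
    raise_event(&n, EV_TX);      /* arrives during poll, line stays low */
    ack_event(&n, EV_RX);        /* poll handles RX */
    set_mask(&n, EV_RX | EV_TX); /* unmask: pending TX gives a new edge */
    return n.edges;
}
```

In this model edges_without_masking() returns 1 (the TX event never gets an edge of its own) and edges_with_masking() returns 2. It says nothing about whether the poll function runs often enough, which is a separate question.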
I started applying your patch (trying to morph it a little so that it fits into 4.14 in a minimally invasive manner) and then noticed that it probably won't work as intended. If I'm not mistaken, this rx poll thing is only active during "rx bursts", so it is not guaranteed to be running continually when there is little or no rx input. I suppose some additional work item or thread would have to be introduced for such an approach to be implemented reliably.

Meanwhile, besides the lost tx irq issue, I've apparently identified an rx overrun issue. According to tinymembench, the raw RAM performance of this system is roughly 15-30 Mbytes/s at best, so it is close to 100 Mbit wire speed. Tracing NFS over UDP operation (client side), I've found that of 2 full-sized incoming NFS/UDP packets the second one is always lost, as confirmed by a rapid increase of the iface err counter. More specifically, I've found that a pair of packets sized 1500+700 can still be accepted successfully, but 1500+1500 never can. Apparently the 8139 has very little built-in RAM, so it needs packets to be moved into main RAM fast enough.

It turned out, though, that just adding rsize=1024 lets NFS work quite well, with only occasional small pauses. Also, apparently TCP/IP somehow recovers/autotunes itself automatically, so it just works fine.

I suppose this overrun problem cannot be fixed in a general way (other than by forcing the link speed down to 10 Mbit), as AFAIK Ethernet has no provision for requesting e.g. extra delays between packets. What might be useful, though, is printing a line to dmesg suggesting that the incoming flow be limited somehow. Such a hint in dmesg would have saved me quite some time.

Anyway, for now I've got it working quite well (with a re-added busy loop and rsize=1024). I'm going to look at the elcr_set_level_irq approach later, but it looks quite complicated. If there is something else I can test while at it, please let me know.

Thank you,

Regards,
Nikolai
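For reference, the wire-speed comparison above can be put in numbers with a few lines of C. This is only back-of-the-envelope arithmetic under stated assumptions: Mbit and Mbytes are taken as decimal units, and preamble, inter-frame gap and protocol headers are ignored. The helper names are made up for this sketch.

```c
/* Back-of-the-envelope helpers; all names are hypothetical.
 * Assumes decimal units: 1 Mbit = 1000000 bits, 1 kbyte = 1000 bytes. */

/* Payload rate of a link in kbytes/s, ignoring all framing overhead. */
unsigned long wire_kbytes_per_s(unsigned long link_mbit)
{
    return link_mbit * 1000UL / 8UL;
}

/* Time on the wire for one frame, in microseconds.
 * bits / (Mbit/s) gives microseconds directly. */
unsigned long frame_time_us(unsigned long frame_bytes, unsigned long link_mbit)
{
    return frame_bytes * 8UL / link_mbit;
}

/* Memory bandwidth as a percentage of the link's payload rate. */
unsigned long headroom_percent(unsigned long mem_kbytes_per_s,
                               unsigned long link_mbit)
{
    return mem_kbytes_per_s * 100UL / wire_kbytes_per_s(link_mbit);
}
```

With these assumptions, 100 Mbit comes out at 12500 kbytes/s, so the 15-30 Mbytes/s measured by tinymembench is only 120% to 240% of the wire rate, and a 1500-byte frame arrives in 120 us (1200 us at 10 Mbit).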
Arnd