Thread: 61 messages, 8 authors, 2021-07-16

Re: Realtek 8139 problem on 486.

From: Arnd Bergmann <arnd@kernel.org>
Date: 2021-06-12 22:49:10

On Sat, Jun 12, 2021 at 7:40 PM Nikolai Zhubr [off-list ref] wrote:
09.06.2021 10:09, Arnd Bergmann:
[...]
quoted
If it's only a bit slower, that is not surprising. I'd expect it to use
fewer CPU cycles though, as it avoids the expensive polling.

There are a couple of things you could do to make it faster without reducing
reliability, but I wouldn't recommend major surgery on this driver; I was just
going for the simplest change that would make it work right with broken
IRQ settings.

You could play around a little with the order in which you process events:
doing RX first would help free up buffer space in the card earlier. Possibly
alternating between TX and RX one buffer at a time, or processing both
in a loop until the budget runs out, would also help.
I've modified your patch so as to quickly test several approaches within
a single file by just switching some conditional defines.
My diff against 4.14 is here:
https://pastebin.com/mgpLPciE

The tests were performed using a simple shell script:
https://pastebin.com/Vfr8JC3X

Each cell in the resulting table shows:
- tcp sender/receiver (Mbit/s) as reported by iperf3 (total)
- udp sender/receiver (Mbit/s) as reported by iperf3 (total)
- accumulated cpu utilization during tcp+udp test.

The first line in the table essentially corresponds to a standard
unmodified kernel. The second line corresponds to your initially
proposed approach.

All tests run with the same physical instance of 8139D card against the
same server.

(The table is best viewed in a monospace font)
+-------------------+-------------+-----------+-----------+
| #Defines          ; i486dx2/66  ; Pentium3/ ; PentiumE/ |
|                   ; (Edge IRQ)  ;  1200     ; Dual 2600 |
+-------------------+-------------+-----------+-----------+
| TX_WORK_IN_IRQ 1  ;             ; tcp 86/86 ; tcp 94/94 |
| TX_WORK_IN_POLL 0 ;  (fails)    ; udp 96/96 ; udp 96/96 |
| LOOP_IN_IRQ 0     ;             ; cpu 59%   ; cpu 15%   |
| LOOP_IN_POLL 0    ;             ;           ;           |
+-------------------+-------------+-----------+-----------+
| TX_WORK_IN_IRQ 0  ; tcp 9.4/9.1 ; tcp 88/88 ; tcp 95/94 |
| TX_WORK_IN_POLL 1 ; udp 5.5/5.5 ; udp 96/96 ; udp 96/96 |
| LOOP_IN_IRQ 0     ; cpu 98%     ; cpu 55%   ; cpu 19%   |
| LOOP_IN_POLL 0    ;             ;           ;           |
+-------------------+-------------+-----------+-----------+
| TX_WORK_IN_IRQ 0  ; tcp 9.0/8.7 ; tcp 87/87 ; tcp 95/94 |
| TX_WORK_IN_POLL 1 ; udp 5.8/5.8 ; udp 96/96 ; udp 96/96 |
| LOOP_IN_IRQ 0     ; cpu 98%     ; cpu 58%   ; cpu 20%   |
| LOOP_IN_POLL 1    ;             ;           ;           |
+-------------------+-------------+-----------+-----------+
| TX_WORK_IN_IRQ 1  ; tcp 7.3/7.3 ; tcp 87/86 ; tcp 94/94 |
| TX_WORK_IN_POLL 0 ; udp 6.2/6.2 ; udp 96/96 ; udp 96/96 |
| LOOP_IN_IRQ 1     ; cpu 99%     ; cpu 57%   ; cpu 17%   |
| LOOP_IN_POLL 0    ;             ;           ;           |
+-------------------+-------------+-----------+-----------+
| TX_WORK_IN_IRQ 1  ; tcp 6.5/6.5 ; tcp 88/88 ; tcp 94/94 |
| TX_WORK_IN_POLL 1 ; udp 6.1/6.1 ; udp 96/96 ; udp 96/96 |
| LOOP_IN_IRQ 1     ; cpu 99%     ; cpu 55%   ; cpu 16%   |
| LOOP_IN_POLL 1    ;             ;           ;           |
+-------------------+-------------+-----------+-----------+
| TX_WORK_IN_IRQ 1  ; tcp 5.7/5.7 ; tcp 87/87 ; tcp 95/94 |
| TX_WORK_IN_POLL 1 ; udp 6.1/6.1 ; udp 96/96 ; udp 96/96 |
| LOOP_IN_IRQ 1     ; cpu 98%     ; cpu 56%   ; cpu 15%   |
| LOOP_IN_POLL 0    ;             ;           ;           |
+-------------------+-------------+-----------+-----------+

Hopefully this helps to choose the most beneficial approach.
I think several variants can just be eliminated without looking
at the numbers:

- doing the TX work in the irq handler (with the loop) but not in
  the poll function is incorrect with edge interrupts, as it has
  the same race as before; you just make it much harder to hit

- doing the TX work in both the irq handler and the poll function
  is probably not helpful; you just do extra work

- calling the tx cleanup loop in a second loop is not helpful
  if you don't do anything interesting after finding that all
  TX frames are done.

For best performance I would suggest restructuring the poll
function from your current

  while (boguscnt--) {
        handle_rare_events();
        while (tx_pending())
              handle_one_tx();
  }
  while (rx_pending() && work_done < budget)
        work_done += handle_one_rx();

to something like

   handle_rare_events();
   do {
      if (rx_pending())
          work_done += handle_one_rx();
      if (tx_pending())
          work_done += handle_one_tx();
   } while ((tx_pending() || rx_pending()) && work_done < budget);

This way, a single call to the poll function catches the most
events when new work comes in while you are processing the
pending ones.
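
To make the control flow concrete, here is a minimal user-space sketch of the
interleaved loop. It is only a toy model: struct fake_nic and its counters are
hypothetical stand-ins for the card's queues, handle_rare_events() is left
out, and none of this is taken from the 8139too driver. One deliberate
deviation from the pseudocode: the do/while as written can overshoot the
budget by one when both an RX and a TX event are handled in the last pass, so
the sketch rechecks the budget before the TX step.

```c
#include <assert.h>

/* Hypothetical stand-in for the card's queues; not the driver's real state. */
struct fake_nic {
    int rx_pending;   /* received frames waiting to be processed */
    int tx_pending;   /* transmitted frames waiting for cleanup */
};

static int rx_pending(const struct fake_nic *nic) { return nic->rx_pending > 0; }
static int tx_pending(const struct fake_nic *nic) { return nic->tx_pending > 0; }

/* Each helper consumes one event and reports one unit of work. */
static int handle_one_rx(struct fake_nic *nic) { nic->rx_pending--; return 1; }
static int handle_one_tx(struct fake_nic *nic) { nic->tx_pending--; return 1; }

/*
 * Alternate between RX and TX, one event at a time, until the budget
 * is used up or nothing is pending. Returns the amount of work done.
 */
static int poll_interleaved(struct fake_nic *nic, int budget)
{
    int work_done = 0;

    assert(budget > 0);

    /* handle_rare_events() would be called here; omitted in this model. */
    do {
        if (rx_pending(nic))
            work_done += handle_one_rx(nic);
        /* Extra budget check (not in the pseudocode) so that
         * work_done never exceeds budget. */
        if (tx_pending(nic) && work_done < budget)
            work_done += handle_one_tx(nic);
    } while ((tx_pending(nic) || rx_pending(nic)) && work_done < budget);

    return work_done;
}
```

With 3 RX and 2 TX events queued and a budget of 64, the call drains both
queues and returns 5. With 10 of each and a budget of 4, it returns 4 and
leaves 8 of each for the next poll.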

Or, to keep the change simpler, keep the inner loop in the tx
and rx processing, doing all rx events before moving on
to processing all tx events, but then looping back to try both
again, until either the budget runs out or no further events
are pending.
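
A sketch of that simpler variant, again as a user-space toy model and not
driver code. struct fake_nic is hypothetical, and its late_rx field is an
invented device to simulate frames that arrive while TX cleanup is running,
which is the case where looping back pays off.

```c
#include <assert.h>

/* Hypothetical model of the card; not the driver's real state. */
struct fake_nic {
    int rx_pending;  /* received frames waiting to be processed */
    int tx_pending;  /* transmitted frames waiting for cleanup */
    int late_rx;     /* frames that arrive while TX cleanup runs */
};

static int rx_pending(const struct fake_nic *nic) { return nic->rx_pending > 0; }
static int tx_pending(const struct fake_nic *nic) { return nic->tx_pending > 0; }

static int handle_one_rx(struct fake_nic *nic)
{
    nic->rx_pending--;
    return 1;
}

static int handle_one_tx(struct fake_nic *nic)
{
    nic->tx_pending--;
    /* Simulate new traffic showing up in the meantime. */
    nic->rx_pending += nic->late_rx;
    nic->late_rx = 0;
    return 1;
}

/*
 * Drain all RX, then all TX, then loop back and try both again until
 * the budget runs out or no further events are pending.
 */
static int poll_batched(struct fake_nic *nic, int budget)
{
    int work_done = 0;

    assert(budget > 0);

    do {
        while (rx_pending(nic) && work_done < budget)
            work_done += handle_one_rx(nic);
        while (tx_pending(nic) && work_done < budget)
            work_done += handle_one_tx(nic);
    } while ((rx_pending(nic) || tx_pending(nic)) && work_done < budget);

    return work_done;
}
```

Starting from 2 RX, 2 TX and 3 late RX frames with a budget of 64, the first
pass handles 4 events, the outer loop notices the 3 new RX frames, and the
call returns 7 with nothing left pending. Without the outer loop those 3
frames would wait for the next poll.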

      Arnd