Re: [RFC] netif_rx: receive path optimization
From: jamal <hidden>
Date: 2005-03-31 21:38:04
On Thu, 2005-03-31 at 16:24, Rick Jones wrote:
>> The repercussions of going from a per-CPU-for-all-devices queue (introduced by softnet) to per-device-for-all-CPUs may be huge, in my opinion, especially on SMP. Something closer to what's there now might be a per-device-per-CPU backlog queue. I think performance will be impacted on all devices. IMO, whatever goes in needs some experimental data to back it.
>
> Indeed. At the risk of again chewing on my toes (yum), if multiple CPUs are pulling packets from the per-device queue there will be packet reordering.
;-> This already happens _today_ on Linux in the non-NAPI case. Take the following non-NAPI scenario:

- packet 1 arrives
- interrupt happens, NIC bound to CPU0
- in the meantime packets 2 and 3 arrive
- 3 packets put on the queue for CPU0
- interrupt processing done
- packet 4 arrives, interrupt, CPU1 is bound to NIC
- in the meantime packets 5 and 6 arrive
- CPU1 backlog queue used
- interrupt processing done

Assume CPU0 is overloaded with other system work and CPU1 rx processing kicks in first ... TCP sees packets 4, 5, 6 before 1, 2, 3.

Note that Linux is quite resilient to reordering compared to other OSes (as you may know), but avoiding it is the better approach; hence my suggestion to use NAPI when you want to do serious TCP. Of course NAPI is no panacea: under low traffic it eats a little more CPU (but if you have CPU issues under low load, you are in some other deep shit).
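The scenario above can be replayed as a toy userspace model. This is only a sketch: `struct backlog`, `enqueue`, `drain` and `simulate_reorder` are hypothetical names made up for illustration, not kernel code, and the "CPU1 runs first" ordering is hard-coded to match the assumption in the text.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 2
#define QLEN  8

/* Hypothetical per-CPU backlog: a plain FIFO of packet sequence numbers. */
struct backlog {
    int pkts[QLEN];
    size_t head, tail;
};

/* Interrupt side: append a packet to the backlog of the interrupted CPU. */
static void enqueue(struct backlog *q, int seq)
{
    assert(q->tail < QLEN);
    q->pkts[q->tail++] = seq;
}

/* Rx processing side: deliver everything queued on one CPU, in FIFO order. */
static size_t drain(struct backlog *q, int *out, size_t n)
{
    while (q->head < q->tail)
        out[n++] = q->pkts[q->head++];
    return n;
}

/* Replays the scenario; writes the order the stack sees into delivered[]. */
size_t simulate_reorder(int *delivered)
{
    struct backlog cpu[NCPUS] = {0};
    size_t n = 0;

    /* First interrupt lands on CPU0: packets 1, 2, 3 go to its backlog. */
    enqueue(&cpu[0], 1);
    enqueue(&cpu[0], 2);
    enqueue(&cpu[0], 3);

    /* Second interrupt lands on CPU1: packets 4, 5, 6 go to its backlog. */
    enqueue(&cpu[1], 4);
    enqueue(&cpu[1], 5);
    enqueue(&cpu[1], 6);

    /* Assumption from the text: CPU0 is busy, so CPU1 drains first. */
    n = drain(&cpu[1], delivered, n);
    n = drain(&cpu[0], delivered, n);
    return n;
}
```

Each backlog stays in order on its own; the reordering comes purely from which CPU gets to run its rx processing first.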
> HP-UX 10.0 did just that, and it was quite nasty even at low CPU counts (<=4). It was changed by HP-UX 10.20 (ca. 1995) to per-CPU queues, with queue selection computed from packet headers (hash the IP and TCP/UDP headers to pick a CPU). That was called IPS, for Inbound Packet Scheduling. 11.0 (ca. 1998) later changed that to "find where the connection last ran and queue to that CPU". That was called TOPS - Thread Optimized Packet Scheduling.
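For illustration, the two selection policies described above could look roughly like the following. This is a sketch under loud assumptions: the FNV-1a hash, the `struct flow` layout, the 256-slot table and every function name here are invented for the example and say nothing about how HP-UX actually implemented IPS or TOPS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key built from the IP and TCP/UDP headers. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Fold the low nbytes bytes of v into an FNV-1a hash (assumed hash choice). */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;               /* FNV-1a 32-bit prime */
    }
    return h;
}

uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;         /* FNV-1a 32-bit offset basis */
    h = fnv1a_mix(h, f->saddr, 4);
    h = fnv1a_mix(h, f->daddr, 4);
    h = fnv1a_mix(h, f->sport, 2);
    h = fnv1a_mix(h, f->dport, 2);
    return h;
}

/* IPS-style: the headers alone decide the CPU, so a flow always maps to
 * the same queue and stays in order. */
unsigned ips_pick_cpu(const struct flow *f, unsigned ncpus)
{
    assert(ncpus > 0);
    return flow_hash(f) % ncpus;
}

/* TOPS-style: remember where the connection's consumer last ran.
 * Slot value 0 means "unknown", otherwise it holds cpu + 1.
 * Hash collisions between flows are ignored in this sketch. */
#define TOPS_SLOTS 256
static unsigned tops_last_cpu[TOPS_SLOTS];

void tops_note_ran(const struct flow *f, unsigned cpu)
{
    tops_last_cpu[flow_hash(f) % TOPS_SLOTS] = cpu + 1;
}

unsigned tops_pick_cpu(const struct flow *f, unsigned ncpus)
{
    unsigned v = tops_last_cpu[flow_hash(f) % TOPS_SLOTS];

    if (v != 0 && v - 1 < ncpus)
        return v - 1;                 /* queue to where the connection last ran */
    return ips_pick_cpu(f, ncpus);    /* no history yet: fall back to the hash */
}
```

Either way, all packets of one connection land on one CPU's queue, which is what avoids the reordering discussed earlier in the thread.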
Don't think we can do that, unfortunately: we are screwed by the APIC architecture on x86.

cheers,
jamal