Re: Multicast packet loss
From: Wes Chow <hidden>
Date: 2009-02-02 19:55:04
(I'm Kenny's colleague, and I've been doing the kernel builds)

First, I'd like to note that there were a lot of bnx2 NAPI changes between 2.6.21 and 2.6.22. As a reminder, 2.6.21 shows tiny amounts of packet loss, whereas loss in 2.6.22 is significant.

Second, some CPU affinity info: if I do like Eric and pin all of the apps onto a single CPU, I see no packet loss. Also, I do *not* see ksoftirqd show up in top at all!

If I pin half the processes on one CPU and the other half on another CPU, one ksoftirqd process shows up in top and completely pegs one CPU. My packet loss in that case is significant (25%).

Now, the strange case: if I pin 3 processes to one CPU and 1 process to another, I get about 25% packet loss and ksoftirqd pegs one CPU. However, one of the apps takes significantly less CPU than the others, and all apps lose the *exact same number of packets*. In all other situations where we see packet loss, the actual number lost per application instance appears random.

We're about to plug an Intel Ethernet card into this machine to collect more rigorous testing data. Please note, though, that we have seen packet loss with a tg3 chipset as well. For now, though, I'm assuming that this is purely a bnx2 problem.

If I understand correctly, when the NIC signals a hardware interrupt, the kernel grabs it and defers the meaty work to the softirq handler -- how does it decide which ksoftirqd gets the interrupts? Is this something determined by how the driver implements NAPI?

Wes
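
For anyone who wants to look at the hardware-interrupt side while reproducing this, here is a minimal Python sketch that pulls the per-CPU interrupt counts for one device out of /proc/interrupts. The function name, the interface name "eth0", and the sample text are all made up for illustration, and the sketch assumes the usual layout of that file (a header row of CPU names, then one row per IRQ). It only shows which CPU the hard interrupts land on; whether that matches the CPU where ksoftirqd ends up busy is exactly the open question.

```python
def per_cpu_irq_counts(text, device):
    """Return {irq_label: [count_cpu0, count_cpu1, ...]} for rows naming `device`.

    `text` is the contents of /proc/interrupts (layout assumed, see above).
    """
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) <= ncpus:
            continue                         # short rows such as "ERR: 0"
        # Shared IRQs list several names separated by commas.
        names = [f.rstrip(",") for f in fields[ncpus:]]
        if device in names:
            result[label.strip()] = [int(f) for f in fields[:ncpus]]
    return result


# Hypothetical sample, not taken from the machine discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
  0:        120          0   IO-APIC-edge      timer
 16:     900000         12   IO-APIC-fasteoi   eth0
NMI:          0          0
ERR:          0
"""

if __name__ == "__main__":
    print(per_cpu_irq_counts(SAMPLE, "eth0"))
    # On a live box (interface name is an assumption):
    # with open("/proc/interrupts") as f:
    #     print(per_cpu_irq_counts(f.read(), "eth0"))
```

Taking two readings a few seconds apart and subtracting them gives the interrupt rate per CPU, which can then be compared across the different pinning setups.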