Thread (69 messages), 10 authors, 2009-04-08

Re: Multicast packet loss

From: Eric Dumazet <hidden>
Date: 2009-01-30 19:04:22

Kenny Chang wrote:
Hi all,

We've been having some issues with multicast packet loss, and we were wondering
if anyone knows anything about the behavior we're seeing.

Background: we use multicast messaging with lots of messages per sec for our
work. We recently transitioned many of our systems from an Ubuntu Dapper Drake
ia32 distribution to Ubuntu Hardy Heron x86_64. Since the transition, we've
noticed much more multicast packet loss, and we think it's related to the
transition. Our particular theory is that it's specifically a 32 vs 64-bit
issue.

We narrowed the problem down to the attached program (mcasttest.cc). Run
"mcasttest server" on one machine -- it'll send 500,000 small messages
to a multicast group, at 50,000 messages per second. If we run "mcasttest client"
on another machine, it'll receive all those messages and print a count at the
end of how many messages it saw. It almost never loses any messages. However,
if we run 4 copies of the client on the same machine, receiving the same data,
then the programs usually see fewer than 500,000 messages. We're running with:

for i in $(seq 1 4); do (./mcasttest client &); done
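
The attachment is not included in this thread, so the following is only a minimal Python sketch of what the receiving side of such a test might look like. It is not the original mcasttest.cc. The group address 239.1.1.1, port 5000, and both function names are assumptions made for illustration.

```python
import socket
import struct

GROUP = "239.1.1.1"  # hypothetical multicast group, not from the original post
PORT = 5000          # hypothetical port, not from the original post


def open_receiver(group=GROUP, port=PORT):
    """Create a UDP socket that has joined the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Let several clients on the same host bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address followed by the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def count_messages(sock, idle_timeout=2.0, bufsize=2048):
    """Count datagrams until none arrives for idle_timeout seconds."""
    sock.settimeout(idle_timeout)
    count = 0
    while True:
        try:
            sock.recv(bufsize)
        except socket.timeout:
            return count
        count += 1
```

Running `print(count_messages(open_receiver()))` in four separate processes would roughly mirror the four-client run above; comparing each printed count with the number of messages sent gives the per-client loss.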

We know this because the program prints a count, but dropped packets also
show up in ifconfig's "RX packets" section.
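
On Linux the receive counters that ifconfig reports are also available in /proc/net/dev, so the drop count can be captured before and after a test run. A minimal sketch, assuming the usual column order of that file (bytes, packets, errs, drop, ...); the function names are made up for this example.

```python
def read_proc_net_dev(path="/proc/net/dev"):
    """Read the raw text of the kernel's per-interface counters."""
    with open(path) as f:
        return f.read()


def parse_rx_counters(proc_net_dev_text):
    """Return {interface: (rx_packets, rx_dropped)} from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Receive columns: bytes packets errs drop fifo frame compressed multicast
        counters[name.strip()] = (int(fields[1]), int(fields[3]))
    return counters


def rx_drop_delta(before, after, iface):
    """Packets dropped on iface between two parsed snapshots."""
    return after[iface][1] - before[iface][1]
```

Taking one snapshot with `parse_rx_counters(read_proc_net_dev())` before starting the clients and another after they exit shows how many packets the interface itself dropped during the run.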

Things we're curious about: do other people see similar problems? The tests
we've done: we've tried this program on a bunch of different machines, all of
which are running either dapper ia32 or hardy x86_64. Uniformly, the dapper
machines have no problems, but on certain machines Hardy shows significant
loss. We did some experiments on a troubled machine, varying the OS install,
including mixed installations where the kernel was 64-bit and the userspace
was 32-bit. This is what we found:

On machines that exhibit this problem, the ksoftirqd process seems to be
pegged to 100% CPU when receiving packets.

Note: while we're on Ubuntu, we've tried this with other distros and have seen
similar results; we just haven't tabulated them.
----------------------------------------------------------------------------
userland | userland arch | kernel           | kernel arch | result
----------------------------------------------------------------------------
Dapper   |            32 | 2.6.15-28-server |          32 | no packet loss
Dapper   |            32 | 2.6.22-generic   |          32 | no packet loss
Dapper   |            32 | 2.6.22-server    |          32 | no packet loss
Hardy    |            32 | 2.6.24-rt        |          32 | no packet loss
Hardy    |            32 | 2.6.24-generic   |          32 | ~5% packet loss
Hardy    |            32 | 2.6.24-server    |          32 | ~10% packet loss

Hardy    |            32 | 2.6.22-server    |          64 | no packet loss
Hardy    |            32 | 2.6.24-rt        |          64 | no packet loss
Hardy    |            32 | 2.6.24-generic   |          64 | 14% packet loss
Hardy    |            32 | 2.6.24-server    |          64 | 12% packet loss

Hardy    |            64 | 2.6.22-vanilla   |          64 | packet loss
Hardy    |            64 | 2.6.24-rt        |          64 | ~5% packet loss
Hardy    |            64 | 2.6.24-server    |          64 | ~30% packet loss
Hardy    |            64 | 2.6.24-generic   |          64 | ~5% packet loss
----------------------------------------------------------------------------
It's not clear exactly what the problem is, but dapper shows no
issues regardless of what we try. For hardy, userspace seems to matter:
the 2.6.24-rt kernel shows no packet loss for 32- and 64-bit kernels, as long as
the userspace is 32-bit.

Kernel comments:
2.6.15-28-server: This is Ubuntu Dapper's stock kernel build.
2.6.24-*: This is Ubuntu Hardy's stock kernel.
2.6.22-{generic,server}: This is a custom, in-house kernel build, built for ia32.
2.6.22-vanilla: This is our custom, in-house kernel build, built for x86_64.

We don't think it's related to our custom kernels, because the same phenomena
show up with the Ubuntu stock kernels.

Hardware:

The benchmark machine we've been using is a dual-CPU, quad-core Intel Xeon
E5440 @2.83GHz with Broadcom NetXtreme II BCM5708 (bnx2) networking.

We've also tried AMD machines, as well as machines with Tigon3
(part no. BCM95704A6, tg3) network cards; they all show consistent behavior.

Our hardy x86_64 server machines all appear to have this problem, new and old.

On the other hand, a desktop with an Intel Q6600 quad-core 2.4GHz and Intel
82566DC GigE seems to work fine.

All of the dapper ia32 machines have no trouble, even our older hardware.

Hi Kenny,

Interesting... You forgot to attach the mcasttest.cc program.

Is there any chance you could try a recent kernel (2.6.29-rcX)?

Could you post the output of "cat /proc/interrupts" (one for a working
setup, another for a non-working/dropping setup)?
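
For comparing the two /proc/interrupts dumps, a small parser can help show whether the NIC's interrupts all land on one CPU. This is only a sketch under the assumption that the file has the usual layout (a CPU header row, then one row per IRQ with per-CPU counts followed by a description); the function names and the device name passed in are illustrative.

```python
def parse_interrupts(text):
    """Return {irq_label: (per_cpu_counts, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        fields = rest.split()
        counts = []
        # Leading numeric fields are the per-CPU counters; the rest is the description.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (counts, " ".join(fields))
    return table


def device_irqs(table, device):
    """Select the rows whose description mentions the given device name."""
    return {irq: row for irq, row in table.items() if device in row[1]}
```

For example, `device_irqs(parse_interrupts(open("/proc/interrupts").read()), "eth0")` returns the per-CPU counts for the rows mentioning eth0, which can then be compared between the working and the dropping machine.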
