Thread: 69 messages, 10 authors, 2009-04-08

Re: Multicast packet loss

From: Wesley Chow <hidden>
Date: 2009-02-05 13:46:49

[quoted from earlier in the thread]

Maybe it's time to change the user side, and not try to find an
appropriate kernel :)

If you know you have to receive N frames every 20 us, then it's
better to use non-blocking sockets and run a loop like this:

for (;;) {
	usleep(20); // or try to compensate if this thread is slowed
	            // too much by the following code
	for (i = 0; i < N; i++) {
		while (recvfrom(socket[i], ....) != -1)
			receive_frame(...);
	}
}

That way, you can be fairly sure the network softirq handler won't
have to spend time trying to wake up one thread 400,000 times per
second. All CPU cycles can be spent in the NIC driver and the network
stack.
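
A self-contained sketch of the loop above, under stated assumptions: the sockets are datagram sockets that are already bound (and joined to their multicast groups), frames are at most 2048 bytes, and receive_frame() is a stub. The helper names set_nonblocking(), drain_socket() and poll_loop() are hypothetical. It reads the "compensate" hint in the comment as sleeping with clock_nanosleep() until absolute CLOCK_MONOTONIC deadlines, which is one way to do it, not necessarily what was meant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical handler standing in for receive_frame(...) above. */
static void receive_frame(const char *buf, ssize_t len)
{
    (void)buf;
    (void)len;
}

/* Put a socket into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read every datagram currently queued on a non-blocking socket.
 * Returns the number of frames read, or -1 on a real error. */
static long drain_socket(int fd)
{
    char buf[2048];                 /* assumed maximum frame size */
    long frames = 0;

    for (;;) {
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n >= 0) {
            receive_frame(buf, n);
            frames++;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return frames;          /* nothing left to read right now */
        } else if (errno != EINTR) {
            return -1;
        }
    }
}

/* Run `periods` iterations: sleep until the next absolute deadline,
 * then drain every socket. Absolute deadlines keep the period fixed
 * even when draining takes a noticeable share of it. */
static long poll_loop(const int *fds, int nfds, long periods, long period_ns)
{
    struct timespec next;
    long total = 0;

    assert(period_ns > 0 && period_ns < 1000000000L);
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long p = 0; p < periods; p++) {
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;                       /* interrupted by a signal: sleep again */
        for (int i = 0; i < nfds; i++) {
            long n = drain_socket(fds[i]);
            if (n < 0)
                return -1;
            total += n;
        }
    }
    return total;
}
```

For a real receiver you would call set_nonblocking() on each multicast socket and run poll_loop() with period_ns = 20000 and a very large period count. Removing the clock_nanosleep() call turns it into a busy spin, roughly the variant tested further down, at the cost of keeping a core at 100%.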

Your thread will make 50,000 calls to nanosleep() per second (one per
20 us period), which is not really expensive, plus N recvfrom() calls
per iteration. It should work on all past, current and future
kernels.
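
Assuming the 400,000 wakeups per second correspond to one wakeup per received frame (the message does not say so explicitly), the two figures fit together like this: the timed loop wakes the thread 50,000 times per second, and each wakeup drains about 8 frames, so N is roughly 8 and the thread is woken about 8 times less often.

```latex
\frac{1\ \mathrm{s}}{20\ \mu\mathrm{s}} = 50{,}000\ \text{wakeups/s},
\qquad
N \approx \frac{400{,}000\ \text{frames/s}}{50{,}000\ \text{periods/s}} = 8\ \text{frames per period}
```
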

+1 to this idea. Since the last oprofile traces showed significant
variance in the time spent in schedule(), it might be worthwhile to
investigate the effects of the application's behavior on this. It
might also be worth adding a systemtap probe to sys_recvmsg to count
how many times we receive frames on a working and a non-working
system. If the app behaves differently on different kernels, and that
changes the number of times you go to get a frame out of the stack, it
would affect your drop rates, and it would show up in sys_recvmsg.
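
The systemtap probe itself is not shown in the thread. As a user-space stand-in (an assumption, not the probe suggested above), the application can count its own receive attempts and compare the counters between kernels. Note that on Linux a program calling recvfrom() enters the kernel through sys_recvfrom rather than sys_recvmsg, so a kernel-side probe would need to cover whichever call the program actually uses. The struct recv_stats and counted_recvfrom() names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical counters, kept by the application itself. */
struct recv_stats {
    unsigned long calls;   /* times we asked the stack for a frame */
    unsigned long frames;  /* calls that returned a datagram */
    unsigned long empty;   /* calls that found the receive queue empty */
};

/* recvfrom() wrapper that updates *st on every call. Compare the
 * counters from a working and a non-working kernel under the same load. */
static ssize_t counted_recvfrom(struct recv_stats *st, int fd, void *buf,
                                size_t len, int flags)
{
    ssize_t n;

    assert(st != NULL);
    n = recvfrom(fd, buf, len, flags, NULL, NULL);
    st->calls++;
    if (n >= 0)
        st->frames++;
    else if (errno == EAGAIN || errno == EWOULDBLOCK)
        st->empty++;
    return n;
}
```

A large gap between calls and frames on one kernel but not the other would point at the application's receive pattern rather than the stack.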

I modified our test program to spin on a non-blocking socket, and it
does seem to fix the problem, at least on 2.6.28.1, which was one of
the kernels we had problems with. The number of context switches drops
drastically, from 200,000+ to fewer than 50!

I haven't done fully comprehensive tests yet, so I don't want to
officially state any results. I'm also out today, but Kenny may get a
chance to play with it. Spinning on the socket looks like an
interesting solution, but we're a bit nervous about seeing our
processes constantly running at 100% CPU. Does C++ have a
MachineOnFire exception?


Wes