Thread (36 messages), 8 authors, 2008-03-29

Re: 2.6.24 BUG: soft lockup - CPU#X

From: jamal <hidden>
Date: 2008-03-28 10:33:24

On Thu, 2008-27-03 at 18:58 -0700, Matheos Worku wrote:
> In general, while the TX serialization improves performance in terms of
> lock contention, wouldn't it reduce throughput since only one guy is
> doing the actual TX at any given time? Wondering if it would be
> worthwhile to have an enable/disable option, especially for multi queue TX.

Empirical evidence so far says that at some point the bottleneck is going
to be the wire, i.e. modern CPUs are "fast enough" that sooner rather than
later they will fill up the DMA ring of the transmitting driver and go back
to doing other things.
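
A rough sketch of that argument, under made-up numbers: if the CPU posts
more descriptors per time slice than the wire drains, the ring fills after
a bounded number of slices. The function name and the "tick" model are
illustrative assumptions, not measurements or driver code.

```c
#include <assert.h>

/*
 * Toy model (hypothetical): each tick the CPU posts enq_per_tick
 * descriptors, then the wire drains drain_per_tick of them.
 * Returns the tick on which the ring becomes full, or -1 if the
 * wire keeps up and the ring never fills.
 */
static long ticks_until_full(long ring_size, long enq_per_tick,
                             long drain_per_tick)
{
    long used = 0;
    long ticks = 0;

    assert(ring_size > 0 && enq_per_tick >= 0 && drain_per_tick >= 0);

    if (enq_per_tick <= drain_per_tick)
        return -1;              /* wire is not the bottleneck */

    for (;;) {
        ticks++;
        used += enq_per_tick;   /* CPU posts packets */
        if (used >= ring_size)
            return ticks;       /* ring full: CPU must back off */
        used -= drain_per_tick; /* wire drains what it can */
    }
}
```

With an assumed 256-entry ring, 10 packets posted and 2 drained per tick,
the ring fills on tick 32; after that the sender is paced by the wire.
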
It is hard to create the condition you seem to have come across. I had
access to a dual core Opteron but found it very hard with parallel UDP
sessions to keep the TX CPU locked in that region (while the other 3
were busy pumping packets). My folly could have been that I had a GigE
wire; maybe a 10G link would have recreated the condition.
If you can reproduce this at will, can you try to reduce the number of
sending TX u/iperfs and see when it begins to happen?
Are all the iperfs destined out of the same netdevice?

[Typically the TX path on the driver side is inefficient either because
of coding (e.g. unnecessary locks) or expensive IO. But this has not
mattered much thus far (given fast enough CPUs).
It could all be improved by reducing the per-packet operations the
driver incurs. As an example, the CPU (via the driver) could batch a
set of packets to the device, then kick the device DMA once for the
whole batch, etc.]
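
A minimal sketch of the batching idea, assuming a hypothetical descriptor
ring. The names tx_ring, xmit_one, xmit_batch and ring_kick are
illustrative and not taken from any real driver; ring_kick stands in for
the expensive IO (e.g. a doorbell register write) and only counts calls.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 256               /* hypothetical descriptor count */

struct tx_ring {
    const void *desc[RING_SIZE];    /* packets posted to the device */
    size_t head;                    /* next slot the driver fills */
    size_t tail;                    /* next slot the device completes */
    unsigned long doorbells;        /* number of device kicks issued */
};

/* Free slots; one slot stays empty to tell "full" from "empty". */
static size_t ring_free(const struct tx_ring *r)
{
    size_t used = (r->head + RING_SIZE - r->tail) % RING_SIZE;
    return RING_SIZE - 1 - used;
}

/* Stand-in for the expensive per-kick IO. */
static void ring_kick(struct tx_ring *r)
{
    r->doorbells++;
}

/* Per-packet path: one descriptor, one kick. Returns 1 if queued. */
static int xmit_one(struct tx_ring *r, const void *pkt)
{
    if (ring_free(r) == 0)
        return 0;
    r->desc[r->head] = pkt;
    r->head = (r->head + 1) % RING_SIZE;
    ring_kick(r);
    return 1;
}

/* Batched path: post as many as fit, then kick once for the batch. */
static size_t xmit_batch(struct tx_ring *r, const void **pkts, size_t n)
{
    size_t queued = 0;

    while (queued < n && ring_free(r) > 0) {
        r->desc[r->head] = pkts[queued++];
        r->head = (r->head + 1) % RING_SIZE;
    }
    if (queued > 0)
        ring_kick(r);
    assert(queued <= n);
    return queued;
}
```

In this toy model, sending 8 packets through xmit_one costs 8 kicks,
while a single xmit_batch call for the same 8 packets costs 1.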

cheers,
jamal