Thread: 7 messages, 3 authors, 2004-05-27

Re: [PATCH] fix BUG in tg3_tx

From: Greg Banks <hidden>
Date: 2004-05-26 00:54:29

On Tue, May 25, 2004 at 01:04:24PM -0700, Michael Chan wrote:
> > Greg, did you see Michael Chan's response?  A Broadcom
> > engineer is telling us "the hardware does not ACK partial TX packets."
>
> That's right. The hw is designed to always complete tx packets on packet
> boundaries and not BD boundaries. The send data completion state machine
> will create 1 single dma descriptor and 1 host coalescing descriptor for
> the entire packet. None of our drivers handle individual BD
> completions and I'm not aware of any problems caused by this. Actually
> we did see some partial packet completions during the early
> implementations of TSO/LSO. But those were firmware issues and were
> fixed a long time ago. tg3 is not using that early TSO firmware.

I believe the SGI-branded cards ship with firmware fixes beyond simply
changing the PCI ids.  Also, AFAIK the SGI firmware dates from about the
time of the TSO experiments.  Can you check if that firmware has the
issue you describe?
> > I don't argue that you aren't seeing something strange, but
> > perhaps that is due to corruption occurring elsewhere, or
> > perhaps something peculiar about your system hardware
> > (perhaps the PCI controller mis-orders PCI transactions or
> > something silly like that)?
>
> Good point. A few years ago we saw cases where there were tx completions
> on BDs that had not been sent. It turned out that on that machine, the
> chipset was re-ordering the posted mmio writes to the send mailbox
> register from 2 CPUs. For example, CPU 1 wrote index 1 and CPU 2 wrote
> index 2 a little later. On the PCI bus, we saw memory write of 2
> followed by 1. When the chip saw 2, it would send both packets. When it
> later saw 1, it thought that there were 512 new tx BDs and went ahead to
> send them. The only effective workaround for this chipset problem was a
> read of the send mailbox after the write to flush it.

The tg3 driver already does this if the TG3_FLAG_MBOX_WRITE_REORDER
flag is set in tp->tg3_flags.  There's been some discussion inside
SGI about that behaviour.  In short, our PCI hardware is susceptible
to PIO write reordering, but experiment has shown that enabling that
flag results in an unacceptable throughput degradation (about 10%).
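
To make the reordering failure concrete, here is a minimal user-space
sketch (not driver code) of how a chip that only sees producer-index
writes could misread a reordered pair. The ring size of 512 is an
assumption taken from the figure quoted above, and `chip_model`,
`new_bds`, `mbox_write_arrives` and `replay` are hypothetical names
invented for this illustration. In this model the late write of 1 after
2 looks like 511 new BDs, which is nearly a full ring.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ring size (power of two); the thread mentions 512 tx BDs. */
#define TX_RING_SIZE 512u

/* Hypothetical model of what the chip knows about the send ring. */
struct chip_model {
    uint32_t prod_idx;  /* last producer index the chip has seen */
    uint32_t bds_sent;  /* total BDs the chip believes it has sent */
};

/* BDs between the old and new producer index, modulo the ring size. */
static uint32_t new_bds(uint32_t old_idx, uint32_t new_idx)
{
    return (new_idx - old_idx) & (TX_RING_SIZE - 1u);
}

/* Model one mailbox write reaching the chip over the bus. */
static void mbox_write_arrives(struct chip_model *chip, uint32_t idx)
{
    assert(idx < TX_RING_SIZE);
    chip->bds_sent += new_bds(chip->prod_idx, idx);
    chip->prod_idx = idx;
}

/* Replay mailbox writes in bus arrival order; return BDs "sent". */
static uint32_t replay(const uint32_t *writes, int n)
{
    struct chip_model chip = {0, 0};
    for (int i = 0; i < n; i++)
        mbox_write_arrives(&chip, writes[i]);
    return chip.bds_sent;
}
```

Replaying the writes as {1, 2} accounts for the 2 BDs actually queued,
while {2, 1} makes the model send 2 and then another 511. The read-back
workaround is not modelled here; its effect is to force each write to
reach the device before the CPU continues, so the second ordering
should not occur.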

I have also noticed that under significant load the softirq portion
of the driver gets scheduled on CPUs other than the interrupt CPU,
including CPUs in other NUMA nodes.

This sounds like a theory I can test.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.