Thread: 8 messages, 5 authors, 2009-09-25

Re: [RFC] skb align patch

From: Stephen Hemminger <hidden>
Date: 2009-09-22 05:23:57

On Tue, 22 Sep 2009 05:20:53 +0200
Eric Dumazet [off-list ref] wrote:
Stephen Hemminger wrote:
On Mon, 21 Sep 2009 08:13:20 +0200
Eric Dumazet [off-list ref] wrote:
Stephen Hemminger wrote:
Based on the Intel suggestion that PCI-express overhead is
a significant cost.

Would people doing performance work please measure the impact of
changing SKB alignment (64-bit only)?
I had this idea some time ago when I hit a limit on a bnx2 adapter
(Gigabit link, BCM5708S) with small packets. pktgen was able
to send ~500 Mbps 'only', or 700kps if I remember correctly.
So I tried to align the packet built by pktgen to a cache line.
It made no difference at all, but that was on a 32-bit kernel.
(Thus my patch was for pktgen only, not a generic one like yours.)

Could you elaborate on why this change could be useful on 64-bit?
It is useful on all architectures where unaligned CPU access is
relatively cheap.

The issue is that an unaligned DMA requires a read/modify/write
cache line access versus just a write access. I am not a bus
expert, but writes are probably more pipelined as well.
Oh I see, you want to optimize the rx path (the NIC has to DMA
the packet into host memory, and this DMA could be a
read/modify/write if the address is not aligned, instead of a pure
write), while I tried to align the skb to optimize the pktgen tx path
(the NIC has to DMA to read the packet from host memory), and aligning
the skb had no effect.
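
To make the read/modify/write point concrete, here is a minimal user-space
sketch. It assumes 64-byte cache lines, and partial_lines() is a
hypothetical helper, not something from the patch. It only counts how many
cache lines a write of len bytes starting at addr covers partially, which
are the lines that might need a read/modify/write instead of a pure write.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; real hardware may differ. */
#define CACHE_LINE 64u

/*
 * Hypothetical helper (not from the patch): count the cache lines that a
 * write of len bytes starting at addr covers only partially.
 */
static unsigned partial_lines(uintptr_t addr, size_t len)
{
	uintptr_t end = addr + len;	/* one past the last byte written */
	unsigned head, tail;

	if (len == 0)
		return 0;

	head = (addr % CACHE_LINE) != 0;	/* write starts mid-line */
	tail = (end % CACHE_LINE) != 0;		/* write ends mid-line */

	/* Both ends fall inside the same line: one partial line, not two. */
	if (head && tail && (addr / CACHE_LINE) == ((end - 1) / CACHE_LINE))
		return 1;

	return head + tail;
}
```

With these assumptions, a 1514-byte frame written at offset 2 touches two
partial lines (head and tail), while the same frame at offset 0 touches
only one (the tail).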

Maybe we should separate rx and tx, and try your idea only
for skbs allocated for rx.

Also, or instead, we might try
__builtin_prefetch(addr, 0, 0);
to instruct the CPU to commit to memory the cache lines that are
going to be modified by the NIC.
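
A rough sketch of what issuing that hint over a whole receive buffer could
look like. In the GCC builtin, the second argument is the read/write flag
(0 means read) and the third is the temporal locality (0 means none).
prefetch_buffer() and the 64-byte line size are assumptions made for
illustration only, and the builtin is just a hint: whether it has the
effect hoped for above is not established in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; real hardware may differ. */
#define CACHE_LINE 64u

/*
 * Hypothetical helper: issue one prefetch hint per cache line that the
 * buffer [buf, buf + len) overlaps. Returns the number of hints issued.
 */
static size_t prefetch_buffer(const void *buf, size_t len)
{
	uintptr_t p, end;
	size_t n = 0;

	if (len == 0)
		return 0;

	/* Round the start down to the beginning of its cache line. */
	p = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
	end = (uintptr_t)buf + len;

	for (; p < end; p += CACHE_LINE) {
		/* Same arguments as in the mail: read access, no locality. */
		__builtin_prefetch((const void *)p, 0, 0);
		n++;
	}
	return n;
}
```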
I don't think it matters whether the RX buffer read/modify/write
goes to the CPU cache or to memory on modern cache-snooping
architectures. The cost is the PCI traffic.
