Re: [RFC] skb align patch
From: Stephen Hemminger <hidden>
Date: 2009-09-22 05:23:57
On Tue, 22 Sep 2009 05:20:53 +0200 Eric Dumazet [off-list ref] wrote:
> Stephen Hemminger wrote:
>> On Mon, 21 Sep 2009 08:13:20 +0200 Eric Dumazet [off-list ref] wrote:
>>> Stephen Hemminger wrote:
>>>> Based on the Intel suggestion that PCI-express overhead is a significant cost.
>>>> Would people doing performance work please measure the impact of changing
>>>> SKB alignment (64 bit only).
>>>
>>> I had this idea some time ago when I hit a limit on a bnx2 adapter
>>> (Gigabit link, BCM5708S) with small packets. pktgen was able to send
>>> ~500 Mbps 'only', or 700 kpps if I remember well.
>>>
>>> So I tried aligning the packet built by pktgen to a cache line; it made
>>> no difference at all, but that was on a 32-bit kernel. (Thus my patch was
>>> for pktgen only, not a generic one like yours.)
>>>
>>> Could you elaborate on why this change could be useful on 64-bit?
>>
>> It is useful on all architectures where unaligned CPU access is relatively
>> cheap. The issue is that an unaligned DMA requires a read/modify/write
>> cache line access versus just a write access. I am not a bus expert, but
>> writes are probably more pipelined as well.
>
> Oh I see, you want to optimize the rx path (the NIC has to do a DMA to write
> the packet into host memory, and this DMA could be a read/modify/write
> instead of a pure write if the address is not aligned), while I tried to
> align the skb to optimize the pktgen tx path (the NIC has to do a DMA to
> read the packet from host memory), and aligning the skb had no effect.
>
> Maybe we should separate rx and tx, and try your idea only for skbs
> allocated for rx.
>
> Alternatively, or in addition, we might try __builtin_prefetch(addr, 0, 0);
> to instruct the CPU to commit to memory the cache lines that are going to
> be modified by the NIC.
I don't think it matters whether the RX buffer write has to do its read/modify/write against CPU cache or against memory on modern cache-snooping architectures. The cost is the PCI traffic.
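
To make the read/modify/write point in the thread concrete, here is a minimal sketch in C that counts how many cache lines a DMA write only partially covers, since those are the lines that cannot be handled as a pure full-line write. The 64-byte line size and the helper name partial_lines() are assumptions for illustration only; nothing here is taken from the patch under discussion or from any kernel API.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines; real hardware may differ. */
#define CACHE_LINE 64u

/*
 * Hypothetical helper: count the cache lines that a write of `len` bytes
 * starting at byte address `addr` covers only partially. A partially
 * covered line is the case that needs a read/modify/write rather than
 * a plain full-line write.
 */
static unsigned partial_lines(size_t addr, size_t len)
{
    size_t end, first, last;
    unsigned n = 0;

    if (len == 0)
        return 0;

    end = addr + len;                /* one past the last byte written */
    first = addr / CACHE_LINE;       /* index of the first line touched */
    last = (end - 1) / CACHE_LINE;   /* index of the last line touched */

    /* Whole write sits inside one line: partial unless it fills the line. */
    if (first == last)
        return (addr % CACHE_LINE != 0 || end % CACHE_LINE != 0) ? 1 : 0;

    if (addr % CACHE_LINE != 0)      /* head of the write starts mid-line */
        n++;
    if (end % CACHE_LINE != 0)       /* tail of the write ends mid-line */
        n++;
    return n;
}
```

Under these assumptions, a 128-byte write at offset 0 touches no partial lines, while the same write at offset 2 touches two (one at the head, one at the tail). A buffer start shifted by a couple of bytes is the unaligned case being debated above.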