Thread: 9 messages, 4 authors, 2009-07-29

Re: [RFC] Idea about increasing efficency of skb allocation in network devices

From: Eric Dumazet <hidden>
Date: 2009-07-27 07:58:36

Brice Goglin wrote:
David Miller wrote:
From: Neil Horman <nhorman@tuxdriver.com>
Date: Sun, 26 Jul 2009 20:36:09 -0400

  
	Since Network devices dma their memory into a provided DMA
buffer (which can usually be at an arbitrary location, as they must
cross potentially several pci busses to reach any memory location),
I'm postulating that it would increase our receive path efficiency
to provide a hint to the driver layer as to which node to allocate
an skb data buffer on.  This hint would be determined by a feedback
mechanism.  I was thinking that we could provide a callback function
via the skb, that accepted the skb and the originating net_device.
This callback can track statistics on which numa nodes consume
(read: copy data from) skbs that were produced by specific net
devices.  Then, when in the future that netdevice allocates a new
skb (perhaps via netdev_alloc_skb), we can use that statistical
profile to determine if the data buffer should be allocated on the
local node, or on a remote node instead.
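
As a rough illustration of the feedback mechanism described above, here is a minimal user-space sketch in Python. It is not kernel code and not an existing API: the class `ConsumerNodeStats`, its methods, and the `min_samples` threshold are all hypothetical names chosen for the example. It only shows the bookkeeping idea: count which node consumes data from each device, and fall back to the local node until enough samples exist.

```python
from collections import Counter, defaultdict


class ConsumerNodeStats:
    """Hypothetical per-device profile of which NUMA node consumes skb data."""

    def __init__(self, min_samples=64):
        # Below this many observations we do not trust the profile.
        self.min_samples = min_samples
        # device name -> Counter mapping node id -> number of consumes seen
        self.counts = defaultdict(Counter)

    def record_consume(self, dev, node):
        """Stand-in for the proposed callback: note that `node` read data from `dev`."""
        self.counts[dev][node] += 1

    def alloc_node(self, dev, local_node):
        """Pick the node for the next buffer: the most frequent consumer node,
        or the device-local node while the profile is still too thin."""
        seen = self.counts[dev]
        if sum(seen.values()) < self.min_samples:
            return local_node
        return seen.most_common(1)[0][0]
```

A driver-side allocation path would call something like `alloc_node("eth0", local_node)` before allocating; the threshold is one possible way to avoid reacting to a handful of packets.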
    
No matter what, you will do an inter-node memory operation.

Unless the consumer NUMA node is the same as the one the
device is on.

Because the device is on a NUMA node, if you DMA remotely
you've eaten the NUMA cost already.

If you always DMA to the device's NUMA node (what we try to do now), at
least there is the possibility of eliminating cross-NUMA traffic.

Better to move the application or stack processing towards the NUMA
node the network device is on, I think.
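
The argument above can be checked with a toy cost model. The sketch below is an assumption-laden simplification: it counts one "hop" for the DMA if the buffer is not on the device's node, and one for the copy if the consumer is not on the buffer's node, and it treats both hops as equally expensive, which real hardware does not guarantee. The function names are made up for the illustration.

```python
def cross_node_transfers(device_node, buffer_node, consumer_node):
    """Count inter-node memory operations for one received packet."""
    dma_hop = int(device_node != buffer_node)     # device DMAs into the buffer
    copy_hop = int(buffer_node != consumer_node)  # consumer copies data out
    return dma_hop + copy_hop


def best_buffer_nodes(device_node, consumer_node, nodes):
    """Return (minimum hop count, buffer nodes that achieve it)."""
    costs = {n: cross_node_transfers(device_node, n, consumer_node) for n in nodes}
    best = min(costs.values())
    return best, sorted(n for n, c in costs.items() if c == best)
```

With the device and consumer on the same node, `best_buffer_nodes(0, 0, range(2))` gives zero hops with the buffer on node 0. With the consumer on another node, `best_buffer_nodes(0, 1, range(2))` gives one hop whichever node holds the buffer, so under this model moving the buffer alone cannot remove the cost.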
  
Is there an easy way to get this NUMA node from the application socket
descriptor?

That's not easy; this information can change for every packet (think of
bonding setups, with aggregation of devices on different NUMA nodes).

We could add a getsockopt() call to peek at this information for the next
data to be read from the socket (it would return the node id where the skb
data is sitting, hoping that the NIC driver hasn't applied copybreak to it,
i.e. allocated a small skb and copied the device-provided data into it
before feeding the packet to the network stack).
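
The node-id getsockopt() suggested here is only a proposal in this message. As a related sketch, later Linux kernels provide a different option, `SO_INCOMING_CPU`, which reports the CPU that last processed packets for the socket rather than the node holding the skb data. The Python below queries that option and maps the CPU to a node through sysfs. The helper names are made up for the example, and the fallback value 49 is the asm-generic constant, which is an assumption that does not hold on every architecture.

```python
import glob
import os
import socket

# Newer Python versions expose socket.SO_INCOMING_CPU; otherwise fall back to
# the asm-generic value (assumption: differs on a few architectures).
SO_INCOMING_CPU = getattr(socket, "SO_INCOMING_CPU", 49)


def incoming_cpu(sock):
    """Return the CPU that last handled packets for `sock`, or None if unknown."""
    try:
        cpu = sock.getsockopt(socket.SOL_SOCKET, SO_INCOMING_CPU)
    except OSError:
        return None  # option not supported on this kernel or platform
    return cpu if cpu >= 0 else None


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def node_of_cpu(cpu, sysfs_root="/sys/devices/system/node"):
    """Return the NUMA node whose cpulist contains `cpu`, or None if not found."""
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        with open(path) as f:
            if cpu in parse_cpulist(f.read()):
                node_dir = os.path.basename(os.path.dirname(path))
                return int(node_dir[len("node"):])
    return None
```

As noted above, the answer can change from packet to packet (bonding, multiqueue), so an application would treat the result as a hint to re-check, not a fixed property of the socket.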

Also, one question that was raised at the Linux Symposium is: how do you
know which processors run the receive queue for a specific connection?
It would be nice to have a way to retrieve such information in the
application to avoid inter-node and inter-core/cache traffic.

All this depends on whether you have multiqueue devices, and on whether
traffic is spread across all queues.

Assuming you have a single-queue device, the only current way to handle
this is to reason in reverse.

I.e., bind NIC interrupts to the appropriate set of CPUs, and
possibly bind the user application threads dealing with network traffic
to the same set.

Only background or CPU-hungry threads should be allowed to run
on foreign nodes.
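
A minimal sketch of that binding, in Python on Linux, might look as follows. It assumes the usual `/proc/irq/<n>/smp_affinity` interface (writing to it requires root) and uses `os.sched_setaffinity` for the application side. The helper names and the IRQ number in the usage note are hypothetical.

```python
import os


def cpus_to_irq_mask(cpus):
    """Format CPU ids as the hex bitmask written to /proc/irq/<n>/smp_affinity.

    Masks wider than 32 bits are emitted as comma-separated 32-bit words,
    most significant word first.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while mask:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
    if not words:
        words = [0]
    high, *rest = reversed(words)
    return ",".join([format(high, "x")] + [format(w, "08x") for w in rest])


def set_irq_affinity(irq, cpus):
    """Bind interrupt `irq` to `cpus` (needs root; IRQ numbers are system-specific)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_irq_mask(cpus))


def pin_current_thread(cpus):
    """Restrict the calling thread to `cpus` and return the resulting affinity set."""
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)
```

For example, if the NIC's interrupt were IRQ 42 (a made-up number) and its node held CPUs 0 to 3, one would call `set_irq_affinity(42, {0, 1, 2, 3})` as root and `pin_current_thread({0, 1, 2, 3})` in each network-handling thread, leaving the other CPUs to background work.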