Thread: 59 messages, 12 authors, 2016-02-02

Re: Optimizing instruction-cache, more packets at each stage

From: Eric Dumazet <hidden>
Date: 2016-01-20 23:02:27

On Thu, 2016-01-21 at 00:20 +0200, Or Gerlitz wrote:
> Dave, I assume you refer to the RSS hash result which is written by
> NIC HWs to the completion descriptor and then fed to the stack by the
> driver calling skb_set_hash(.)? Well, this can be taken even further.
>
> Suppose the NIC can be programmed by the kernel to provide a unique
> flow tag on the completion descriptor for a given 5/12 tuple which
> represents a TCP (or other logical) stream that a higher level in the
> stack has identified as being in progress, and the driver plants that
> in skb->mark before calling into the stack.
>
> I guess this could yield a nice speedup for the GRO stack -- matching
> based on a single 32 bit value instead of per protocol (eth, vlan, ip,
> tcp) checks [1] - or hint which packets from the current window of
> "ready" completion descriptors could be grouped together for upper
> processing?

We already use the RSS hash (skb->hash) in the GRO engine to speed up
the parsing: if skb->hash differs, then there is no point in trying to
aggregate two packets.
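
To illustrate the idea only, here is a minimal user-space sketch of that
pre-check. It is not the kernel's GRO code: struct toy_pkt and both
functions are hypothetical names made up for this example, and the field
set is an assumed, simplified flow key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the fields a flow match would look at.
 * Hypothetical type, not the kernel's struct sk_buff. */
struct toy_pkt {
	uint32_t hash;          /* flow hash, e.g. as reported by NIC RSS */
	uint32_t saddr, daddr;  /* IPv4 addresses */
	uint16_t sport, dport;  /* L4 ports */
	uint8_t  proto;         /* L4 protocol number */
};

/* Expensive path: compare every header field of the flow key. */
static bool toy_same_flow_slow(const struct toy_pkt *a,
			       const struct toy_pkt *b)
{
	return a->proto == b->proto &&
	       a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Cheap path first: different hashes cannot belong to the same flow,
 * so the per-field checks are skipped. Equal hashes may still be a
 * collision, so the full comparison remains necessary. */
static bool toy_same_flow(const struct toy_pkt *a, const struct toy_pkt *b)
{
	if (a->hash != b->hash)
		return false;
	return toy_same_flow_slow(a, b);
}
```

The hash only ever rules packets out; it never proves a match on its
own, which is why the slow comparison stays in place.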

Note that if we had an L4 hash for all provided packets, GRO could use a
hash table instead of one single list of skbs.
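
A rough sketch of what such a hash table could look like, again in plain
user-space C. Everything here is an assumption for illustration: the
toy_* names are hypothetical, the bucket count of 8 is arbitrary, and a
real implementation would still compare headers after a hash match.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_GRO_BUCKETS 8	/* arbitrary; must be a power of two */

/* One held packet, chained within its bucket. Hypothetical type. */
struct toy_held {
	uint32_t hash;
	struct toy_held *next;
};

struct toy_gro_table {
	struct toy_held *bucket[TOY_GRO_BUCKETS];
};

/* Pick a bucket from the low bits of the flow hash. */
static unsigned int toy_bucket_of(uint32_t hash)
{
	return hash & (TOY_GRO_BUCKETS - 1);
}

/* Hold a packet: push it at the head of its bucket's chain. */
static void toy_gro_hold(struct toy_gro_table *t, struct toy_held *p)
{
	unsigned int i = toy_bucket_of(p->hash);

	p->next = t->bucket[i];
	t->bucket[i] = p;
}

/* Find a held packet with the same hash. Only one bucket is walked;
 * *visited reports how many held packets were examined. */
static struct toy_held *toy_gro_find(const struct toy_gro_table *t,
				     uint32_t hash, unsigned int *visited)
{
	struct toy_held *p;

	*visited = 0;
	for (p = t->bucket[toy_bucket_of(hash)]; p; p = p->next) {
		(*visited)++;
		if (p->hash == hash)
			return p;
	}
	return NULL;
}
```

With a single list every held packet is a candidate for each incoming
one; with buckets, only the packets sharing the low hash bits are walked.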