Re: [PATCH net-next 5/8] net/mlx4_en: Remove redundant code from RX/GRO path
From: Eric Dumazet <hidden>
Date: 2014-10-31 15:46:06
On Fri, 2014-10-31 at 16:00 +0200, Or Gerlitz wrote:
> On Fri, Oct 31, 2014 at 5:19 AM, Eric Dumazet [off-list ref] wrote:
>> On Fri, 2014-10-31 at 01:25 +0200, Or Gerlitz wrote:
>>> On Thu, Oct 30, 2014 at 9:00 PM, Eric Dumazet [off-list ref] wrote:
>>>> On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
>>>>> Remove the code which goes through napi_gro_frags() on the RX path,
>>>>> use only napi_gro_receive().
>>>> Hmpff... napi_gro_frags() should be faster. Have you benchmarked this?
>>> yep we did, napi_gro_frags() was somehow better for single stream. Do
>>> you think we need to do it the other way around, e.g. converge to use
>>> napi_gro_frags()?
>> napi_gro_frags() is faster because the napi->skb is reused fast (not
>> going through kfree_skb()/alloc_skb() for every fragment)
> I see. Is this a strong vote to convert the code to use napi_gro_frags
> on its usual track?
I don't know yet. In some cases, actually slowing down the rx path can
help by building bigger GRO packets. But instead of inserting delays,
we can simply force NAPI to run another time, with a nanosecond-based
timer.
I've tested this kind of heuristic:
	/* If some packets are waiting in GRO engine and timeout is not expired,
	 * reschedule a NAPI poll. We allow servicing other softirqs
	 * before repoll, we do not rearm CQ.
	 */
	if (rx_nsecs && napi->gro_list && !need_resched()) {
		u64 now = local_clock();
		unsigned long flags;

		/* If we got packets in this round, restart timeout */
		if (done)
			cq->tstart = now;
		else if (now - cq->tstart >= (u64)rx_nsecs)
			goto complete;

		/* Since we might need one skb very soon, build it now */
		napi_get_frags(napi);

		local_irq_save(flags);
		list_del(&napi->poll_list);
		__napi_schedule_irqoff(napi);
		local_irq_restore(flags);
	} else {
complete:
		napi_complete(napi);
		mlx4_en_arm_cq(priv, cq);
	}
	return done;
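
To make the timeout logic easier to reason about outside the driver, here is
a rough userspace sketch of just the repoll decision. It is not driver code:
struct cq_model, should_repoll() and the gro_pending / resched_needed
parameters are hypothetical stand-ins for cq, napi->gro_list and
need_resched(); only tstart, rx_nsecs and done follow the fragment above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CQ state; only tstart is from the mail. */
struct cq_model {
	uint64_t tstart;	/* time (ns) of the last round that got packets */
};

/*
 * Model of the heuristic: return true when the poll routine should
 * reschedule itself, false when it should complete NAPI and re-arm the CQ.
 *
 * gro_pending and resched_needed are assumptions standing in for
 * napi->gro_list and need_resched().
 */
bool should_repoll(struct cq_model *cq, uint64_t now, int done,
		   uint64_t rx_nsecs, bool gro_pending, bool resched_needed)
{
	/* Heuristic disabled, nothing held in GRO, or the CPU is wanted. */
	if (!rx_nsecs || !gro_pending || resched_needed)
		return false;

	/* Packets arrived in this round: restart the timeout and repoll. */
	if (done) {
		cq->tstart = now;
		return true;
	}

	/* Idle round: keep repolling only until rx_nsecs has elapsed. */
	return now - cq->tstart < rx_nsecs;
}
```

With rx_nsecs = 5000 and the last packets seen at t = 1000, an empty round
at t = 3000 still repolls, while an empty round at t = 6000 completes, which
matches the "now - cq->tstart >= rx_nsecs" test in the fragment.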