Re: [PATCH] loop unrolling in net/sched/sch_generic.c
From: "David S. Miller" <davem@davemloft.net>
Date: 2005-07-05 23:45:03
From: Thomas Graf <tgraf@suug.ch> Date: Wed, 6 Jul 2005 01:41:04 +0200
I still think we can fix this performance issue without manually unrolling the loop, or we should at least try to. In the end gcc should notice the constant part of the loop and move it out, so basically the only difference should be the additional prio++ and possibly a failing branch prediction.
But the branch prediction is where I personally think a lot of the lossage is coming from. These can cost upwards of 20 or 30 processor cycles, easily. That's getting close to the cost of an L2 cache miss. I see the difficulties with this change now, why don't we revisit this some time in the future?
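
For reference, the two loop shapes under discussion can be sketched roughly as follows. This is an illustration only, not the code from the patch or from sch_generic.c: struct band, BANDS, and both function names are hypothetical, and the band count of 3 is an assumption. The looped form carries the extra prio++ and a loop-exit branch per band; the unrolled form trades those for straight-line tests.

```c
#define BANDS 3   /* assumed number of priority bands, for illustration */

/* Hypothetical stand-in for a per-band queue; not the kernel structure. */
struct band {
    int qlen;     /* packets currently queued in this band */
};

/* Looped form: each iteration pays for prio++ and the loop-exit branch
 * in addition to the emptiness test itself. */
static int first_nonempty_loop(const struct band *b)
{
    int prio;

    for (prio = 0; prio < BANDS; prio++) {
        if (b[prio].qlen > 0)
            return prio;
    }
    return -1;    /* all bands empty */
}

/* Manually unrolled form: same result, but no loop counter and no
 * loop-exit branch for the predictor to get wrong. */
static int first_nonempty_unrolled(const struct band *b)
{
    if (b[0].qlen > 0)
        return 0;
    if (b[1].qlen > 0)
        return 1;
    if (b[2].qlen > 0)
        return 2;
    return -1;    /* all bands empty */
}
```

Both functions return the index of the first non-empty band, or -1 if every band is empty. Whether the unrolled form is actually faster depends on the compiler and the CPU, so it would have to be measured.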