[PATCH] rcu: increment quiescent state counter in ksoftirqd()
From: Eric Dumazet <hidden>
Date: 2009-02-27 16:08:38
Also in:
lkml, netfilter-devel
Subsystem:
the rest · Maintainer:
Linus Torvalds
Eric Dumazet a écrit :
Eric Dumazet a écrit :quoted
Stephen Hemminger a écrit :quoted
The reader/writer lock in ip_tables is acquired in the critical path of processing packets and is one of the reasons just loading iptables can cause a 20% performance loss. The rwlock serves two functions: 1) it prevents changes to table state (xt_replace) while table is in use. This is now handled by doing rcu on the xt_table. When table is replaced, the new table(s) are put in and the old one table(s) are freed after RCU period. 2) it provides synchronization when accesing the counter values. This is now handled by swapping in new table_info entries for each cpu then summing the old values, and putting the result back onto one cpu. On a busy system it may cause sampling to occur at different times on each cpu, but no packet/byte counts are lost in the process. Signed-off-by: Stephen Hemminger <redacted>Acked-by: Eric Dumazet <redacted> Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here) BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago) Thanks Stephen, thats very cool stuff, yet another rwlock out of kernel :)While testing multicast flooding stuff, I found that "iptables -nvL" can have a *very* slow response time on my dual quad core machine... # time iptables -nvL Chain INPUT (policy ACCEPT 416M packets, 64G bytes) pkts bytes target prot opt in out source destination Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy ACCEPT 401M packets, 62G bytes) pkts bytes target prot opt in out source destination real 0m1.810s <<<< HERE >>>> user 0m0.000s sys 0m0.001s CONFIG_NO_HZ=y CONFIG_HZ_1000=y CONFIG_HZ=1000 One cpu is 100% handling softirqs, could it be the problem ? Cpu0 : 1.0%us, 14.7%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st Cpu1 : 3.6%us, 23.2%sy, 0.0%ni, 71.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st Cpu3 : 2.7%us, 23.9%sy, 0.0%ni, 71.1%id, 0.7%wa, 0.0%hi, 1.7%si, 0.0%st Cpu4 : 1.3%us, 14.3%sy, 0.0%ni, 83.3%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st Cpu5 : 1.0%us, 14.2%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st Cpu6 : 0.3%us, 7.0%sy, 0.0%ni, 92.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Cpu7 : 0.7%us, 8.0%sy, 0.0%ni, 90.0%id, 0.7%wa, 0.0%hi, 0.7%si, 0.0%st
Hi Paul I found following patch helps if one cpu is looping inside ksoftirqd() synchronize_rcu() now completes in 40 ms instead of 1800 ms. Thank you [PATCH] rcu: increment quiescent state counter in ksoftirqd() If a machine is flooded by network frames, a cpu can loop 100% of its time inside ksoftirqd() without calling schedule(). This can delay RCU grace period to insane values. Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem. Signed-off-by: Eric Dumazet <redacted> ---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index bdbe9de..9041ea7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c@@ -626,6 +626,7 @@ static int ksoftirqd(void * __bind_cpu) preempt_enable_no_resched(); cond_resched(); preempt_disable(); + rcu_qsctr_inc((long)__bind_cpu); } preempt_enable(); set_current_state(TASK_INTERRUPTIBLE); --
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html