Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49
From: Paul E. McKenney <hidden>
Date: 2009-04-11 05:42:24
Also in: lkml, netfilter-devel
On Sat, Apr 11, 2009 at 07:14:50AM +0200, Jan Engelhardt wrote:
> On Saturday 2009-04-11 06:15, Paul E. McKenney wrote:
>> On Fri, Apr 10, 2009 at 06:39:18PM -0700, Linus Torvalds wrote:
>>> An unhappy user reported:
>>>
>>>> Adding 200 records in iptables took 6.0sec in 2.6.30-rc1 compared to 0.2sec in 2.6.29. I've bisected it down to this commit: 784544739a25c30637397ace5489eeb6e15d7d49
>>>
>>> I wonder if we should bring in the RCU people too, for them to tell you that the networking people are being silly, and should not synchronize with the very heavy-handed synchronize_net(). Instead of doing synchronization (which is probably why adding a few hundred rules then takes several seconds: each synchronizes, and that takes a timer tick or so), they should add the rules to be freed to some RCU-freeing list for later freeing.
>
> iptables works in whole tables. Userspace submits a table, checkentry is called for all rules in the new table, things are swapped, then destroy is called for all rules in the old table. By that logic (which has existed since the beginning, I think), only the swap operation needs to be locked.
>
> Jeff Chua wrote:
>> So, to make it easy for testing, you can do a loop like this:
>>
>> for ((i = 1; i < 100; i++))
>> do
>>     iptables -A block -s 10.0.0.$i -j ACCEPT
>> done
>
> The fact that `iptables -A` is called a hundred times means you are doing 100 table replacements instead of one, and calling synchronize_net() at least 100 times. "Wanna use iptables-restore?"
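
Linus's suggestion and Jan's description of whole-table replacement point at the same pattern: build the new table, publish it with a single pointer swap, and queue the old table for later freeing instead of waiting in synchronize_net() on every replacement. At the reported 6.0 s for 200 rules, each `iptables -A` costs roughly 30 ms in 2.6.30-rc1 against roughly 1 ms in 2.6.29, so a per-replacement wait dominates. What follows is a minimal userspace sketch under stated assumptions, not the x_tables code: C11 atomics stand in for rcu_assign_pointer()/rcu_dereference(), writers are assumed to be serialized by an outside lock, and the hypothetical reclaim_retired() relies on its caller knowing that a grace period has passed (kernel code would hand this to call_rcu()). All struct and function names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a complete rule table (not xt_table_info). */
struct table {
    int nrules;
    struct table *next_retired;   /* link on the deferred-free list */
};

/* Readers (the packet path) load this pointer; writers swap it. */
static _Atomic(struct table *) current_table;

/*
 * Old tables wait here instead of the writer blocking for a grace period.
 * ASSUMPTION: writers are serialized externally, so this list needs no atomics.
 */
static struct table *retired_list;
static int retired_count;

struct table *table_new(int nrules)
{
    struct table *t = calloc(1, sizeof(*t));
    if (t)
        t->nrules = nrules;
    return t;
}

/* Reader side: roughly the role rcu_dereference() plays in the kernel. */
struct table *table_get(void)
{
    return atomic_load_explicit(&current_table, memory_order_acquire);
}

/*
 * Writer side: publish the fully built new table with one atomic swap,
 * then queue the old one for later freeing. Nothing here waits, so a
 * hundred replacements cost a hundred swaps, not a hundred grace periods.
 */
void replace_table(struct table *newt)
{
    assert(newt != NULL);
    struct table *old = atomic_exchange_explicit(&current_table, newt,
                                                 memory_order_acq_rel);
    if (old) {
        old->next_retired = retired_list;
        retired_list = old;
        retired_count++;
    }
}

/*
 * Hypothetical reclaim step. ASSUMPTION: the caller guarantees that no
 * reader can still hold a pointer to any retired table (in the kernel an
 * RCU grace period would establish this).
 */
void reclaim_retired(void)
{
    while (retired_list) {
        struct table *t = retired_list;
        retired_list = t->next_retired;
        free(t);
        retired_count--;
    }
}
```

With this shape, a hundred replacements cost a hundred pointer swaps and leave 99 old tables queued; the blocking wait moves to whoever reclaims them. The sketch ignores the per-rule checkentry/destroy callbacks Jan describes.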

>> 1. Assuming that the synchronize_net() is intended to guarantee that the new rules will be in effect before returning to user space:
>
> As I read the new code, it seems that synchronize_net() is only used when copying the rules from the kernel into userspace, not when updating them from userspace:
>
>     IPT_SO_GET_ENTRIES -> get_entries -> copy_entries_to_user -> alloc_counters -> synchronize_net

OK.

>> 3. For the alloc_counters() case, the comments indicate that we really truly do want an atomic sampling of the counters. The counters are 64-bit entities, which is a bit inconvenient. Though people using this functionality are no doubt quite happy to never have to worry about overflow, I hasten to add! I will nevertheless suggest the following egregious hack to get a consistent sample of one counter for some other CPU:
>> [...]
>
> Would a seqlock suffice, as it does for the 64-bit jiffies?
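
Jan's seqlock question can be made concrete with a small userspace sketch of a sequence-counted pair of 64-bit counters (packets and bytes). It assumes C11 atomics in place of the kernel's seqlock_t and read_seqbegin()/read_seqretry(), and a single writer per counter (for example per-CPU data); with more than one writer, the writer side would also need the spinlock that write_seqlock() takes. The names are hypothetical. The writer path is the part to look at: every update, that is, every packet, performs two extra stores to the sequence counter plus a release fence, which is the per-packet overhead weighed in the reply below. Readers take no lock and simply retry if an update overlapped their snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical packet/byte counter pair. The fields are atomics only so
 * that concurrent relaxed access is well-defined C11; consistency of the
 * pair comes from the sequence counter.
 */
struct seq_counters {
    atomic_uint seq;            /* odd while an update is in progress */
    _Atomic uint64_t pcnt;      /* packets */
    _Atomic uint64_t bcnt;      /* bytes */
};

/*
 * Writer side, run on every packet. ASSUMPTION: exactly one writer per
 * counter; multiple writers would also need a lock around this.
 */
void counters_add(struct seq_counters *c, uint64_t bytes)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);
    assert((s & 1) == 0);       /* single writer: no update already open */

    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed); /* busy */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&c->pcnt,
        atomic_load_explicit(&c->pcnt, memory_order_relaxed) + 1,
        memory_order_relaxed);
    atomic_store_explicit(&c->bcnt,
        atomic_load_explicit(&c->bcnt, memory_order_relaxed) + bytes,
        memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 2, memory_order_release); /* done */
}

/*
 * Reader side (an alloc_counters()-style sample): retry until the snapshot
 * was taken with no writer active and no writer in between.
 */
void counters_read(struct seq_counters *c, uint64_t *pcnt, uint64_t *bcnt)
{
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        *pcnt = atomic_load_explicit(&c->pcnt, memory_order_relaxed);
        *bcnt = atomic_load_explicit(&c->bcnt, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);
}
```

The reader side is cheap, which suits infrequent sampling; the open question in the thread is whether the writer-side cost is acceptable on a path taken for every packet.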

The 64-bit jiffies counter is not updated often, so write-acquiring a seqlock on each update is OK. From what I understand, these counters are updated quite often (on each packet transmission or reception?), so write-acquiring on each update would be quite painful. Or did you have something else in mind here?

							Thanx, Paul