Re: [PATCH 6/6] netfilter: convert x_tables to use RCU
From: Eric Dumazet <hidden>
Date: 2009-01-31 17:27:54
Subsystem:
networking [general], the rest · Maintainers:
"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds
Stephen Hemminger a écrit :
Replace existing reader/writer lock with Read-Copy-Update to elminate the overhead of a read lock on each incoming packet. This should reduce the overhead of iptables especially on SMP systems. The previous code used a reader-writer lock for two purposes. The first was to ensure that the xt_table_info reference was not in process of being changed. Since xt_table_info is only freed via one routine, it was a direct conversion to RCU. The other use of the reader-writer lock was to to block changes to counters while they were being read. This synchronization was fixed by the previous patch. But still need to make sure table info isn't going away. Signed-off-by: Stephen Hemminger <redacted> --- include/linux/netfilter/x_tables.h | 10 ++++++-- net/ipv4/netfilter/arp_tables.c | 16 ++++++------- net/ipv4/netfilter/ip_tables.c | 27 +++++++++++----------- net/ipv6/netfilter/ip6_tables.c | 16 ++++++------- net/netfilter/x_tables.c | 45 ++++++++++++++++++++++++++----------- 5 files changed, 70 insertions(+), 44 deletions(-)
OK, I have a last comment Stephen (I tested your patches on my dev machine and got nice results) In net/ipv4/netfilter/ip_tables.c, get_counters() can be quite slow on large firewalls, and it is called from alloc_counters() with BH disabled, but also from __do_replace() without BH disabled. So get_counters() first iterates on set_entry_to_counter() for current cpu but might be interrupted and we get bad results... Solution is to always use seqcount_t protection, and no BH disabling at all. Then, I wonder if all this is valid, since alloc_counters() might be supposed to perform an atomic snapshots of all counters. This was possible because of a write_lock_bh(), but does it make any sense to block all packets for such a long time ??? If this is a strong requirement, I am afraid we have to make other changes... Signed-off-by: Eric Dumazet <redacted> --- net/ipv4/netfilter/ip_tables.c | 25 +++---------------------- 1 files changed, 3 insertions(+), 22 deletions(-)
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 7c50e2e..d208b25 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c@@ -900,25 +900,8 @@ get_counters(const struct xt_table_info *t, { unsigned int cpu; unsigned int i; - unsigned int curcpu; - - /* Instead of clearing (by a previous call to memset()) - * the counters and using adds, we set the counters - * with data used by 'current' CPU - * We dont care about preemption here. - */ - curcpu = raw_smp_processor_id(); - - i = 0; - IPT_ENTRY_ITERATE(t->entries[curcpu], - t->size, - set_entry_to_counter, - counters, - &i); for_each_possible_cpu(cpu) { - if (cpu == curcpu) - continue; i = 0; IPT_ENTRY_ITERATE(t->entries[cpu], t->size,
@@ -939,15 +922,12 @@ static struct xt_counters * alloc_counters(struct xt_table *table) (other than comefrom, which userspace doesn't care about). */ countersize = sizeof(struct xt_counters) * private->number; - counters = vmalloc_node(countersize, numa_node_id()); + counters = __vmalloc(countersize, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL); if (counters == NULL) return ERR_PTR(-ENOMEM); - /* First, sum counters... */ - local_bh_disable(); get_counters(private, counters); - local_bh_enable(); return counters; }
@@ -1209,7 +1189,8 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks, void *loc_cpu_old_entry; ret = 0; - counters = vmalloc(num_counters * sizeof(struct xt_counters)); + counters = __vmalloc(num_counters * sizeof(struct xt_counters), + GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL); if (!counters) { ret = -ENOMEM; goto out;