Thread (21 messages) 21 messages, 4 authors, 2009-02-03

Re: [PATCH 6/6] netfilter: convert x_tables to use RCU

From: Eric Dumazet <hidden>
Date: 2009-01-31 17:27:54
Subsystem: networking [general], the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds

Stephen Hemminger a écrit :
Replace existing reader/writer lock with Read-Copy-Update to
elminate the overhead of a read lock on each incoming packet.
This should reduce the overhead of iptables especially on SMP
systems.

The previous code used a reader-writer lock for two purposes.
The first was to ensure that the xt_table_info reference was not in
process of being changed. Since xt_table_info is only freed via one
routine, it was a direct conversion to RCU.

The other use of the reader-writer lock was to to block changes
to counters while they were being read. This synchronization was
fixed by the previous patch.  But still need to make sure table info
isn't going away.

Signed-off-by: Stephen Hemminger <redacted>


---
 include/linux/netfilter/x_tables.h |   10 ++++++--
 net/ipv4/netfilter/arp_tables.c    |   16 ++++++-------
 net/ipv4/netfilter/ip_tables.c     |   27 +++++++++++-----------
 net/ipv6/netfilter/ip6_tables.c    |   16 ++++++-------
 net/netfilter/x_tables.c           |   45 ++++++++++++++++++++++++++-----------
 5 files changed, 70 insertions(+), 44 deletions(-)
OK, I have a last comment Stephen

(I tested your patches on my dev machine and got nice results)

In net/ipv4/netfilter/ip_tables.c,
get_counters() can be quite slow on large firewalls, and it is
called from alloc_counters() with BH disabled, but also from __do_replace()
without BH disabled.

So get_counters() first iterates on set_entry_to_counter() for current cpu
but might be interrupted and we get bad results...

Solution is to always use seqcount_t protection, and no BH disabling at all.

Then, I wonder if all this is valid, since alloc_counters() might be supposed
to perform an atomic snapshots of all counters. This was possible because
of a write_lock_bh(), but does it make any sense to block all packets for
such a long time ??? If this is a strong requirement, I am afraid we have
to make other changes...


Signed-off-by: Eric Dumazet <redacted>
---
 net/ipv4/netfilter/ip_tables.c |   25 +++----------------------
 1 files changed, 3 insertions(+), 22 deletions(-)
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 7c50e2e..d208b25 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -900,25 +900,8 @@ get_counters(const struct xt_table_info *t,
 {
 	unsigned int cpu;
 	unsigned int i;
-	unsigned int curcpu;
-
-	/* Instead of clearing (by a previous call to memset())
-	 * the counters and using adds, we set the counters
-	 * with data used by 'current' CPU
-	 * We dont care about preemption here.
-	 */
-	curcpu = raw_smp_processor_id();
-
-	i = 0;
-	IPT_ENTRY_ITERATE(t->entries[curcpu],
-			  t->size,
-			  set_entry_to_counter,
-			  counters,
-			  &i);
 
 	for_each_possible_cpu(cpu) {
-		if (cpu == curcpu)
-			continue;
 		i = 0;
 		IPT_ENTRY_ITERATE(t->entries[cpu],
 				  t->size,
@@ -939,15 +922,12 @@ static struct xt_counters * alloc_counters(struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vmalloc_node(countersize, numa_node_id());
+	counters = __vmalloc(countersize, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	/* First, sum counters... */
-	local_bh_disable();
 	get_counters(private, counters);
-	local_bh_enable();
 
 	return counters;
 }
@@ -1209,7 +1189,8 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	void *loc_cpu_old_entry;
 
 	ret = 0;
-	counters = vmalloc(num_counters * sizeof(struct xt_counters));
+	counters = __vmalloc(num_counters * sizeof(struct xt_counters),
+			     GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
 	if (!counters) {
 		ret = -ENOMEM;
 		goto out;
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help