Thread: 32 messages, 5 authors, 2013-03-30

Re: [RFC][PATCH] iproute: Faster ip link add, set and delete

From: Eric W. Biederman <hidden>
Date: 2013-03-29 01:06:44

Eric Dumazet [off-list ref] writes:
On Thu, 2013-03-28 at 17:25 -0700, Eric W. Biederman wrote:
Eric Dumazet [off-list ref] writes:
On Thu, 2013-03-28 at 16:52 -0700, Eric W. Biederman wrote:
On my microbenchmark of just creating 5000 veth pairs this takes
16s instead of the 13s of my earlier hacks, but that is well down in the
usable range.
I guess most of the time is taken by sysctl_check_table()
All of the significant sysctl slowdowns were fixed in 3.4.  If you see
something from sysctl show up in a trace I would be happy to talk about
it.  On the kernel side, creating N network devices seems to take
N log N time now.  Both sysfs and sysctl store directories as
rbtrees, removing their previous bottlenecks.

The loop I timed at 16s was just:

time for i in $(seq 1 5000) ; do ip link add a$i type veth peer name b$i; done

There is plenty of room for inefficiencies in 10000 network devices and
5000 forks+execs.
Ah right, the sysctl part is fixed ;)

In batch mode, I can create these veth pairs in 4 seconds

for i in $(seq 1 5000) ; do echo link add a$i type veth peer name b$i;
done | ip -batch -
Yes.  The interesting story here is that the bottleneck before these
patches was the ll_init_map function of iproute2, which resulted in
over an order of magnitude slowdown when starting iproute on a system
with lots of network devices.
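
A rough way to see why that startup cost adds up, sketched as shell
arithmetic.  It assumes that each non-batch invocation walks every
existing link once at startup (the ll_init_map behavior described
above) and that each veth pair adds two devices.  The function name is
hypothetical and the count is a model, not a measurement.

```shell
# Count how many link entries would be walked in total when creating
# "pairs" veth pairs with one ip invocation per pair.
# Assumption: invocation k sees the 2*(k-1) devices created so far.
link_walk_total() {
    pairs=$1
    total=0
    k=1
    while [ "$k" -le "$pairs" ]; do
        total=$((total + 2 * (k - 1)))
        k=$((k + 1))
    done
    echo "$total"
}

link_walk_total 5000
```

Under those assumptions the total grows with the square of the number
of pairs, while a single batch invocation pays the startup walk once.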

It is still unclear where iproute comes into the picture in the original
problem scenario of creating 2000 containers each with 2 veth pairs.
But apparently it did.

As the fundamental use case here was taking 2000 separate independent
actions, it turns out to be important for things not to slow down
unreasonably outside of batch mode.  So I was explicitly testing the
non-batch mode performance.
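
For anyone wanting to repeat the comparison, a minimal sketch of a
generator for the batch input.  The function name is hypothetical; the
command text is the same "link add" line used in the loops above.

```shell
# Print one "link add" command per veth pair, in the form that
# "ip -batch -" reads from standard input.
veth_batch_cmds() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "link add a$i type veth peer name b$i"
        i=$((i + 1))
    done
}

# Batch mode (needs root):
#   veth_batch_cmds 5000 | ip -batch -
# Non-batch mode (needs root), one fork+exec per pair:
#   veth_batch_cmds 5000 | while read -r args; do ip $args; done
```

Timing both forms on the same machine separates the per-invocation
startup cost from the kernel-side cost of creating the devices.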

On the flip side it might be interesting to see if we can get batch mode
deletes to batch in the kernel, so we don't have to wait through
synchronize_rcu_expedited for each of them.  Although for the container
case I can just drop the last reference to the network namespace and all
of the network device removals will batch.
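
A sketch of that container-style teardown, for illustration only.  The
helper names, the ns$n naming, and the DRY_RUN switch are hypothetical;
by default the commands are only printed, and running them for real
needs root.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put the b-side of one veth pair into its own namespace, then remove
# the namespace instead of deleting the devices one by one.
netns_demo() {
    n=$1
    run ip netns add "ns$n"
    run ip link add "a$n" type veth peer name "b$n"
    run ip link set "b$n" netns "ns$n"
    # With no process left in the namespace, deleting it drops the
    # last reference and the kernel removes the devices inside it.
    run ip netns delete "ns$n"
}

netns_demo 1
```

Whether the removals actually batch on a given kernel is best checked
by timing the teardown against a loop of individual link deletes.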

Ultimately, shrug.  Except for the previous O(N^2) userspace behavior
there don't seem to be any practical performance problems with this many
network devices.  What is interesting is that this many network devices
is becoming relevant on inexpensive COTS servers, for cases that are
not purely network focused.

Eric