Thread (40 messages) 40 messages, 6 authors, 2011-01-24

Re: Flow Control and Port Mirroring Revisited

From: Simon Horman <horms@verge.net.au>
Date: 2011-01-06 12:44:44
Also in: kvm, netdev
Subsystem: the rest · Maintainer: Linus Torvalds

On Thu, Jan 06, 2011 at 11:22:42AM +0100, Eric Dumazet wrote:
Le jeudi 06 janvier 2011 à 18:33 +0900, Simon Horman a écrit :
quoted
Hi,

Back in October I reported that I noticed a problem whereby flow control
breaks down when openvswitch is configured to mirror a port[1].

I have (finally) looked into this further and the problem appears to relate
to cloning of skbs, as Jesse Gross originally suspected.

More specifically, in do_execute_actions[2] the first n-1 times that an skb
needs to be transmitted it is cloned first and the final time the original
skb is used.

In the case that there is only one action, which is the normal case, then
the original skb will be used. But in the case of mirroring the cloning
comes into effect. And in my case the cloned skb seems to go to the (slow)
eth1 interface while the original skb goes to the (fast) dummy0 interface
that I set up to be a mirror. The result is that dummy0 "paces" the flow,
and its a cracking pace at that.

As an experiment I hacked do_execute_actions() to use the original skb
for the first action instead of the last one.  In my case the result was
that eth1 "paces" the flow, and things work reasonably nicely.

Well, sort of. Things work well for non-GSO skbs but extremely poorly for
GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
netserv. I'm unsure why, but I digress.

It seems to me that my hack illustrates the point that the flow ends up
being "paced" by one interface. However I think that what would be
desirable is that the flow is "paced" by the slowest link. Unfortunately
I'm unsure how to achieve that.
Hi Simon !

"pacing" is done because skb is attached to a socket, and a socket has a
limited (but configurable) sndbuf. sk->sk_wmem_alloc is the current sum
of all truesize skbs in flight.

When you enter something that :

1) Get a clone of the skb, queue the clone to device X
2) queue the original skb to device Y

Then :	Socket sndbuf is not affected at all by device X queue.
	This is speed on device Y that matters.

You want to get servo control on both X and Y

You could try to

1) Get a clone of skb
   Attach it to socket too (so that socket get a feedback of final
orphaning for the clone) with skb_set_owner_w()
   queue the clone to device X

Unfortunatly, stacked skb->destructor() makes this possible only for
known destructor (aka sock_wfree())
Hi Eric !

Thanks for the advice. I had thought about the socket buffer but at some
point it slipped my mind.

In any case the following patch seems to implement the change that I had in
mind. However my discussions Michael Tsirkin elsewhere in this thread are
beginning to make me think that think that perhaps this change isn't the
best solution.
diff --git a/datapath/actions.c b/datapath/actions.c
index 5e16143..505f13f 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 
 	for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, &rem)) {
 		if (prev_port != -1) {
-			do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
+			struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+			if (nskb) {
+				if (skb->sk)
+					skb_set_owner_w(nskb, skb->sk);
+				do_output(dp, nskb, prev_port);
+			}
 			prev_port = -1;
 		}
I got a rather nasty panic without the if (skb->sk),
I guess some skbs don't have a socket.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help