Thread (28 messages) 28 messages, 6 authors, 2013-06-04

Re: [PATCH net-next 01/12] tipc: change socket buffer overflow control to respect sk_rcvbuf

From: Neil Horman <nhorman@tuxdriver.com>
Date: 2013-06-04 13:41:09

On Tue, Jun 04, 2013 at 09:37:54AM +0800, Ying Xue wrote:
On 06/03/2013 09:16 PM, Neil Horman wrote:
quoted
On Mon, Jun 03, 2013 at 05:55:06PM +0800, Ying Xue wrote:
quoted
On 05/31/2013 09:36 PM, Neil Horman wrote:
quoted
On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
quoted
From: Jon Maloy <redacted>

As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jon Maloy <redacted>
Signed-off-by: Paul Gortmaker <redacted>
---
 net/tipc/socket.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 515ce38..2dfabc7 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1,7 +1,7 @@
 /*
  * net/tipc/socket.c: TIPC socket API
  *
- * Copyright (c) 2001-2007, 2012 Ericsson AB
+ * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
  * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
  * All rights reserved.
  *
@@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
 
 	sock_init_data(sock, sk);
 	sk->sk_backlog_rcv = backlog_rcv;
+	sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
The last time Jon and I discussed this, I thought the consensus was to export
sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
protocol specific minimum on top of that), so administrators on memory
constrained systems didn't wonder why their sysctl changes weren't being
honored.
Yes, your suggestion is reasonable, and I prefer to involve
net.tipc.sysctl_rmem. But I have one question about it:

As you suggested as belows, the default value of sk->sk_rcvbuf is set to
sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE), that is,
sk->sk_rcvbuf is about 32MB.

However, please see below code:

int sock_setsockopt()
{
...
	        case SO_RCVBUF:
                /* Don't error on this BSD doesn't and if you think
                 * about it this is right. Otherwise apps have to
                 * play 'guess the biggest size' games. RCVBUF/SNDBUF
                 * are treated in BSD as hints
                 */
                val = min_t(u32, val, sysctl_rmem_max);
set_rcvbuf:
                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
                /*
                 * We double it on the way in to account for
                 * "struct sk_buff" etc. overhead.   Applications
                 * assume that the SO_RCVBUF setting they make will
                 * allow that much actual data to be received on that
                 * socket.
                 *
                 * Applications are unaware that "struct sk_buff" and
                 * other overheads allocate from the receive buffer
                 * during socket buffer allocation.
                 *
                 * And after considering the possible alternatives,
                 * returning the value we actually used in getsockopt
                 * is the most desirable behavior.
                 */
                sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
                break;
...
}

From above logic of setting sk->sk_rcvbuf with SO_RCVBUF, it only
permits the maximum value of sk->sk_rcvbuf to sysctl_rmem_max * 2(ie,
about 400KB normally).

So, even if the default value of sk->sk_rcvbuf is set to 32MB with
net.tipc.sysctl_rmem, a bit smaller value than the default value can
never be set to sk->sk_rcvbuf successfully with SO_RCVBUF option.

How can we avoid the limit?
By administratively adjusting sysctl_rmem_max to be a sufficiently large value
such that using SO_RCVBUF won't be clamed to a lower limit.

If you don't want to force users to have to manually adjust the sysctl, there
might be support for you to automatically update sysctl_rmem_max in your
tipc_init routine, and print an informational message indicating that tipc
requires the additional space (although I still maintain its not strictly
needed, but thats another argument).
Thanks for your clear clarification.

I also have the same concern. If we override sysctl_rmem_max in
tipc_init() with a larger value, I am afraid that other guys will oppose
the behaviour.

The truth is that little TIPC user adjusts the sk->sk_rcvbuf with
SO_RCVBUF option in practice. If he really wants to do, he should follow
your suggestion he manually enlarges the sysctl.

OK, I will rewrite the patch with your suggestion.

Regards,
Ying
quoted
Neil

Sounds good, thanks!
Neil
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help