Thread (62 messages) 62 messages, 10 authors, 2016-06-22

Re: [PATCH v6 net-next 2/2] tcp: Add Redundant Data Bundling (RDB)

From: Bendik Rønning Opstad <hidden>
Date: 2016-06-16 17:15:18

On 21/03/16 19:54, Yuchung Cheng wrote:
On Thu, Mar 17, 2016 at 4:26 PM, Bendik Rønning Opstad
[off-list ref] wrote:
quoted
quoted
quoted
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6a92b15..8f3f3bf 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -716,6 +716,21 @@ tcp_thin_dpifl_itt_lower_bound - INTEGER
        calculated, which is used to classify whether a stream is thin.
        Default: 10000

+tcp_rdb - BOOLEAN
+       Enable RDB for all new TCP connections.
  Please describe RDB briefly, perhaps with a pointer to your paper.
Ah, yes, that description may have been a bit too brief...

What about pointing to tcp-thin.txt in the brief description, and
rewrite tcp-thin.txt with a more detailed description of RDB along
with a paper reference?
+1
quoted
quoted
   I suggest have three level of controls:
   0: disable RDB completely
   1: enable indiv. thin-stream conn. to use RDB via TCP_RDB socket
options
   2: enable RDB on all thin-stream conn. by default

   currently it only provides mode 1 and 2. but there may be cases where
   the administrator wants to disallow it (e.g., broken middle-boxes).
Good idea. Will change this.
I have implemented your suggestion in the next patch.
quoted
quoted
It also seems better to
allow individual socket to select the redundancy level (e.g.,
setsockopt TCP_RDB=3 means <=3 pkts per bundle) vs a global setting.
This requires more bits in tcp_sock but 2-3 more is suffice.
Most certainly. We decided not to implement this for the patch to keep
it as simple as possible, however, we surely prefer to have this
functionality included if possible.
Next patch version has a socket option to allow modifying the different
RDB settings.
quoted
quoted
quoted
+int tcp_transmit_rdb_skb(struct sock *sk, struct sk_buff *xmit_skb,
+                        unsigned int mss_now, gfp_t gfp_mask)
+{
+       struct sk_buff *rdb_skb = NULL;
+       struct sk_buff *first_to_bundle;
+       u32 bytes_in_rdb_skb = 0;
+
+       /* How we detect that RDB was used. When equal, no RDB data was sent */
+       TCP_SKB_CB(xmit_skb)->tx.rdb_start_seq = TCP_SKB_CB(xmit_skb)->seq;
quoted
+
+       if (!tcp_stream_is_thin_dpifl(tcp_sk(sk)))
During loss recovery tcp inflight fluctuates and would like to trigger
this check even for non-thin-stream connections.
Good point.
quoted
Since the loss
already occurs, RDB can only take advantage from limited-transmit,
which it likely does not have (b/c its a thin-stream). It might be
checking if the state is open.
You mean to test for open state to avoid calling rdb_can_bundle_test()
unnecessarily if we (presume to) know it cannot bundle anyway? That
makes sense, however, I would like to do some tests on whether "state
!= open" is a good indicator on when bundling is not possible.
When testing this I found that bundling can often be performed when
not in Open state. For the most part in CWR mode, but also the other
modes, so this does not seem like a good indicator.

The only problem with tcp_stream_is_thin_dpifl() triggering for
non-thin streams in loss recovery would be the performance penalty of
calling rdb_can_bundle_test(). It would not be able to bundle anyways
since the previous SKB would contain >= mss worth of data.

The most reliable test is to check available space in the previous
SKB, i.e. if (xmit_skb->prev->len == mss_now). Do you suggest, for
performance reasons, to do this before the call to
tcp_stream_is_thin_dpifl()?
quoted
quoted
since RDB will cause DSACKs, and we only blindly count DSACKs to
perform CWND undo. How does RDB handle that false positives?
That is a very good question. The simple answer is that the
implementation does not handle any such false positives, which I
expect can result in incorrectly undoing CWND reduction in some cases.
This gets a bit complicated, so I'll have to do some more testing on
this to verify with certainty when it happens.

When there is no loss, and each RDB packet arriving at the receiver
contains both already received and new data, the receiver will respond
with an ACK that acknowledges new data (moves snd_una), with the SACK
field populated with the already received sequence range (DSACK).

The DSACKs in these incoming ACKs are not counted (tp->undo_retrans--)
unless tp->undo_marker has been set by tcp_init_undo(), which is
called by either tcp_enter_loss() or tcp_enter_recovery(). However,
whenever a loss is detected by rdb_detect_loss(), tcp_enter_cwr() is
called, which disables CWND undo. Therefore, I believe the incorrect
thanks for the clarification. it might worth a short comment on why we
use tcp_enter_cwr() (to disable undo)

quoted
counting of DSACKs from ACKs on RDB packets will only be a problem
after the regular loss detection mechanisms (Fast Retransmit/RTO) have
been triggered (i.e. we are in either TCP_CA_Recovery or TCP_CA_Loss).

We have recorded the CWND values for both RDB and non-RDB streams in
our experiments, and have not found any obvious red flags when
analysing the results, so I presume (hope may be more precise) this is
not a major issue we have missed. Nevertheless, I will investigate
this in detail and get back to you.
I've looked into this and tried to figure out in which cases this is
actually a problem, but I have failed to find any.

One scenario I considered is when an RDB packet is sent right after a
retransmit, which would result in DSACK in the ACK in response to the
RDB packet.

With a bundling limit of 1 packet, two packets must be lost for RDB to
fail to repair the loss, causing dupACKs. So if three packets are sent,
where the first two are lost, the last packet will cause a dupACK,
resulting in a fast retransmit (and entering recovery which calls
tcp_init_undo()).

By writing new data to the socket right after the fast retransmit,
a new RDB packet is built with some old data that was just
retransmitted.

On the ACK on the fast retransmit the state is changed from Recovery
to Open. The next incoming ACK (on the RDB packet) will contain a DSACK
range, but it will not be considered dubious (tcp_ack_is_dubious())
since "!(flag & FLAG_NOT_DUP)" is false (new data was acked), state is
Open, and "flag & FLAG_CA_ALERT" evaluates to false.


Feel free to suggest scenarios (as detailed as possible) with the
potential to cause such false positives, and I'll test them with
packetdrill.

Bendik
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help