Thread (128 messages) 128 messages, 10 authors, 2017-07-10

Re: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support

From: Hu, Jiayu <hidden>
Date: 2017-06-30 15:40:31

Hi Konstantin,
-----Original Message-----
From: Ananyev, Konstantin
Sent: Friday, June 30, 2017 8:07 PM
To: Hu, Jiayu <redacted>
Cc: dev@dpdk.org; Tan, Jianfeng <redacted>;
stephen@networkplumber.org; yliu@fridaylinux.org; Wu, Jingjing
[off-list ref]; Yao, Lei A [off-list ref]; Wiles, Keith
[off-list ref]; Bie, Tiwei [off-list ref]
Subject: RE: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support

Hi Jiayu,
quoted
quoted
quoted
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
+
+	struct tcp_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, key_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto fail;
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto fail;
+
+	/* find a key and traverse all packets in its item group */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
Your key.ip_src_addr[1-3] still contains some junk.
How memcmp below supposed to worj properly?
When allocate an item, we already guarantee the content of its
memory space is 0. So memcpy won't be error.
key is allocated on the stack.
Obviously fileds that are not initialized manually will contain undefined values,
i.e.: ip_src-addr[1-3].
Then below you are doing:
memcp((&(tbl->keys[i].key), &key, sizeof(struct tcp_key));
...
memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key));

So I still think you are comparing/copying some junk here.
Oh, yes. Key is allocated in stack. Thanks, I will modify it.
quoted
quoted
BTW why do you need 4 elems here, why just not uint32_t ip_src_addr;?
Same for ip_dst_addr.
I think tcp6 and tcp4 can share the same table structure. So I use
128bit IP address here. You mean we need to use different structures
for tcp4 and tcp6?
That would be my preference.
Got it. I will modify it. Thanks.
quoted
quoted
quoted
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if (tbl->keys[i].is_valid &&
+				(memcmp(&(tbl->keys[i].key), &key,
+						sizeof(struct tcp_key))
+				 == 0)) {
+			cur_idx = tbl->keys[i].start_index;
+			prev_idx = cur_idx;
+			while (cur_idx != INVALID_ARRAY_INDEX) {
+				if (check_seq_option(tbl->items[cur_idx].pkt,
+							tcp_hdr,
+							tcp_hl) > 0) {
As I remember linux gro also check ipv4 packet_id - it should be
consecutive.
quoted
IP fragmented packets have the same IP ID, but others are consecutive.
Yes, you assume that they are consecutive.
But the only way to know for sure that they are - check it.
Yes, I will modify it. Thanks.
Another thing - I think we need to perform GRO only for TCP packets with
only ACK bit set
(i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.).
Currently, we don't process packets whose payload length is 0. So if the packets
are SYN or FIN, we won't process them, since their payload length is 0. For URG,
PSH and RST, TCP/IPv4 GRO may still process these packets.

You are right. We shouldn't process packets whose URG, PSH or RST bit is set.
Thanks, I will modify it. 
quoted
As we  suppose GRO can merge IP fragmented packets, so I think we
shouldn't check if
quoted
the IP ID is consecutive here. How do you think?
quoted
quoted
+					if (merge_two_tcp4_packets(
+								tbl-
items[cur_idx].pkt,
quoted
quoted
+								pkt,
+
	max_packet_size) > 0) {
quoted
quoted
quoted
+						/* successfully merge two
packets */
quoted
quoted
quoted
+						tbl-
items[cur_idx].is_groed = 1;
quoted
quoted
+						return 1;
+					}
If you allow more then packet per flow to be stored in the table, then you
should be
quoted
quoted
prepared that new segment can fill a gap between 2 packets.
Probably the easiest thing - don't allow more then one 'item' per flow.
We allow the table to store same flow but out-of-order arriving packets.
For
quoted
these packets, they will occupy different 'item' and we won't re-merge
them.
quoted
For example, there are three same flow tcp packets: p1, p2 and p3. And p1
arrives
quoted
first, then p3, and last is p2. So TCP GRO will allocate one 'item' for p1 and
one
quoted
'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefore, in
the
quoted
table, we will have two 'item': item1 to store merged p1 and p2, item2 to
store p3.
quoted
As you can see, TCP GRO can only merges sequential arriving packets. If we
want to
quoted
merge all out-of-order arriving packets, we need to re-process these
packets which
quoted
are already processed and have one 'item'. IMO, this procedure will be very
complicated.
quoted
So we don't do that.

Sorry, I don't understand how to allow one 'item' per-flow. Because
packets are arriving
quoted
out-of-order. If we don't re-process these packets which already have one
'item', how to
quoted
guarantee it?
As I understand you'll have an input array:
<seq=2, seq=1> - you wouldn't be able to merge it.
So I think your merge need be prepared to merge both smaller and bigger
sequence.
Oh yes, it's much better. I will modify it.
About one item my thought : instead of allowing N items per key(flow) - for
simplicity
just allow one item per flow.
In that case we wouldn't allow holes, but still would be able to merge
reordered packets.
Alternatively you can store items ordered by seq, and after merge, try to
merge neighbor
Items too.
Yes, when we insert a new item, we can chain it with the packets of its item
group ordered by seq. After processing all packets, we need to traverse each
item group and try to merge neighbors. But when will 'merge neighbors' happen?
When flush packets from the table? (e.g.  gro_tcp_tbl_timeout_flush)

BRs,
Jiayu
quoted
quoted
quoted
+					/**
+					 * fail to merge two packets since
+					 * it's beyond the max packet length.
+					 * Insert it into the item group.
+					 */
+					goto insert_to_item_group;
+				} else {
+					prev_idx = cur_idx;
+					cur_idx = tbl-
items[cur_idx].next_pkt_idx;
quoted
quoted
+				}
+			}
+			/**
+			 * find a corresponding item group but fails to find
+			 * one packet to merge. Insert it into this item group.
+			 */
+insert_to_item_group:
+			item_idx = find_an_empty_item(tbl);
+			/* the item number is greater than the max value */
+			if (item_idx == INVALID_ARRAY_INDEX)
+				return -1;
+			tbl->items[prev_idx].next_pkt_idx = item_idx;
+			tbl->items[item_idx].pkt = pkt;
+			tbl->items[item_idx].is_groed = 0;
+			tbl->items[item_idx].next_pkt_idx =
INVALID_ARRAY_INDEX;
quoted
quoted
quoted
+			tbl->items[item_idx].is_valid = 1;
+			tbl->items[item_idx].start_time = rte_rdtsc();
+			tbl->item_num++;
+			return 0;
+		}
+	}
+
+	/**
+	 * merge fail as the given packet has a new key.
+	 * So insert a new key.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	key_idx = find_an_empty_key(tbl);
+	/**
+	 * if current key or item number is greater than the max
+	 * value, don't insert the packet into the table and return
+	 * immediately.
+	 */
+	if (item_idx == INVALID_ARRAY_INDEX ||
+			key_idx == INVALID_ARRAY_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].is_groed = 0;
+	tbl->items[item_idx].is_valid = 1;
+	tbl->items[item_idx].start_time = rte_rdtsc();
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help