Re: [PATCH v5 net-next 01/36] net: Introduce direct data placement tcp offload
From: Eric Dumazet <edumazet@google.com>
Date: 2021-07-22 13:34:54
Also in:
netdev
On Thu, Jul 22, 2021 at 2:18 PM Boris Pismenny [off-list ref] wrote:
On 22/07/2021 14:26, Eric Dumazet wrote:quoted
On Thu, Jul 22, 2021 at 1:04 PM Boris Pismenny [off-list ref] wrote:quoted
From: Boris Pismenny <redacted>...quoted
}; const chardiff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e6ca5a1f3b59..4a7160bba09b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c@@ -5149,6 +5149,9 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); #ifdef CONFIG_TLS_DEVICE nskb->decrypted = skb->decrypted; +#endif +#ifdef CONFIG_ULP_DDP + nskb->ddp_crc = skb->ddp_crc;Probably you do not want to attempt any collapse if skb->ddp_crc is set right there.Right.quoted
quoted
#endif TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start; if (list)@@ -5182,6 +5185,11 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, #ifdef CONFIG_TLS_DEVICE if (skb->decrypted != nskb->decrypted) goto end; +#endif +#ifdef CONFIG_ULP_DDP + + if (skb->ddp_crc != nskb->ddp_crc)This checks only the second, third, and remaining skbs.Right, as we handle the head skb above. Could you clarify?
I was simply saying you missed the first skb.
quoted
quoted
+ goto end; #endif } }tcp_collapse() is copying data from small skbs to pack it to bigger skb (one page of payload), in case of memory emergency/pressure (socket queues are full) If your changes are trying to avoid 'needless' copies, maybe you should reconsider and let the emergency packing be done. If the copy is not _possible_, you should rephrase your changelog to clearly state the kernel _cannot_ access this memory in any way.The issue is that skb_condense also gets called on many skbs in tcp_add_backlog and it will identify skbs that went through DDP as ideal for packing, even though they are not small and packing is counter-productive as data already resides in its destination. As mentioned above, it is possible to copy, but it is counter-productive in this case. If there was a real need to access this memory, then it is allowed.
Standard GRO packets from high perf drivers have no room in their skb->head (ie skb_tailroom() should be 0) If you have a driver using GRO and who pulled some payload in skb->head, it is already too late for DDP. So I think you are trying to add code in TCP that should not be needed. Perhaps mlx5 driver is doing something it should not ? (If this is ' copybreak' this has been documented as being suboptimal, transports have better strategies) Secondly, tcp_collapse() should absolutely not be called under regular workloads. Trying to optimize this last-resort thing is a lost cause: If an application is dumb enough to send small packets back-to-back, it should be fixed (sender side has this thing called autocork, for applications that do not know about MSG_MORE or TC_CORK.) (tcp_collapse is a severe source of latencies) _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme