Re: [PATCH net-next v1 2/6] lan743x: support rx multi-buffer packets
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2021-01-29 22:13:13
Also in:
lkml
On Fri, Jan 29, 2021 at 2:56 PM Sven Van Asbroeck [off-list ref] wrote:
From: Sven Van Asbroeck <redacted>
Multi-buffer packets enable us to use rx ring buffers smaller than
the mtu. This will allow us to change the mtu on-the-fly, without
having to stop the network interface in order to re-size the rx
ring buffers.
This is a big change touching a key driver function (process_packet),
so care has been taken to test this extensively:
Tests with debug logging enabled (add #define DEBUG).
1. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Ping to chip, verify correct packet size is sent to OS.
Ping large packets to chip (ping -s 1400), verify correct
packet size is sent to OS.
Ping using packets around the buffer size, verify number of
buffers is changing, verify correct packet size is sent
to OS:
$ ping -s 472
$ ping -s 473
$ ping -s 992
$ ping -s 993
Verify that each packet is followed by extension processing.
2. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run iperf3 -s on chip, verify that packets come in 3 buffers
at a time.
Verify that packet size is equal to mtu.
Verify that each packet is followed by extension processing.
3. Set chip and host mtu to 2000.
Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
Run iperf3 -s on chip, verify that packets come in 4 buffers
at a time.
Verify that packet size is equal to mtu.
Verify that each packet is followed by extension processing.
Tests with debug logging DISabled (remove #define DEBUG).
4. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run iperf3 -s on chip, note sustained rx speed.
Set chip and host mtu to 2000, so mtu takes 4 buffers.
Run iperf3 -s on chip, note sustained rx speed.
Verify no packets are dropped in both cases.
Tests with DEBUG_KMEMLEAK on:
$ mount -t debugfs nodev /sys/kernel/debug/
$ echo scan > /sys/kernel/debug/kmemleak
5. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run the following tests concurrently for at least one hour:
- iperf3 -s on chip
- ping -> chip
Monitor reported memory leaks.
6. Set chip and host mtu to 2000.
Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
Run the following tests concurrently for at least one hour:
- iperf3 -s on chip
- ping -> chip
Monitor reported memory leaks.
7. Simulate low-memory in lan743x_rx_allocate_skb(): fail every
100 allocations.
Repeat (5) and (6).
Monitor reported memory leaks.
8. Simulate low-memory in lan743x_rx_allocate_skb(): fail 10
allocations in a row in every 100.
Repeat (5) and (6).
Monitor reported memory leaks.
9. Simulate low-memory in lan743x_rx_trim_skb(): fail 1 allocation
in every 100.
Repeat (5) and (6).
Monitor reported memory leaks.
Signed-off-by: Sven Van Asbroeck <redacted>
---
Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 46eb3c108fe1
To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <redacted>
Cc: Sergej Bauer <redacted>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <redacted>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
drivers/net/ethernet/microchip/lan743x_main.c | 321 ++++++++----------
drivers/net/ethernet/microchip/lan743x_main.h | 2 +
2 files changed, 143 insertions(+), 180 deletions(-)+static struct sk_buff *
+lan743x_rx_trim_skb(struct sk_buff *skb, int frame_length)
+{
+ if (skb_linearize(skb)) {Is this needed? That will be quite expensive
quoted hunk ↗ jump to hunk
+ dev_kfree_skb_irq(skb); + return NULL; + } + frame_length = max_t(int, 0, frame_length - RX_HEAD_PADDING - 2); + if (skb->len > frame_length) { + skb->tail -= skb->len - frame_length; + skb->len = frame_length; + } + return skb; +} + static int lan743x_rx_process_packet(struct lan743x_rx *rx) { - struct skb_shared_hwtstamps *hwtstamps = NULL; + struct lan743x_rx_descriptor *descriptor, *desc_ext; int result = RX_PROCESS_RESULT_NOTHING_TO_DO; int current_head_index = le32_to_cpu(*rx->head_cpu_ptr); struct lan743x_rx_buffer_info *buffer_info; - struct lan743x_rx_descriptor *descriptor; + struct skb_shared_hwtstamps *hwtstamps; + int frame_length, buffer_length; + struct sk_buff *skb; int extension_index = -1; - int first_index = -1; - int last_index = -1; + bool is_last, is_first; if (current_head_index < 0 || current_head_index >= rx->ring_size) goto done;@@ -2068,170 +2075,126 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx) if (rx->last_head < 0 || rx->last_head >= rx->ring_size) goto done; - if (rx->last_head != current_head_index) { - descriptor = &rx->ring_cpu_ptr[rx->last_head]; - if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_) - goto done; + if (rx->last_head == current_head_index) + goto done;
Is it possible to avoid the large indentation change, or else do that in a separate patch? It makes it harder to follow the functional change.
- if (!(le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_))
- goto done;
+ descriptor = &rx->ring_cpu_ptr[rx->last_head];
+ if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
+ goto done;
+ buffer_info = &rx->buffer_info[rx->last_head];
- first_index = rx->last_head;
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
- last_index = rx->last_head;
- } else {
- int index;
-
- index = lan743x_rx_next_index(rx, first_index);
- while (index != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[index];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
- goto done;
-
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
- last_index = index;
- break;
- }
- index = lan743x_rx_next_index(rx, index);
- }
- }
- if (last_index >= 0) {
- descriptor = &rx->ring_cpu_ptr[last_index];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
- /* extension is expected to follow */
- int index = lan743x_rx_next_index(rx,
- last_index);
- if (index != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[index];
- if (le32_to_cpu(descriptor->data0) &
- RX_DESC_DATA0_OWN_) {
- goto done;
- }
- if (le32_to_cpu(descriptor->data0) &
- RX_DESC_DATA0_EXT_) {
- extension_index = index;
- } else {
- goto done;
- }
- } else {
- /* extension is not yet available */
- /* prevent processing of this packet */
- first_index = -1;
- last_index = -1;
- }
- }
- }
+ is_last = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_;
+ is_first = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_;
+
+ if (is_last && le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
+ /* extension is expected to follow */
+ int index = lan743x_rx_next_index(rx, rx->last_head);
+
+ if (index == current_head_index)
+ /* extension not yet available */
+ goto done;
+ desc_ext = &rx->ring_cpu_ptr[index];
+ if (le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_OWN_)
+ /* extension not yet available */
+ goto done;
+ if (!(le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_EXT_))
+ goto move_forward;
+ extension_index = index;
}
- if (first_index >= 0 && last_index >= 0) {
- int real_last_index = last_index;
- struct sk_buff *skb = NULL;
- u32 ts_sec = 0;
- u32 ts_nsec = 0;
-
- /* packet is available */
- if (first_index == last_index) {
- /* single buffer packet */
- struct sk_buff *new_skb = NULL;
- int packet_length;
-
- new_skb = lan743x_rx_allocate_skb(rx);
- if (!new_skb) {
- /* failed to allocate next skb.
- * Memory is very low.
- * Drop this packet and reuse buffer.
- */
- lan743x_rx_reuse_ring_element(rx, first_index);
- goto process_extension;
- }
- buffer_info = &rx->buffer_info[first_index];
- skb = buffer_info->skb;
- descriptor = &rx->ring_cpu_ptr[first_index];
-
- /* unmap from dma */
- packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
- (descriptor->data0);
- if (buffer_info->dma_ptr) {
- dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
- buffer_info->dma_ptr,
- packet_length,
- DMA_FROM_DEVICE);
- dma_unmap_single_attrs(&rx->adapter->pdev->dev,
- buffer_info->dma_ptr,
- buffer_info->buffer_length,
- DMA_FROM_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC);
- buffer_info->dma_ptr = 0;
- buffer_info->buffer_length = 0;
- }
- buffer_info->skb = NULL;
- packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
- (le32_to_cpu(descriptor->data0));
- skb_put(skb, packet_length - 4);
- skb->protocol = eth_type_trans(skb,
- rx->adapter->netdev);
- lan743x_rx_init_ring_element(rx, first_index, new_skb);
- } else {
- int index = first_index;
+ /* Only the last buffer in a multi-buffer frame contains the total frame
+ * length. All other buffers have a zero frame length. The chip
+ * occasionally sends more buffers than strictly required to reach the
+ * total frame length.
+ * Handle this by adding all buffers to the skb in their entirety.
+ * Once the real frame length is known, trim the skb.
+ */
+ frame_length =
+ RX_DESC_DATA0_FRAME_LENGTH_GET_(le32_to_cpu(descriptor->data0));
+ buffer_length = buffer_info->buffer_length;
- /* multi buffer packet not supported */
- /* this should not happen since buffers are allocated
- * to be at least the mtu size configured in the mac.
- */
+ netdev_dbg(rx->adapter->netdev, "%s%schunk: %d/%d",
+ is_first ? "first " : " ",
+ is_last ? "last " : " ",
+ frame_length, buffer_length);
- /* clean up buffers */
- if (first_index <= last_index) {
- while ((index >= first_index) &&
- (index <= last_index)) {
- lan743x_rx_reuse_ring_element(rx,
- index);
- index = lan743x_rx_next_index(rx,
- index);
- }
- } else {
- while ((index >= first_index) ||
- (index <= last_index)) {
- lan743x_rx_reuse_ring_element(rx,
- index);
- index = lan743x_rx_next_index(rx,
- index);
- }
- }
- }
+ /* unmap from dma */
+ if (buffer_info->dma_ptr) {
+ dma_unmap_single(&rx->adapter->pdev->dev,
+ buffer_info->dma_ptr,
+ buffer_info->buffer_length,
+ DMA_FROM_DEVICE);
+ buffer_info->dma_ptr = 0;
+ buffer_info->buffer_length = 0;
+ }
+ skb = buffer_info->skb;
-process_extension:
- if (extension_index >= 0) {
- descriptor = &rx->ring_cpu_ptr[extension_index];
- buffer_info = &rx->buffer_info[extension_index];
-
- ts_sec = le32_to_cpu(descriptor->data1);
- ts_nsec = (le32_to_cpu(descriptor->data2) &
- RX_DESC_DATA2_TS_NS_MASK_);
- lan743x_rx_reuse_ring_element(rx, extension_index);
- real_last_index = extension_index;
- }
+ /* allocate new skb and map to dma */
+ if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ /* failed to allocate next skb.
+ * Memory is very low.
+ * Drop this packet and reuse buffer.
+ */
+ lan743x_rx_reuse_ring_element(rx, rx->last_head);
+ goto process_extension;
+ }
+
+ /* add buffers to skb via skb->frag_list */
+ if (is_first) {
+ skb_reserve(skb, RX_HEAD_PADDING);
+ skb_put(skb, buffer_length - RX_HEAD_PADDING);
+ if (rx->skb_head)
+ dev_kfree_skb_irq(rx->skb_head);
+ rx->skb_head = skb;
+ } else if (rx->skb_head) {
+ skb_put(skb, buffer_length);
+ if (skb_shinfo(rx->skb_head)->frag_list)
+ rx->skb_tail->next = skb;
+ else
+ skb_shinfo(rx->skb_head)->frag_list = skb;Instead of chaining skbs into frag_list, you could perhaps delay skb alloc until after reception, allocate buffers stand-alone, and link them into the skb as skb_frags? That might avoid a few skb alloc + frees. Though a bit change, not sure how feasible.
+ rx->skb_tail = skb;
+ rx->skb_head->len += skb->len;
+ rx->skb_head->data_len += skb->len;
+ rx->skb_head->truesize += skb->truesize;
+ } else {
+ rx->skb_head = skb;
+ }
- if (!skb) {
- result = RX_PROCESS_RESULT_PACKET_DROPPED;
- goto move_forward;
+process_extension:
+ if (extension_index >= 0) {
+ u32 ts_sec;
+ u32 ts_nsec;
+
+ ts_sec = le32_to_cpu(desc_ext->data1);
+ ts_nsec = (le32_to_cpu(desc_ext->data2) &
+ RX_DESC_DATA2_TS_NS_MASK_);
+ if (rx->skb_head) {
+ hwtstamps = skb_hwtstamps(rx->skb_head);
+ if (hwtstamps)This is always true. You can just call skb_hwtstamps(skb)->hwtstamp = ktime_set(ts_sec, ts_nsec); Though I see that this is existing code just moved due to aforementioned indentation change.