Re: [PATCH net-next v3 09/12] iavf: switch to Page Pool
From: Alexander Duyck <hidden>
Date: 2023-06-02 18:00:48
Also in:
intel-wired-lan, lkml
On Fri, Jun 2, 2023 at 9:31 AM Alexander Lobakin [off-list ref] wrote:
From: Alexander H Duyck <redacted> Date: Wed, 31 May 2023 09:19:06 -0700quoted
On Tue, 2023-05-30 at 17:00 +0200, Alexander Lobakin wrote:quoted
Now that the IAVF driver simply uses dev_alloc_page() + free_page() with no custom recycling logics and one whole page per frame, it can easily be switched to using Page Pool API instead.[...]quoted
quoted
@@ -691,8 +690,6 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring) **/ void iavf_clean_rx_ring(struct iavf_ring *rx_ring) { - u16 i; - /* ring already cleared, nothing to do */ if (!rx_ring->rx_pages) return;@@ -703,28 +700,17 @@ void iavf_clean_rx_ring(struct iavf_ring *rx_ring) } /* Free all the Rx ring sk_buffs */ - for (i = 0; i < rx_ring->count; i++) { + for (u32 i = 0; i < rx_ring->count; i++) {Did we make a change to our coding style to allow declaration of variables inside of for statements? Just wondering if this is a change since the recent updates to the ISO C standard, or if this doesn't match up with what we would expect per the coding standard.It's optional right now, nobody would object declaring it either way. Doing it inside is allowed since we switched to C11, right. Here I did that because my heart was breaking to see this little u16 alone (and yeah, u16 on the stack).
Yeah, that was back when I was declaring stack variables the exact same size as the ring parameters. So u16 should match the size of rx_ring->count not that it matters. It was just a quirk I had at the time.
quoted
quoted
struct page *page = rx_ring->rx_pages[i]; - dma_addr_t dma; if (!page) continue; - dma = page_pool_get_dma_addr(page); - /* Invalidate cache lines that may have been written to by * device so that we avoid corrupting memory. */ - dma_sync_single_range_for_cpu(rx_ring->dev, dma, - LIBIE_SKB_HEADROOM, - LIBIE_RX_BUF_LEN, - DMA_FROM_DEVICE); - - /* free resources associated with mapping */ - dma_unmap_page_attrs(rx_ring->dev, dma, LIBIE_RX_TRUESIZE, - DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR); - - __free_page(page); + page_pool_dma_sync_full_for_cpu(rx_ring->pool, page); + page_pool_put_full_page(rx_ring->pool, page, false); } rx_ring->next_to_clean = 0;@@ -739,10 +725,15 @@ void iavf_clean_rx_ring(struct iavf_ring *rx_ring) **/ void iavf_free_rx_resources(struct iavf_ring *rx_ring) { + struct device *dev = rx_ring->pool->p.dev; + iavf_clean_rx_ring(rx_ring); kfree(rx_ring->rx_pages); rx_ring->rx_pages = NULL; + page_pool_destroy(rx_ring->pool); + rx_ring->dev = dev; + if (rx_ring->desc) { dma_free_coherent(rx_ring->dev, rx_ring->size, rx_ring->desc, rx_ring->dma);Not a fan of this switching back and forth between being a page pool pointer and a dev pointer. Seems problematic as it is easily misinterpreted. I would say that at a minimum stick to either it is page_pool(Rx) or dev(Tx) on a ring type basis.The problem is that page_pool has lifetime from ifup to ifdown, while its ring lives longer. So I had to do something with this, but also I didn't want to have 2 pointers at the same time since it's redundant and +8 bytes to the ring for nothing.
It might be better to just go with NULL rather than populating it w/ two different possible values. Then at least you know if it is an rx_ring it is a page_pool and if it is a tx_ring it is dev. You can reset to the page pool when you repopulate the rest of the ring.
quoted
This setup works for iavf, however for i40e/ice you may run into issues since the setup_rx_descriptors call is also used to setup the ethtool loopback test w/o a napi struct as I recall so there may not be a q_vector.I'll handle that. Somehow :D Thanks for noticing, I'll take a look whether I should do something right now or it can be done later when switching the actual mentioned drivers. [...]quoted
quoted
@@ -240,7 +237,10 @@ struct iavf_rx_queue_stats { struct iavf_ring { struct iavf_ring *next; /* pointer to next ring in q_vector */ void *desc; /* Descriptor ring memory */ - struct device *dev; /* Used for DMA mapping */ + union { + struct page_pool *pool; /* Used for Rx page management */ + struct device *dev; /* Used for DMA mapping on Tx */ + }; struct net_device *netdev; /* netdev ring maps to */ union { struct iavf_tx_buffer *tx_bi;Would it make more sense to have the page pool in the q_vector rather than the ring? Essentially the page pool is associated per napi instance so it seems like it would make more sense to store it with the napi struct rather than potentially have multiple instances per napi.As per Page Pool design, you should have it per ring. Plus you have rxq_info (XDP-related structure), which is also per-ring and participates in recycling in some cases. So I wouldn't complicate. I went down the chain and haven't found any place where having more than 1 PP per NAPI would break anything. If I got it correctly, Jakub's optimization discourages having 1 PP per several NAPIs (or scheduling one NAPI on different CPUs), but not the other way around. The goal was to exclude concurrent access to one PP from different threads, and here it's impossible.
The xdp_rxq can be mapped many:1 to the page pool if I am not mistaken. The only reason why I am a fan of trying to keep the page_pool tightly associated with the napi instance is because the napi instance is what essentially is guaranteeing the page_pool is consistent as it is only accessed by that one napi instance.
Lemme know. I can always disable NAPI optimization for cases when one vector is shared by several queues -- and it's not a usual case for these NICs anyway -- but I haven't found a reason for that.
I suppose we should be fine if we have a many to one mapping though I suppose. As you said the issue would be if multiple NAPI were accessing the same page pool.