Thread (41 messages) 41 messages, 9 authors, 2021-05-11

Re: [PATCH net-next v3 2/5] mm: add a signature in struct page

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: 2021-04-19 15:43:30
Also in: bpf, linux-mm, linux-rdma, lkml

Hi Shakeel,
On Mon, Apr 19, 2021 at 07:57:03AM -0700, Shakeel Butt wrote:
On Sun, Apr 18, 2021 at 10:12 PM Ilias Apalodimas
[off-list ref] wrote:
quoted
On Wed, Apr 14, 2021 at 01:09:47PM -0700, Shakeel Butt wrote:
quoted
On Wed, Apr 14, 2021 at 12:42 PM Jesper Dangaard Brouer
[off-list ref] wrote:
quoted
[...]
quoted
quoted
quoted
Can this page_pool be used for TCP RX zerocopy? If yes then PageType
can not be used.
Yes it can, since it's going to be used as your default allocator for
payloads, which might end up on an SKB.
I'm not sure we want or should "allow" page_pool be used for TCP RX
zerocopy.
For several reasons.

(1) This implies mapping these pages page to userspace, which AFAIK
means using page->mapping and page->index members (right?).
No, only page->_mapcount is used.
I am not sure I like leaving out TCP RX zerocopy. Since we want driver to
adopt the recycling mechanism we should try preserving the current
functionality of the network stack.

The question is how does it work with the current drivers that already have an
internal page recycling mechanism.
I think the current drivers check page_ref_count(page) to decide to
reuse (or not) the already allocated pages.

Some examples from the drivers:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:ixgbe_can_reuse_rx_page()
drivers/net/ethernet/intel/igb/igb_main.c:igb_can_reuse_rx_page()
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:mlx5e_rx_cache_get()
Yes, that's how internal recycling is done in drivers. As Jesper mentioned the
refcnt of the page is 1 for the page_pool owned pages and that's how we decide 
what to do with the page.
quoted
quoted
quoted
(2) It feels wrong (security wise) to keep the DMA-mapping (for the
device) and also map this page into userspace.
I think this is already the case i.e pages still DMA-mapped and also
mapped into userspace.
quoted
(3) The page_pool is optimized for refcnt==1 case, and AFAIK TCP-RX
zerocopy will bump the refcnt, which means the page_pool will not
recycle the page when it see the elevated refcnt (it will instead
release its DMA-mapping).
Yes this is right but the userspace might have already consumed and
unmapped the page before the driver considers to recycle the page.
Same question here. I'll have a closer look in a few days and make sure we are
not breaking anything wrt zerocopy.
Pages mapped into the userspace have their refcnt elevated, so the
page_ref_count() check by the drivers indicates to not reuse such
pages.
When tcp_zerocopy_receive() is invoked it will call tcp_zerocopy_vm_insert_batch() 
which will end up doing a get_page().
What you are saying is that once the zerocopy is done though, skb_release_data() 
won't be called, but instead put_page() will be? If that's the case then we are 
indeed leaking DMA mappings and memory. That sounds weird though, since the
refcnt will be one in that case (zerocopy will do +1/-1 once it's done), so who
eventually frees the page? 
If kfree_skb() (or any wrapper that calls skb_release_data()) is called 
eventually, we'll end up properly recycling the page into our pool.
quoted
quoted
quoted
(4) I remember vaguely that this code path for (TCP RX zerocopy) uses
page->private for tricks.  And our patch [3/5] use page->private for
storing xdp_mem_info.

IMHO when the SKB travel into this TCP RX zerocopy code path, we should
call page_pool_release_page() to release its DMA-mapping.
I will let TCP RX zerocopy experts respond to this but from my high
level code inspection, I didn't see page->private usage.
Shakeel are you aware of any 'easy' way I can have rx zerocopy running?
I would recommend tools/testing/selftests/net/tcp_mmap.c.
Ok, thanks I'll have a look.

Cheers
/Ilias
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help