Thread: 63 messages, 13 authors, 2023-07-17

Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)

From: David Ahern <dsahern@kernel.org>
Date: 2023-07-14 15:18:56
Also in: linux-mediatek, linux-rdma, linux-wireless, lkml, netdev

On 7/14/23 8:55 AM, Mina Almasry wrote:
I guess the remaining option not fully explored is the idea of getting
the networking stack to consume the scatterlist that
dma_buf_map_attachment() provides for the device memory. The very
rough approach I have in mind (for the RX path) is:

1. Some uapi that binds a dmabuf to an RX queue. It will do a
dma_buf_map_attachment() and get the sg table.

2. We need to feed the scatterlist entries to some allocator that
will chunk it up into pieces that can be allocated by the NIC for
incoming traffic. I'm thinking genalloc may work for this as-is, but I
may need to add one or use something else if I run into some issue.

3. We can implement a memory_provider that allocates these chunks and
wraps them in a struct new_abstraction (as David called it) and feeds
those into the page pool.

4. The page pool would need to be able to process these struct
new_abstraction alongside the struct pages it normally gets from
providers. This is maybe the most complicated part, but looking at the
page pool code it doesn't seem that big of a hurdle (but I have not
tried a POC yet).

5. The drivers (I looked at mlx5) seem to avoid making any mm calls on
the struct pages returned by the pool; the pool abstracts everything
already. The changes to the drivers may be minimal..?

6. We would need to add a new helper, skb_add_rx_new_abstraction_frag
that creates a frag out of new_abstraction rather than a struct page.
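
As a rough illustration of steps 2 and 3, here is a userspace model of
chunking address ranges into fixed-size pieces and handing them out. It
is only a sketch under stated assumptions: struct sg_range, struct
chunk_pool, CHUNK_SIZE and MAX_CHUNKS are hypothetical names invented for
this example, the free list merely stands in for whatever genalloc (or
another allocator) would do, and none of it is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u /* assumed allocation unit, not from the thread */
#define MAX_CHUNKS 64u   /* assumed fixed capacity to keep the model small */

/* Hypothetical stand-in for one mapped scatterlist entry. */
struct sg_range {
	uint64_t dma_addr;
	uint32_t len;
};

/* Hypothetical free list of chunk addresses (a stand-in for genalloc). */
struct chunk_pool {
	uint64_t free_addr[MAX_CHUNKS];
	size_t nfree;
};

static void chunk_pool_init(struct chunk_pool *p)
{
	p->nfree = 0;
}

/*
 * Split one range into CHUNK_SIZE pieces and push them on the free list.
 * A tail shorter than CHUNK_SIZE is dropped. Returns the number added.
 */
static size_t chunk_pool_add_range(struct chunk_pool *p,
				   const struct sg_range *r)
{
	size_t added = 0;
	uint32_t off;

	for (off = 0;
	     r->len - off >= CHUNK_SIZE && p->nfree < MAX_CHUNKS;
	     off += CHUNK_SIZE) {
		p->free_addr[p->nfree++] = r->dma_addr + off;
		added++;
	}
	return added;
}

/* Pop one chunk for the NIC to fill; returns 0 on success, -1 if empty. */
static int chunk_pool_alloc(struct chunk_pool *p, uint64_t *dma_addr)
{
	if (p->nfree == 0)
		return -1;
	*dma_addr = p->free_addr[--p->nfree];
	return 0;
}

/* Return a chunk once the stack is done with it. */
static void chunk_pool_free(struct chunk_pool *p, uint64_t dma_addr)
{
	assert(p->nfree < MAX_CHUNKS);
	p->free_addr[p->nfree++] = dma_addr;
}
```

In this model a memory provider would call chunk_pool_alloc() when the
page pool asks for a buffer and chunk_pool_free() when the buffer is
recycled; the real version would also need locking and per-chunk
metadata.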

Once the skb frags with struct new_abstraction are in the TCP stack,
they will need some special handling in code accessing the frags. But
my RFC already addressed that somewhat because the frags were
inaccessible in that case. In this case the frags will be both
inaccessible and will not be struct pages at all (things like
get_page() will not work), so more special handling will be required,
maybe.

I imagine the TX path would be considerably less complicated because
the allocator and page pool are not involved (I think).

Anyone see any glaring issues with this approach?

Moving skb_frags to an alternative scheme is essential to make this
work. The current page scheme to go from user virtual to pages to
physical is not needed for the dmabuf use case.

For the driver and hardware queue: don't you need a dedicated queue for
the flow(s) in question? If not, how can you properly handle the
teardown case (e.g., app crashes and you need to ensure all references
to GPU memory are removed from NIC descriptors)? If you agree on this
point, then you can require the dedicated queue management in the driver
to use and expect only the alternative frag addressing scheme, i.e., it
knows the address is not a struct page (validated by checking an skb flag,
a frag flag, or address magic), but a reference to, say, a page_pool entry
(if you are using page_pool for management of the dmabuf slices) which
contains the metadata needed for the use case.
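
To make the "address magic" option concrete, here is a small userspace
sketch of tagging a frag reference in its low bit so that code can tell
a struct page apart from a non-page entry. All names here (frag_ref,
struct page_model, struct new_abstraction, FRAG_TAG_NON_PAGE and the
helpers) are hypothetical and exist only for this illustration; an skb
flag or a per-frag flag would be equally valid ways to do the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct page; only here so the model compiles. */
struct page_model {
	int refcount;
};

/* Hypothetical metadata for one dmabuf slice. */
struct new_abstraction {
	uint64_t dma_addr;
	uint32_t len;
};

/* Low bit set means "not a struct page" (assumed encoding). */
#define FRAG_TAG_NON_PAGE ((uintptr_t)1)

typedef uintptr_t frag_ref;

static frag_ref frag_from_page(struct page_model *pg)
{
	/* Relies on the object being at least 2-byte aligned. */
	assert(((uintptr_t)pg & FRAG_TAG_NON_PAGE) == 0);
	return (uintptr_t)pg;
}

static frag_ref frag_from_abstraction(struct new_abstraction *na)
{
	assert(((uintptr_t)na & FRAG_TAG_NON_PAGE) == 0);
	return (uintptr_t)na | FRAG_TAG_NON_PAGE;
}

static bool frag_is_page(frag_ref f)
{
	return (f & FRAG_TAG_NON_PAGE) == 0;
}

/* Returns NULL when the frag is not page backed. */
static struct page_model *frag_page(frag_ref f)
{
	return frag_is_page(f) ? (struct page_model *)f : NULL;
}

/* Returns NULL when the frag is an ordinary page. */
static struct new_abstraction *frag_abstraction(frag_ref f)
{
	if (frag_is_page(f))
		return NULL;
	return (struct new_abstraction *)(f & ~FRAG_TAG_NON_PAGE);
}
```

With an encoding like this, a path that would call get_page() can first
check frag_is_page() and take the alternative route for dmabuf-backed
frags.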

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel