Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)
From: David Ahern <dsahern@kernel.org>
Date: 2023-07-17 03:08:25
Also in:
linux-arm-kernel, linux-mediatek, linux-rdma, linux-wireless, lkml
On 7/16/23 8:05 PM, Mina Almasry wrote:
> > For the driver and hardware queue: don't you need a dedicated queue for the flow(s) in question?
>
> In the RFC and the implementation I'm thinking of, the queue is 'dedicated' in that each queue will be a devmem TCP queue or a regular queue. devmem queues generate devmem skbs and non-devmem queues generate non-devmem skbs. We support switching queues between devmem mode and non-devmem mode via a uapi.
ethtool APIs or something else?
> > If not, how can you properly handle the teardown case (e.g., app crashes and you need to ensure all references to GPU memory are removed from NIC descriptors)?
>
> Jason and Christian will correct me if I'm wrong, but AFAICT the dma-buf API requires the dma-buf provider to keep the attachment mapping alive as long as the importer requires it. The dma-buf API gives the importer dma_buf_map_attachment() and dma_buf_unmap_attachment() APIs, but there is no callback for the exporter to inform the importer that it has to take the mapping away.
Isn't the importer the application that terminated (cleanly or otherwise)? That was my thinking, but I guess there are other designs that can span more than a single application.
> The closest thing I saw was the move_notify() callback, but that is optional. In my mind the way it works is that there will be some uapi that binds a dma-buf to an RX queue, which will create the attachment and the mapping. If the user crashes or closes the dma-buf handle, that will unbind the dma-buf from the RX queue, but the mapping will remain alive (via some refcounting) until all the NIC descriptors are freed and the mapping is no longer in use. Usually this will happen at the next driver reset, which destroys and recreates the RX queues, thereby freeing all the NIC descriptors (but there could be a new API so that we don't rely on a driver reset).
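The lifetime rule described above (unbind on crash or close, mapping kept alive until the last NIC descriptor is freed) can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: struct devmem_binding and the binding_*() helpers are hypothetical names invented for illustration, not kernel or RFC APIs, and the "mapped" flag merely stands in for the dma-buf attachment mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dma-buf bound to an RX queue (not a kernel type). */
struct devmem_binding {
    int refs;     /* one ref for the uapi bind, one per posted NIC descriptor */
    bool bound;   /* still bound to the RX queue through the uapi */
    bool mapped;  /* stands in for the dma-buf attachment mapping */
};

/* Binding the dma-buf creates the mapping and holds the first reference. */
static void binding_init(struct devmem_binding *b)
{
    b->refs = 1;
    b->bound = true;
    b->mapped = true;
}

/* Taken whenever a NIC descriptor is filled from this binding. */
static void binding_get(struct devmem_binding *b)
{
    assert(b->mapped);
    b->refs++;
}

/* Dropped when a descriptor is freed; the last put tears the mapping down. */
static void binding_put(struct devmem_binding *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0)
        b->mapped = false;  /* where dma_buf_unmap_attachment() would be called */
}

/* User crashed or closed the handle: drop only the bind reference, once. */
static void binding_unbind(struct devmem_binding *b)
{
    if (!b->bound)
        return;
    b->bound = false;
    binding_put(b);
}
```

In this model, calling binding_unbind() while descriptors are still outstanding leaves "mapped" set; it is cleared only when the final binding_put() runs, which corresponds to the driver reset (or a future dedicated API) freeing the remaining descriptors.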
> > If you agree on this point, then you can require the dedicated queue management in the driver to use and expect only the alternative frag addressing scheme, i.e., it knows the address is not a struct page (validated by checking an skb flag, a frag flag, or address magic) but a reference to, say, a page_pool entry (if you are using page_pool for management of the dmabuf slices), which contains the metadata needed for the use case.
>
> Honestly, if my understanding above doesn't match what you want, I could implement 'dedicated queues' instead; just let me know what you want at some future iteration. Right now I'm more worried about this memory format issue, and I'm working on an RX prototype without struct pages. So far, purely technically speaking, it seems possible.
My comment was only a suggestion on how to simplify driver changes, i.e., a queue is either pages (based on standard page_pool or alloc_pages) or some "special" page_pool (i.e., a new abstraction), but not mixed. In that case the driver knows how to handle the overloaded 'address' in skb_frag in a clean manner.
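To make the "overloaded address" idea concrete, here is a minimal userspace sketch of one possible "address magic" scheme: the low bit of the frag address tags it as a reference to a pool entry rather than a page. Everything here is an assumption for illustration (frag_ref, fake_page, pool_entry, the queue_mode enum, and the choice of a low-bit tag); it is not the skb_frag layout or any existing page_pool API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fake_page { int id; };                            /* stand-in for struct page */
struct pool_entry { uint64_t dma_addr; uint32_t len; };  /* hypothetical dmabuf slice metadata */

/* Both structs are at least 4-byte aligned, so bit 0 of a pointer is free. */
#define FRAG_TAG_POOL_ENTRY ((uintptr_t)1)

/* Hypothetical overloaded frag address: a page pointer or a tagged entry pointer. */
typedef struct { uintptr_t addr; } frag_ref;

enum queue_mode { QUEUE_PAGES, QUEUE_SPECIAL_POOL };

static frag_ref frag_from_page(struct fake_page *p)
{
    return (frag_ref){ (uintptr_t)p };
}

static frag_ref frag_from_entry(struct pool_entry *e)
{
    return (frag_ref){ (uintptr_t)e | FRAG_TAG_POOL_ENTRY };
}

static bool frag_is_pool_entry(frag_ref f)
{
    return (f.addr & FRAG_TAG_POOL_ENTRY) != 0;
}

/* Returns NULL if the frag does not refer to a page. */
static struct fake_page *frag_page(frag_ref f)
{
    return frag_is_pool_entry(f) ? NULL : (struct fake_page *)f.addr;
}

/* Returns NULL if the frag does not refer to a pool entry. */
static struct pool_entry *frag_entry(frag_ref f)
{
    if (!frag_is_pool_entry(f))
        return NULL;
    return (struct pool_entry *)(f.addr & ~FRAG_TAG_POOL_ENTRY);
}

/* A queue is either pages or the special pool, never mixed. */
static bool queue_accepts_frag(enum queue_mode mode, frag_ref f)
{
    return (mode == QUEUE_SPECIAL_POOL) == frag_is_pool_entry(f);
}
```

Because each queue is in exactly one mode, a check like queue_accepts_frag() would only need to run at the point where frags are attached; the rest of that queue's path could then assume a single interpretation of the address.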