Re: [RFC v2 00/23] Dynamic memory allocation for DPDK
From: Burakov, Anatoly <hidden>
Date: 2017-12-22 09:13:08
On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
On Tue, 2017-12-19 at 11:14 +0000, Anatoly Burakov wrote:
Quick outline of all changes done as part of this patchset:

* Malloc heap adjusted to handle holes in address space
* Single memseg list replaced by multiple expandable memseg lists
* VA space for hugepages is preallocated in advance
* Added dynamic alloc/free for pages, happening as needed on malloc/free

SPDK will need some way to register for a notification when pages are allocated or freed. For storage, the number of requests per second is (relative to networking) fairly small (hundreds of thousands per second in a traditional block storage stack, or a few million per second with SPDK). Given that, we can afford to do a dynamic lookup from va to pa/iova on each request in order to greatly simplify our APIs (users can just pass pointers around instead of mbufs).

DPDK has a way to look up the pa from a given va, but it does so by scanning /proc/self/pagemap and is very slow. SPDK instead handles this by implementing a lookup table of va to pa/iova which we populate by scanning through the DPDK memory segments at start up, so the lookup in our table is sufficiently fast for storage use cases. If the list of memory segments changes, we need to know about it in order to update our map.
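To make the lookup-table idea concrete, here is a minimal sketch in C of a va to iova table populated from a list of segments. It is not SPDK's actual implementation: `struct seg`, `struct va_map`, and the `va_map_*` functions are hypothetical names, the 2 MiB page size and the fixed table size are assumptions, and the segment list is a stand-in for whatever DPDK exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 21                           /* assumes 2 MiB hugepages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define MAP_BITS   10
#define MAP_SLOTS  ((size_t)1 << MAP_BITS)      /* sketch-sized table */
#define IOVA_NONE  UINT64_MAX

/* Hypothetical stand-in for one DPDK memory segment. */
struct seg {
    uint64_t va;    /* page-aligned start address */
    uint64_t iova;  /* IOVA (or PA) of the first byte */
    uint64_t len;   /* multiple of PAGE_SIZE */
};

/* Open-addressing hash table keyed by page number. */
struct va_map {
    uint64_t key[MAP_SLOTS];   /* page number + 1; 0 marks an empty slot */
    uint64_t iova[MAP_SLOTS];  /* IOVA of the start of that page */
};

static void va_map_init(struct va_map *m) { memset(m, 0, sizeof(*m)); }

static size_t va_map_slot(uint64_t key) {
    /* Multiplicative hash, keeping the top MAP_BITS bits. */
    return (size_t)((key * 0x9E3779B97F4A7C15ull) >> (64 - MAP_BITS));
}

/* Record one page. Returns 0 on success, -1 if the table is full. */
static int va_map_set(struct va_map *m, uint64_t va, uint64_t iova) {
    uint64_t key = (va >> PAGE_SHIFT) + 1;
    size_t i = va_map_slot(key);
    for (size_t n = 0; n < MAP_SLOTS; n++, i = (i + 1) & (MAP_SLOTS - 1)) {
        if (m->key[i] == 0 || m->key[i] == key) {
            m->key[i] = key;
            m->iova[i] = iova;
            return 0;
        }
    }
    return -1;
}

/* Start-up scan: add every page of every segment to the table. */
static int va_map_add_segs(struct va_map *m, const struct seg *segs, size_t n) {
    for (size_t s = 0; s < n; s++) {
        assert(segs[s].va % PAGE_SIZE == 0 && segs[s].len % PAGE_SIZE == 0);
        for (uint64_t off = 0; off < segs[s].len; off += PAGE_SIZE)
            if (va_map_set(m, segs[s].va + off, segs[s].iova + off) != 0)
                return -1;
    }
    return 0;
}

/* Per-request translation; returns IOVA_NONE for an unknown address. */
static uint64_t va_map_lookup(const struct va_map *m, uint64_t va) {
    uint64_t key = (va >> PAGE_SHIFT) + 1;
    size_t i = va_map_slot(key);
    for (size_t n = 0; n < MAP_SLOTS; n++, i = (i + 1) & (MAP_SLOTS - 1)) {
        if (m->key[i] == 0)
            return IOVA_NONE;
        if (m->key[i] == key)
            return m->iova[i] + (va & (PAGE_SIZE - 1));
    }
    return IOVA_NONE;
}
```

Because the table is filled once at start-up in this sketch, it goes stale as soon as pages are added or removed at runtime, which is why a notification is needed.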
Hi Benjamin,

So, in other words, we need callbacks on alloc/free. What information would SPDK need when receiving this notification? Since we can't really know in advance how many pages we allocate (it may be one, it may be a thousand) and they are no longer guaranteed to be contiguous, would a per-page callback be OK? Alternatively, we could have one callback per operation, but only provide the VA and size of the allocated memory, while leaving everything else to the user.

I do add a virt2memseg() function which would allow you to look up segment physical addresses more easily, so you won't have to manually scan memseg lists to get the IOVA for a given VA.

Thanks for your feedback and suggestions!
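The second option (one callback per operation, carrying only the VA and size) could look roughly like the sketch below. Everything in it is hypothetical: `mem_event_cb`, `on_mem_event`, `struct subscriber`, and `toy_virt2iova` are illustrative names, not functions from this patchset, and `toy_virt2iova` only stands in for whatever per-page lookup (such as virt2memseg()) the subscriber would really use. The 2 MiB page size and the fixed tracking window are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    ((uint64_t)2 << 20)   /* assumes 2 MiB hugepages */
#define WINDOW_PAGES 64                    /* toy-sized tracking window */
#define IOVA_NONE    UINT64_MAX

enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

/* Hypothetical callback shape: one call per alloc/free operation. */
typedef void (*mem_event_cb)(enum mem_event ev, uint64_t va, uint64_t len,
                             void *arg);

/* Subscriber state: per-page IOVA for a fixed VA window starting at base. */
struct subscriber {
    uint64_t base;
    uint64_t iova[WINDOW_PAGES];
};

static void subscriber_init(struct subscriber *s, uint64_t base) {
    s->base = base;
    for (size_t i = 0; i < WINDOW_PAGES; i++)
        s->iova[i] = IOVA_NONE;
}

/* Toy translation: swaps neighbouring pages so IOVAs are not contiguous. */
static uint64_t toy_virt2iova(uint64_t va) {
    return 0x100000000ull + (va ^ PAGE_SIZE);
}

/* The subscriber walks the range itself, one page at a time, because the
 * pages behind [va, va + len) need not be physically contiguous. */
static void on_mem_event(enum mem_event ev, uint64_t va, uint64_t len,
                         void *arg) {
    struct subscriber *s = arg;
    assert(va % PAGE_SIZE == 0 && len % PAGE_SIZE == 0);
    assert(va >= s->base && (va - s->base + len) / PAGE_SIZE <= WINDOW_PAGES);
    for (uint64_t off = 0; off < len; off += PAGE_SIZE) {
        size_t idx = (size_t)((va + off - s->base) / PAGE_SIZE);
        s->iova[idx] = (ev == MEM_EVENT_ALLOC) ? toy_virt2iova(va + off)
                                               : IOVA_NONE;
    }
}
```

The trade-off in this shape is that the notifier stays simple (one call, two numbers), while each subscriber pays for a per-page translation inside its callback.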
Having the map also enables a number of other nice things - for instance we allow users to register memory that wasn't allocated through DPDK and use it for DMA operations. We keep that va to pa/iova mapping in the same map. I appreciate you adding APIs to dynamically register this type of memory with the IOMMU on our behalf. That allows us to eliminate a nasty hack where we were looking up the vfio file descriptor through sysfs in order to send the registration ioctl.
* Added contiguous memory allocation APIs for rte_malloc and rte_memzone
* Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory with VFIO
--
Thanks,
Anatoly