Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
From: Thomas Monjalon <hidden>
Date: 2021-06-03 07:26:10
03/06/2021 09:06, Andrew Rybchenko:
On 6/2/21 11:35 PM, Thomas Monjalon wrote:quoted
From: Elena Agostini <redacted> The new library gpudev is for dealing with GPU from a DPDK application in a vendor-agnostic way. As a first step, the features are focused on memory management. A function allows to allocate memory inside the GPU, while another one allows to use main (CPU) memory from the GPU. The infrastructure is prepared to welcome drivers in drivers/gpu/ as the upcoming NVIDIA one, implementing the gpudev API. Other additions planned for next revisions: - C implementation file - guide documentation - unit tests - integration in testpmd to enable Rx/Tx to/from GPU memory. The next step should focus on GPU processing task control. Signed-off-by: Elena Agostini <redacted> Signed-off-by: Thomas Monjalon <redacted>LGTM as an RFC. It is definitely to a patch to apply since implementation is missing. See my notes below.
Yes sorry I forgot the RFC tag when sending. [...]
quoted
+typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr); +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);Not that important but I always prefer to typedef function prototypes w/o pointer and use pointer in the structure below. I.e. typedef int (gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr); It allows to specify that corresponding callback must comply to the prototype and produce build error otherwise (and do not rely on warnings), e.g. static gpu_malloc_t mlx5_gpu_malloc; static int mlx5_gpu_malloc(struct rte_gpu_dev *dev, size_t size, void **ptr) { ... } May be a new library should go this way.
I agree.
quoted
+ +struct rte_gpu_dev { + /* Backing device. */ + struct rte_device *device; + /* GPU info structure. */ + struct rte_gpu_info info; + /* Counter of processes using the device. */ + uint16_t process_cnt; + /* If device is currently used or not. */ + enum rte_gpu_state state; + /* FUNCTION: Allocate memory on the GPU. */ + gpu_malloc_t gpu_malloc; + /* FUNCTION: Allocate memory on the CPU visible from the GPU. */ + gpu_malloc_t gpu_malloc_visible; + /* FUNCTION: Free allocated memory on the GPU. */ + gpu_free_t gpu_free;Don't we need a callback to get dev_info?
Yes it's my miss. [...]
quoted
+__rte_experimental +int rte_gpu_dev_info_get(uint16_t gpu_id, struct rte_gpu_info **info);Hm, I think it is better to have 'struct rte_gpu_info *info'. Why should it allocate and return memory to be freed by caller?
No you're right, I overlooked it. [...]
quoted
+ * Allocate a chunk of memory on the GPU.Looking a below function it is required to clarify here if the memory is visible or invisible to GPU (or both allowed).
This function allocates on the GPU so it is visible by the GPU. I feel I misunderstand your question.
quoted
+ * + * @param gpu_id + * GPU ID to allocate memory. + * @param size + * Number of bytes to allocate.Is behaviour defined if zero size is requested? IMHO, it would be good to define.
OK
quoted
+ * @param ptr + * Pointer to store the address of the allocated memory. + * + * @return + * 0 on success, -1 otherwise.Don't we want to differentiate various errors using negative errno as it is done in many DPDK libraries?
Yes I think so, I was just too much lazy to do it in this RFC.
quoted
+ */ +__rte_experimental +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);May be *malloc() should return a pointer and "negative" values used to report various errnos?
I don't understand what you mean by negative values if it is a pointer. We could return a pointer and use rte_errno.
The problem with the approach that comparison vs NULL will not work in this case and we need special macro or small inline function to check error condition. Returned pointer is definitely more convenient, but above not may result in bugs.
I don't know what is better. [...]
quoted
+ * Deallocate a chunk of memory allocated with rte_gpu_malloc*. + * + * @param gpu_id + * Reference GPU ID. + * @param ptr + * Pointer to the memory area to be deallocated.I think it should be NOP in the case of NULL pointer and it should be documented. If not, it must be documented as well.
OK for NOP.