Re: [PATCH net-next 0/9] devlink: Add support for region access
From: Alex Vesker <hidden>
Date: 2018-03-31 06:11:35
On 3/31/2018 1:26 AM, David Ahern wrote:
> On 3/30/18 1:39 PM, Alex Vesker wrote:
>> On 3/30/2018 7:57 PM, David Ahern wrote:
>>> On 3/30/18 8:34 AM, Andrew Lunn wrote:
>>>>>> And it seems to want contiguous pages. How well does that work after the system has been running for a while and memory is fragmented?
>>>>> The allocation can be changed; there is no real need for contiguous pages. It is important to note that the number of snapshots is limited by the driver, and this limit can be based on the dump size or the expected frequency of collection. I also prefer not to pre-allocate this memory.
>>>> The driver code also asks for a 1MB contiguous chunk of memory! You really should think about this API: how can you avoid double memory allocations? And can kvmalloc be used? But then you get into the problem of DMA'ing the memory from the device... This API also does not scale. 1MB is actually quite small. I'm sure there is firmware running on CPUs with a lot more than 1MB of RAM. How well does this API work with 64MB? Say I wanted to snapshot my GPU? Or the MC/BMC?
>>> That and the drivers control the number of snapshots. The user should be able to control the number of snapshots, and have an option to remove all snapshots to free up that memory.
>> There is an option to free up this memory, using a delete command. The reason I let only the driver control the number of snapshots is that the driver knows the size of the snapshots and when and why they will be taken. For example, in our mlx4 driver the snapshots are taken on rare failures, each snapshot is quite large, and from past analyses the first dump is usually the important one, so 8 is more than enough in my case. If a user wants more than that, they can always monitor the notifications, read each snapshot and delete it once it is backed up; there is no reason to keep all of this data in the kernel.
> I was thinking of fewer, i.e., a user says keep only 1 or 2 snapshots, or disables snapshots altogether.
Devlink configuration is not persistent across driver reloads, and currently there is no way to sync it. One or two snapshots might not leave enough time to read, delete and make room for the next one; as I said, each driver should do its own calculation here based on the frequency, the size and even the time it takes to capture a snapshot. The user can't know whether one snapshot is enough for debugging: I have seen cases in which debugging required more than one snapshot to confirm that a health clock is incrementing and the FW is alive. I want to be able to log in at a customer site and access this snapshot without any prior configuration by the user, rather than asking them to enable the feature and then waiting for a repro. This will help debug issues that are hard to reproduce, so I don't see any reason to disable it.