Re: devlink interface for asynchronous event/messages from firmware?
From: Jacob Keller <jacob.e.keller@intel.com>
Date: 2020-05-21 20:59:34
On 5/21/2020 1:52 PM, Ido Schimmel wrote:
> On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote:
> > On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
> > > On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
> > > > Hi Jiri, Jakub,
> > > >
> > > > I've been asked to investigate using devlink as a mechanism for reporting asynchronous events/messages from firmware, including diagnostic messages, etc.
> > > >
> > > > Essentially, the ice firmware can report various status or diagnostic messages which are useful for debugging internal behavior. We want to be able to get these messages (and relevant data associated with them) in a format beyond just "dump it to the dmesg buffer and recover it later".
> > > >
> > > > It seems like this would be an appropriate use of devlink. I thought maybe this would work with devlink health: i.e. we create a devlink health reporter, and then when firmware sends a message, we use devlink_health_report.
> > > >
> > > > But when I dug into this, it doesn't seem like a natural fit. The health reporters expect to see an "error" state, and don't seem to really fit the notion of "log a message from firmware".
> > > >
> > > > One of the issues is that the health reporter only keeps one dump, when what we really want is a way to have a monitoring application get the dump and then store its contents.
> > > >
> > > > Thoughts on what might make sense for this? It feels like a stretch of the health interface...
> > > >
> > > > I mean basically what I am thinking of having is using the devlink_fmsg interface to just send a netlink message that then gets sent over the devlink monitor socket and gets dumped immediately.
> > >
> > > Why does user space need a raw firmware interface in the first place? Examples?
> >
> > So the ice firmware can optionally send diagnostic debug messages via its control queue. The current solutions we've used internally essentially hex-dump the binary contents to the kernel log, and then these get scraped and converted into a useful format for human consumption.
> >
> > I'm not 100% sure of the format, but I know it's based on a decoding file that is specific to a given firmware image, and thus attempting to tie this into the driver is problematic.
>
> You explained how it works, but not why it's needed :)
Well, the reason we want it is to be able to read the debug/diagnostics data in order to debug issues that might be related to firmware or software misuse of firmware interfaces. Having a separate interface, rather than trying to scrape from the kernel message buffer, makes it a viable option for debugging in the field.
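
To make the "scrape it from the kernel message buffer" approach concrete, here is a minimal userspace sketch of turning one hex-dumped log line back into bytes. The "fw log: " marker and the line layout are assumptions made purely for illustration; the thread does not give the real log format, and the decoded bytes would still need the firmware-specific decoding file to become readable.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker: the real prefix used in the kernel log is not
 * stated in this thread. */
#define FW_LOG_MARKER "fw log: "

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Decode the whitespace-separated hex bytes that follow FW_LOG_MARKER.
 * Returns the number of bytes written to out, or -1 if the marker is
 * missing, the hex is malformed, or out is too small.
 */
long scrape_fw_log_line(const char *line, unsigned char *out, size_t out_len)
{
	const char *p = strstr(line, FW_LOG_MARKER);
	size_t n = 0;

	if (!p)
		return -1;
	p += strlen(FW_LOG_MARKER);

	while (*p) {
		int hi, lo;

		if (isspace((unsigned char)*p)) {
			p++;
			continue;
		}
		hi = hex_val(p[0]);
		/* p[1] is at worst the terminating NUL, which is rejected. */
		lo = (hi < 0) ? -1 : hex_val(p[1]);
		if (hi < 0 || lo < 0 || n >= out_len)
			return -1;
		out[n++] = (unsigned char)((hi << 4) | lo);
		p += 2;
	}
	return (long)n;
}
```

Even in this toy form, the scraper has to guess at a text format and silently depends on nothing else in the log matching the marker, which is part of why a dedicated interface is attractive.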
> > There is also a plan to provide a simpler interface for some of the diagnostic messages, where there is a simple bijection from one code to one message for a handful of events, like if the link engine can detect a known reason why it wasn't able to get link. I suppose these could be translated and immediately printed by the driver without a special interface.
>
> Petr worked on something similar last year:
> https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/
>
> Amit is currently working on a new version based on ethtool (netlink).
I'll take a look, thanks! -Jake
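
For reference, the "one code to one message" idea quoted above could be sketched as a plain lookup table. This is only an illustration: the codes, names, and message strings below are invented, since the thread does not list the actual firmware codes.

```c
#include <stddef.h>

/* Hypothetical reason codes; the real firmware values are not given
 * in this thread. */
enum link_down_reason {
	LINK_DOWN_NO_CABLE = 1,
	LINK_DOWN_UNSUPPORTED_MODULE = 2,
	LINK_DOWN_AUTONEG_FAILED = 3,
};

struct reason_entry {
	unsigned int code;
	const char *msg;
};

/* One code maps to exactly one message. */
static const struct reason_entry link_down_reasons[] = {
	{ LINK_DOWN_NO_CABLE,           "No cable detected" },
	{ LINK_DOWN_UNSUPPORTED_MODULE, "Unsupported module type" },
	{ LINK_DOWN_AUTONEG_FAILED,     "Auto-negotiation failed" },
};

/* Translate a reason code to its message; unknown codes get a fixed
 * fallback string rather than NULL so callers can always print it. */
const char *link_down_reason_str(unsigned int code)
{
	size_t i;

	for (i = 0; i < sizeof(link_down_reasons) / sizeof(link_down_reasons[0]); i++) {
		if (link_down_reasons[i].code == code)
			return link_down_reasons[i].msg;
	}
	return "Unknown link down reason";
}
```

Because the table is small and fixed, a driver could print the translated string directly, as suggested in the quoted text, without needing any decoding file.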