Thread (31 messages) 31 messages, 4 authors, 2019-01-04

Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system

From: Jakub Kicinski <hidden>
Date: 2019-01-02 22:46:35

On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote:
On 1/1/2019 3:47 AM, Jakub Kicinski wrote:
quoted
The addition of "objdump" and its marshalling is a bit disappointing.
It seemed to me when region snapshots were added that they would serve
this exact purpose.  Taking a region snapshot or "core dump", if you
will, after something went south.  Now it's confusing to have two
mechanisms serving the same purpose.  
The motivation here was that the driver can provide reporters to its 
sub-modules, such that each reporter will be able to provide all needed 
info and recover methods to face run time errors.

The implementation of the objdump function is in the hands of the 
reporter developer, and he can dump whatever he thinks it is needed.
Keep in mind that a driver can have many reporters (TX, RX, FW, command 
interface, etc). For most of the reporters, there is important 
information in the driver which can not be fetched with region snapshot 
(intended for memory fetching only).
For SW info, driver shall build the info and send it interpreted (unlike 
all dumps / region available mechanisms)
I have in plans to extend the TX reporter to have objdump method in 
which I will pass many SQ SW attributes that can be very handy in a 
debug session.
My feeling is that instead of duplicating this infrastructure we should
try to grow region snapshots beyond the "HW memory dumper".  In a
debugging session you may want to have dumps as well as read the state
live.  Region snapshot API was built with this promise in mind.  The
information in reporter dump is not easily available other than when
the dump happens, which is not great in a debugging session.  Driver
will have to expose it via debugfs/region dump/ethtool dump or some
such for live debug.
quoted
It's unclear from quick reading of the code how if the buffers get
timestamped.  Can you report more than one?  
The timestamp which devlink health reports on, is the timestamp in which 
it got the buffers filled by the driver. Every dump/diagnose has one ts.
Per reporter, it is possible to store up to one dump. only clear command 
can remove it and makes the reporter ready to fetch a new objdump.
Region snapshots support collecting multiple snapshots IIRC, no?
quoted
About the marshalling, I'm not sure it belongs in the kernel.  There is
precedent for adding interpretation of FW blobs in user space (ethtool).
IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
drivers.  Amount of code you add to print the simple example from last
patch is not inspiring confidence.  
The idea was to provide the developer the ability to create "tree-like" 
of information, it is needed when you want to describe complex objects. 
This caused a longer coding, but I don't think we are even close to the 
scale you are talking about.
We can remove the tree flexibility, and move to array format, it will 
make the code of storing data by the driver to be shorter, but we will 
lose the ability to have it in tree-like format.
To be clear I slightly oppose the marshalling in the first place.  It
may be better to just dump the data as is, and have user space know
what the interpretation is.
And again, regarding ethtool, it is a tool for dumping HW/FW, this could 
have been an argument for the region snapshot, not for the devlink health...
I've seen drivers dumping ring state and other SW info via ethtool.
All debugging APIs end up "mixed source" in my experience.
quoted
And on the bike shedding side :) -> I think you should steer clear of
calling this objdump, as it has very little to do with the
functionality of well-known objdump tool.  Its not even clear what the
object the name is referring to is.  
Let's agree on concept, we can change name to dump. Reporter->dump is 
very clear when you know what the reporter is.
Thanks!
quoted
Long story short the overlap with region snapshots makes it unclear
what purpose either serves, and IMHO we should avoid the marshalling by
teaching user space how to interpret snapshots.  Preferably we only
have one dump mechanism, and user space can be taught the interpretation
once.  
Forcing SW reporters to use region snapshot mechanism sounds like a bad 
idea.
I'm not super excited about reusing region API for SW info.  But I like
it more than having multitude of debug APIs for drivers to implement
with largely overlapping functionality and semantics.
To summarize, In order to have the health system working properly, it 
must have a way to objdump/dump itself and provide it to the developer 
for debug. Removing the objdump part will make it useless for run-time 
debug.

I think this is a powerful tool and we shall not ask the user level 
scripts to fetch info from many other partial tools. It shall all be 
focused in one place (recover, diagnose, objdump, statistics).
I don't think having reporter API refer to snapshot IDs is "many other
partial tools" if that's the suggestion.  Seems like you want to focus
all the reporter stuff in one place, and I want to focus the debug APIs
a little :)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help