Thread: 28 messages, 6 authors, 2019-02-11

Re: [PATCH for-next 4/4] devlink: add health command support

From: Jiri Pirko <jiri@resnulli.us>
Date: 2019-02-11 10:50:44

Sun, Feb 10, 2019 at 07:28:49PM CET, ayal@mellanox.com wrote:
This patch adds support for the following commands:
devlink health show      [DEV reporter REPORTER_NAME]
devlink health recover    DEV reporter REPORTER_NAME
devlink health diagnose   DEV reporter REPORTER_NAME
devlink health dump show  DEV reporter REPORTER_NAME
devlink health dump clear DEV reporter REPORTER_NAME
devlink health set        DEV reporter REPORTER_NAME NAME VALUE

* show: The devlink health show command displays status and configuration
  info for a specific reporter on a device, or dumps that info for all
  reporters on all devices.
* recover: Devlink health recover enables the user to initiate a
  recovery on a reporter. This operation will increment the recoveries
  counter displayed in the show command.
* diagnose: Devlink health diagnose enables the user to retrieve diagnostics
  data from a reporter on a device. The command's output is free text
  defined by the reporter.
* dump show: Devlink health dump show displays the last saved dump. Devlink
  health saves a single dump per reporter. If no dump is already stored
  for this reporter, devlink generates a new one. A dump can be
  generated automatically when a reporter reports an error, or manually
  at the user's request. The dump output is defined by the reporter.
* dump clear: Devlink health dump clear deletes the last saved dump.
* set: Devlink health set enables the user to configure:
  1) grace_period [msec]: time interval between auto recoveries.
  2) auto_recover [true/false]: whether devlink should execute
     automatic recovery on error.
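
To illustrate how the two settings above might interact, here is a minimal sketch. It is not taken from the patch or from the kernel: the names ReporterPolicy and should_auto_recover are hypothetical, and the rule that a recovery is allowed once at least grace_period msec have passed since the previous one is an assumption drawn only from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReporterPolicy:
    """Hypothetical model of the two values configurable via `devlink health set`."""
    grace_period_ms: int = 0   # assumed: minimum interval between auto recoveries
    auto_recover: bool = True  # whether recovery is attempted automatically on error


def should_auto_recover(policy: ReporterPolicy,
                        now_ms: int,
                        last_recovery_ms: Optional[int]) -> bool:
    """Decide whether an automatic recovery would be attempted for a new error.

    Assumption: an error arriving inside the grace period is not auto-recovered.
    """
    if not policy.auto_recover:
        return False
    if last_recovery_ms is None:
        # No earlier recovery, so there is no grace period to wait out.
        return True
    return now_ms - last_recovery_ms >= policy.grace_period_ms


if __name__ == "__main__":
    policy = ReporterPolicy(grace_period_ms=600, auto_recover=True)
    print(should_auto_recover(policy, now_ms=1000, last_recovery_ms=500))
    print(should_auto_recover(policy, now_ms=1100, last_recovery_ms=500))
```

The real devlink health core may apply the grace period differently (for example, relative to the last error rather than the last recovery); the sketch only captures the behaviour as worded here.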

Examples:
$devlink health show pci/0000:00:09.0 reporter tx
pci/0000:00:09.0:
name tx
 state healthy #err 0 #recover 1 last_dump_ts N/A
   parameters:
     grace period 600 auto_recover true
$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
 sqn: 4283 HW state: 1 stopped: false
 sqn: 4288 HW state: 1 stopped: false
 sqn: 4293 HW state: 1 stopped: false
 sqn: 4298 HW state: 1 stopped: false
 sqn: 4303 HW state: 1 stopped: false
$devlink health dump show pci/0000:00:09.0 reporter tx
TX dump data
$devlink health dump clear pci/0000:00:09.0 reporter tx
$devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$devlink health set pci/0000:00:09.0 reporter tx auto_recover false
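
For anyone scripting against the diagnose example above, a small parsing sketch follows. The diagnose output is free text defined by the reporter, so the line format assumed here ("sqn: N HW state: N stopped: true|false") comes only from this one tx example and may not hold for other reporters; parse_sq_lines is a hypothetical helper, not part of devlink.

```python
import re

# Pattern inferred from the example output only; other reporters may differ.
SQ_LINE = re.compile(
    r"sqn:\s*(\d+)\s+HW state:\s*(\d+)\s+stopped:\s*(true|false)"
)


def parse_sq_lines(text):
    """Return one dict per send-queue line found in diagnose output."""
    rows = []
    for line in text.splitlines():
        match = SQ_LINE.search(line)
        if match:
            rows.append({
                "sqn": int(match.group(1)),
                "hw_state": int(match.group(2)),
                "stopped": match.group(3) == "true",
            })
    return rows


if __name__ == "__main__":
    sample = (
        "SQs:\n"
        " sqn: 4283 HW state: 1 stopped: false\n"
        " sqn: 4288 HW state: 1 stopped: false\n"
    )
    for row in parse_sq_lines(sample):
        print(row)
```

Lines that do not match the pattern, such as the "SQs:" header, are skipped rather than treated as errors.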

Signed-off-by: Aya Levin <redacted>
Reviewed-by: Moshe Shemesh <redacted>
---
devlink/devlink.c            | 551 ++++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/devlink.h |  23 ++
man/man8/devlink-health.8    | 176 ++++++++++++++
man/man8/devlink.8           |   7 +-
4 files changed, 755 insertions(+), 2 deletions(-)

755 lines is too much for one patch.
To make review easier, please split this patch into a patchset of
smaller patches, preferably one per command.