Re: [PATCH net-next v2 1/2] fs/crashdd: add API to collect hardware dump in... | netdev

Re: [PATCH net-next v2 1/2] fs/crashdd: add API to collect hardware dump in second kernel

From: Jiri Pirko <jiri@resnulli.us>
Date: 2018-04-03 07:04:15
Also in: kexec, linux-fsdevel, lkml

Mon, Apr 02, 2018 at 02:30:45PM CEST, rahul.lakkireddy@chelsio.com wrote:

On Monday, April 04/02/18, 2018 at 14:41:43 +0530, Jiri Pirko wrote:

Fri, Mar 30, 2018 at 08:42:00PM CEST, ebiederm@xmission.com wrote:

Rahul Lakkireddy [off-list ref] writes:

On Friday, March 03/30/18, 2018 at 16:09:07 +0530, Jiri Pirko wrote:

Sat, Mar 24, 2018 at 11:56:33AM CET, rahul.lakkireddy@chelsio.com wrote:

Add a new module crashdd that exports the /sys/kernel/crashdd/
directory in second kernel, containing collected hardware/firmware
dumps.

The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /sys/kernel/crashdd/ directory is
as follows:

1. During probe (before hardware is initialized), device drivers
register to the crashdd module (via crashdd_add_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. Crashdd creates a driver's directory under
/sys/kernel/crashdd/<driver>. Then, it allocates the buffer with

This smells. I need to identify the exact ASIC instance that produced
the dump. To identify by driver name does not help me if I have multiple
instances of the same driver. This looks wrong to me. This looks like
a job for devlink where you have 1 devlink instance per 1 ASIC instance.

Please see:
http://patchwork.ozlabs.org/project/netdev/list/?series=36524

I believe that the solution in the patchset could be used for
your use case too.

The sysfs approach proposed here has been dropped in favour of exporting
the dumps as ELF notes in /proc/vmcore.

Will be posting the new patches soon.

The concern was actually how you identify which device the dump came from.
Where you read the identifier changes, but whether it is sysfs or
/proc/vmcore, the concern remains valid.

Yeah. I still don't see how you link the dump and the device.

In our case, the dump and the device are identified by the
driver’s name followed by the corresponding PCI bus ID.  I’ve posted an
example in my v3 series:

https://www.spinics.net/lists/netdev/msg493781.html

Here’s an extract from the link above:

# readelf -n /proc/vmcore

Displaying notes found at file offset 0x00001000 with length 0x04003288:
Owner                 Data size     Description
VMCOREDD_cxgb4_0000:02:00.4 0x02000fd8      Unknown note type:(0x00000700)
VMCOREDD_cxgb4_0000:04:00.4 0x02000fd8      Unknown note type:(0x00000700)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
CORE                 0x00000150     NT_PRSTATUS (prstatus structure)
VMCOREINFO           0x0000074f     Unknown note type: (0x00000000)

Here, for my two devices, the dump names are
VMCOREDD_cxgb4_0000:02:00.4 and VMCOREDD_cxgb4_0000:04:00.4.

It’s really up to the callers to choose a unique name for the
dump.  The name is appended to the “VMCOREDD_” string.
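
As a rough illustration of the naming scheme described above, a caller
could build the note name along these lines. This is only a userspace
sketch: vmcoredd_make_name() and the VMCOREDD_PREFIX macro are
hypothetical names chosen for the example and are not taken from the
posted patches.

```c
#include <stdio.h>
#include <string.h>

#define VMCOREDD_PREFIX "VMCOREDD_"

/*
 * Hypothetical helper (not from the patch series): writes
 * "VMCOREDD_<driver>_<bus id>" into buf.
 * Returns 0 on success, -1 if the name does not fit in len bytes.
 */
static int vmcoredd_make_name(char *buf, size_t len,
                              const char *driver, const char *bus_id)
{
    int n = snprintf(buf, len, "%s%s_%s", VMCOREDD_PREFIX, driver, bus_id);

    /* snprintf reports the length it needed; reject truncated names */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "cxgb4" and "0000:02:00.4", this produces
VMCOREDD_cxgb4_0000:02:00.4, matching the first note owner in the
readelf output.  Uniqueness still depends entirely on the caller
passing a per-instance identifier such as the PCI bus ID.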

Rahul, did you look at the patchset I pointed out?

For devlink, I think the dump name would be identified by
bus_type/device_name; i.e. “pci/0000:02:00.4” for my example.
Is my understanding correct?

Yes.
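
To make the correspondence concrete, here is a small userspace sketch
that maps a note owner from the readelf output to a devlink-style
handle.  It assumes the bus is always PCI and that the bus ID is
whatever follows the last '_' in the name; both are assumptions made
for this example, and vmcoredd_to_devlink_handle() is a hypothetical
name, not an existing kernel or devlink API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given a note owner such as
 * "VMCOREDD_cxgb4_0000:02:00.4", write "pci/0000:02:00.4" into out.
 * Assumes a PCI device and that the bus ID follows the last '_'.
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int vmcoredd_to_devlink_handle(const char *owner, char *out, size_t len)
{
    static const char prefix[] = "VMCOREDD_";
    const char *sep;
    int n;

    /* Only notes carrying the VMCOREDD_ prefix are device dumps */
    if (strncmp(owner, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    /* Driver names may contain '_', so split at the last one */
    sep = strrchr(owner + sizeof(prefix) - 1, '_');
    if (!sep || sep[1] == '\0')
        return -1;

    n = snprintf(out, len, "pci/%s", sep + 1);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For VMCOREDD_cxgb4_0000:02:00.4 this yields pci/0000:02:00.4, the
handle format confirmed above; notes such as CORE or VMCOREINFO are
rejected because they lack the prefix.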

Thanks,
Rahul