Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: 2025-06-02 06:30:10
Also in:
linux-edac, linux-pci, lkml
On Mon, 12 May 2025, Shuai Xue wrote:
quoted hunk ↗ jump to hunk
Hotplug events are critical indicators for analyzing hardware health, particularly in AI supercomputers where surprise link downs can significantly impact system performance and reliability. To this end, define a new TRACING_SYSTEM named pci, add a generic RAS tracepoint for hotplug event to help healthy check, and generate tracepoints for pcie hotplug event. Add enum pci_hotplug_event in include/uapi/linux/pci.h so applications like rasdaemon can register tracepoint event handlers for it. The output like below: $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable $ cat /sys/kernel/debug/tracing/trace_pipe <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present Suggested-by: Lukas Wunner <lukas@wunner.de> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> --- changes since v7: - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven - pick up Reviewed-by from Lukas Wunner --- drivers/pci/hotplug/Makefile | 3 ++ drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++--- drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++ include/uapi/linux/pci.h | 7 ++++ 4 files changed, 105 insertions(+), 6 deletions(-) create mode 100644 drivers/pci/hotplug/trace.hdiff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile index 40aaf31fe338..a1a9d1e98962 100644 --- a/drivers/pci/hotplug/Makefile +++ b/drivers/pci/hotplug/Makefile@@ -3,6 +3,9 @@ # Makefile for the Linux kernel pci hotplug controller drivers. # +# define_trace.h needs to know how to find our header +CFLAGS_pciehp_ctrl.o := -I$(src) + obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.odiff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index d603a7aa7483..f9beb4d3a9b8 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c@@ -23,6 +23,9 @@ #include "../pci.h" #include "pciehp.h" +#define CREATE_TRACE_POINTS +#include "trace.h"
Hi, Instead of spreading tracepoint creating into subdriver code like this, should we place it into one place, e.g., drivers/pci/pci-trace.c (which is what I seem to have used in my yet to be submitted patch that adds tracepoints into bwctrl link speed events)?
quoted hunk ↗ jump to hunk
diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h new file mode 100644 index 000000000000..21329c198019 --- /dev/null +++ b/drivers/pci/hotplug/trace.h
Perhaps in include/trace/events/pci.h (or include/trace/events/pci-hotplug.h)? I don't know what is the general rule having them inside drivers/ vs include/trace/events, Documentation/trace/tracepoints.rst only mentions the latter, but there seems to be plenty under drivers/ too.
quoted hunk ↗ jump to hunk
@@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HW_EVENT_PCI_HP_H + +#include <linux/tracepoint.h> + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM pci + +#define PCI_HOTPLUG_EVENT \
-- i.