Thread (37 messages) 37 messages, 5 authors, 2025-07-26

Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event

From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: 2025-06-02 06:30:10
Also in: linux-edac, linux-pci, lkml

On Mon, 12 May 2025, Shuai Xue wrote:
quoted hunk ↗ jump to hunk
Hotplug events are critical indicators for analyzing hardware health,
particularly in AI supercomputers where surprise link downs can
significantly impact system performance and reliability.

To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
tracepoint for hotplug event to help healthy check, and generate
tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
include/uapi/linux/pci.h so applications like rasdaemon can register
tracepoint event handlers for it.

The output like below:

$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
    <...>-206     [001] .....    40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down

    <...>-206     [001] .....    40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present

Suggested-by: Lukas Wunner <lukas@wunner.de>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
changes since v7:
- replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
- pick up Reviewed-by from Lukas Wunner
---
 drivers/pci/hotplug/Makefile      |  3 ++
 drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
 drivers/pci/hotplug/trace.h       | 68 +++++++++++++++++++++++++++++++
 include/uapi/linux/pci.h          |  7 ++++
 4 files changed, 105 insertions(+), 6 deletions(-)
 create mode 100644 drivers/pci/hotplug/trace.h
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 40aaf31fe338..a1a9d1e98962 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -3,6 +3,9 @@
 # Makefile for the Linux kernel pci hotplug controller drivers.
 #
 
+# define_trace.h needs to know how to find our header
+CFLAGS_pciehp_ctrl.o				:= -I$(src)
+
 obj-$(CONFIG_HOTPLUG_PCI)		+= pci_hotplug.o
 obj-$(CONFIG_HOTPLUG_PCI_COMPAQ)	+= cpqphp.o
 obj-$(CONFIG_HOTPLUG_PCI_IBM)		+= ibmphp.o
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index d603a7aa7483..f9beb4d3a9b8 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -23,6 +23,9 @@
 #include "../pci.h"
 #include "pciehp.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
Hi,

Instead of spreading tracepoint creating into subdriver code like this, 
should we place it into one place, e.g., drivers/pci/pci-trace.c (which is 
what I seem to have used in my yet to be submitted patch that adds 
tracepoints into bwctrl link speed events)?
quoted hunk ↗ jump to hunk
diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
new file mode 100644
index 000000000000..21329c198019
--- /dev/null
+++ b/drivers/pci/hotplug/trace.h
Perhaps in include/trace/events/pci.h (or 
include/trace/events/pci-hotplug.h)?

I don't know what is the general rule having them inside drivers/ vs 
include/trace/events, Documentation/trace/tracepoints.rst only mentions 
the latter, but there seems to be plenty under drivers/ too.
quoted hunk ↗ jump to hunk
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_PCI_HP_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pci
+
+#define PCI_HOTPLUG_EVENT					\
-- 
 i.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help