Thread (37 messages) 37 messages, 5 authors, 2025-07-26

Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event

From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: 2025-07-21 08:55:22
Also in: linux-edac, linux-pci, lkml

Steven or other tracing people, can you check below the 
discussion/question on preferred placement of the tracing 
related code.

On Thu, 17 Jul 2025, Shuai Xue wrote:
在 2025/7/17 06:25, Bjorn Helgaas 写道:
quoted
[+cc Ilpo, Jonathan (should have been included since the patch has his
Reviewed-by)]
Thanks.
quoted
Thanks for the ping; I noticed quite a bit of discussion but didn't
follow it myself, so didn't know it was basically all resolved.
The biggest remaining uncertainty related to the placement of some of the 
tracepoint code (tracepoint creation and the trace header). None of us 
knew the answer so we tried to ask a more informed opinion from others but 
that never resulted us getting any answer.

I've a patch which adds tracing to bwctrl [1] (I'll send it officially 
once the placement discussion concludes), and in it, I ended up adding 
drivers/pci/pci-trace.c as the place where to define the tracepoints and 
include/trace/events/pci.h as the trace header placement, whereas this 
patch added them both into the hp driver code.

So if you (Bjorn) are fine placing them into each individual driver as 
done in this patch and tracing folks don't provide us guidance, we can 
go with that approach.

The placement discussion is in this subleaf of the thread:

https://lore.kernel.org/linux-pci/c0cbdd85-9702-40ab-bc97-b51b02b9fc89@linux.alibaba.com/ (local)


[1] https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/1-a.txt (local)

-- 
 i.
quoted
On Mon, May 12, 2025 at 09:38:39AM +0800, Shuai Xue wrote:
quoted
Hotplug events are critical indicators for analyzing hardware health,
particularly in AI supercomputers where surprise link downs can
significantly impact system performance and reliability.
I dropped the "particularly in AI supercomputers" part because I think
this is relevant in general.
quoted
To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
tracepoint for hotplug event to help healthy check, and generate
tracepoints for pcie hotplug event.
I'm not quite clear on the difference between "add generic RAS
tracepoint for hotplug event" and "generate tracepoints for pcie
hotplug event."  Are these two different things?
The purpose of this patch is to address the lack of tracepoints for PCIe
hotplug events in our production environment. In the initial RFC
version, I defined tracepoints such as "Link Up" and "Link Down"
specifically for PCIe hotplug. Later, Lukas suggested that these
tracepoints could be made more generic so that other PCI hotplug drivers
could also use them.

That’s why, when defining the event, I used a "generic" pci_hotplug_event
instead of a pcie_hotplug_event. If you're interested in more details
about this discussion, please refer to this link[1].

[1]https://erol.kernel.org/linux-pci/git/0/commit/?id=0ffd56f572f25bcd6c2265a1863848a18dce0e29

However, currently only PCIe hotplug is using these tracepoints, which
is why the CREATE_TRACE_POINTS macro is placed in
drivers/pci/hotplug/pciehp_ctrl.c.
quoted
I see the new TRACE_EVENT(pci_hp_event, ...) definition.  Is that what
you mean by the "generic RAS tracepoint"?
Yes.

quoted
And the five new trace_pci_hp_event() calls that use the TRACE_EVENT
are the "tracepoints for PCIe hotplug event"?
Actually, the tracepoints are generic, although right now they are only
used for PCIe hotplug.
quoted
quoted
Add enum pci_hotplug_event in
include/uapi/linux/pci.h so applications like rasdaemon can register
tracepoint event handlers for it.

The output like below:

$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
     <...>-206     [001] .....    40.373870: pci_hp_event: 0000:00:02.0
slot:10, event:Link Down

     <...>-206     [001] .....    40.374871: pci_hp_event: 0000:00:02.0
slot:10, event:Card not present
quoted
+#define PCI_HOTPLUG_EVENT					¥
+	EM(PCI_HOTPLUG_LINK_UP,			"Link Up")	¥
+	EM(PCI_HOTPLUG_LINK_DOWN,		"Link Down")	¥
+	EM(PCI_HOTPLUG_CARD_PRESENT,		"Card present")	¥
+	EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,	"Card not present")
Running this:

   $ git grep -E "¥<(EM|EMe)¥("

I notice that these new events don't look like the others, which
mostly look like "word" or "event-type" or "VERB object".

I'm OK with this, but just giving you a chance to consider what will
be the least surprise to users and easiest for grep and shell
scripting.
I think this is also common. For example, MF_PAGE_TYPE for
memory_failure_event uses a similar format:

#define MF_PAGE_TYPE ¥
	EM ( MF_MSG_KERNEL, "reserved kernel page" ) ¥
	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )


and aer_uncorrectable_errors for aer_event:

#define aer_uncorrectable_errors				¥
	{PCI_ERR_UNC_UND,	"Undefined"},			¥
	{PCI_ERR_UNC_DLP,	"Data Link Protocol Error"},	¥
	{PCI_ERR_UNC_SURPDN,	"Surprise Down Error"},		¥
	{PCI_ERR_UNC_POISON_TLP,"Poisoned TLP"},
quoted
I also noticed capitalization of "Up" and "Down", but not "present"
and "not present".
Aha, this is a bit tricky:)

The original kernel log messages are not consistent either:

ctrl_info(ctrl, "Slot(%s): Link Down¥n",
ctrl_info(ctrl, "Slot(%s): Card not present¥n",

I tried to keep the output as close as possible to the existing log
messages. If you prefer a more consistent capitalization style, I can
send another patch to fix that.

quoted
"Card" is only used occasionally and informally in the PCIe spec, and
not at all in the context of hotplug of Slot Status (Presence Detect
State refers to "adapter in the slot"), but it does match the pciehp
dmesg text, so it probably makes sense to use that.

Anyway, I applied this on pci/trace for v6.17.  If there's anything
you want to tweak in the commit log or event text, we can still do
that.

   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=trace

Bjorn
Thank you again for applying this to pci/trace for v6.17. If there’s
anything more to tweak in the commit log or event text, please let me
know.

Best regards,

Shuai
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help