RE: [PATCH v2 0/2] Update mce_record tracepoint
From: "Luck, Tony" <tony.luck@intel.com>
Date: 2024-01-26 20:49:08
Also in: linux-edac, lkml
> > Is it so very different to add this to a trace record so that
> > rasdaemon can have feature parity with mcelog(8)?
>
> I knew you were gonna say that. When someone decides that it is a
> splendid idea to add more stuff to struct mce then said someone would
> want it in the tracepoint too.
>
> And then we're back to my original question:
>
> "And where does it end? Stick full dmesg in the tracepoint too?"
>
> Where do you draw the line in the sand and say, no more, especially
> static, fields bloating the trace record should be added and from then
> on, you should go collect the info from that box. Something which you're
> supposed to do anyway.

Every patch that adds new code or data structures adds to the kernel
memory footprint. Each should be considered on its merits. The basic
question being:

   "Is the new functionality worth the cost?"

Where does it end? It would end if Linus declared:

   "Linux is now complete. Stop sending patches".

I.e. it is never going to end.

If somebody posts a patch asking to add the full dmesg to a tracepoint,
I'll stand with you to say: "Not only no, but hell no".

So for Naik's two patches we have:

1) PPIN
Cost = 8 bytes.
Benefit: Embeds a system identifier into the trace record so there
can be no ambiguity about which machine generated this error.
Also definitively indicates which socket on a multi-socket system.

2) MICROCODE
Cost = 4 bytes
Benefit: Certainty about the microcode version active on the core
at the time the error was detected.

RAS = Reliability, Availability, Serviceability

These changes fall into the serviceability bucket. They make it
easier to diagnose what went wrong.

-Tony