Thread: 18 messages, 5 authors, 2024-01-27

Re: [PATCH v2 0/2] Update mce_record tracepoint

From: Borislav Petkov <bp@alien8.de>
Date: 2024-01-26 18:57:13
Also in: linux-edac, lkml

On Fri, Jan 26, 2024 at 05:10:20PM +0000, Luck, Tony wrote:
> 12 extra bytes divided by (say) 64GB (a very small server these days, my laptop has that much)
>    = 0.00000001746%
>
> We will need 57000 changes like this one before we get to 0.001% :-)

You're forgetting that those 12 bytes repeat per MCE tracepoint logged.
And there's other code which adds more 0.01% here and there, well,
because we can.
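
For reference, the arithmetic in this exchange can be checked with a short sketch. It assumes "64GB" means 64 GiB (2^36 bytes), and the function names are illustrative; none of this comes from the patch itself. The count argument covers both framings: a number of 12-byte additions (the "57000 changes") or a number of logged MCE records that each carry the extra 12 bytes.

```python
EXTRA_BYTES = 12          # extra bytes per record, as stated in the thread
RAM_BYTES = 64 * 2**30    # assumption: "64GB" read as 64 GiB


def overhead_percent(count: int = 1) -> float:
    """Size of `count` 12-byte additions as a percentage of total RAM."""
    return 100.0 * EXTRA_BYTES * count / RAM_BYTES


def count_to_reach(target_percent: float) -> float:
    """How many 12-byte additions it takes to reach `target_percent` of RAM."""
    return target_percent / overhead_percent(1)


if __name__ == "__main__":
    # One 12-byte addition relative to 64 GiB of RAM.
    print(f"{overhead_percent(1):.11f}%")
    # Additions (or logged records) needed to reach 0.001% of RAM.
    print(round(count_to_reach(0.001)))
```

Under these assumptions a single addition is about 1.746e-08 % of RAM and roughly 57,000 of them reach 0.001%, which matches the quoted figures. The overhead scales linearly with the number of records logged, which is the point of the reply above.
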
> But the key there is keeping the details of the source machine attached to
> the error record. My first contact with machine check debugging is always
> just the raw error record (from mcelog, rasdaemon, or console log).

Yes, that is a somewhat sensible reason to have the PPIN together with the
MCE record.

> Knowing which microcode version was loaded on a core *at the time of
> the error* is critical.

So is the rest of the debug info you're going to need from that machine.
And yet we're not adding that to the tracepoint.

> You've spent enough time with Ashok and Thomas tweaking the Linux
> microcode driver to know that going back to the machine the next day
> to ask about microcode version has a bunch of ways to get a wrong
> answer.

Huh, what does that have to do with this?

IIUC, if someone changes something on the system, whether that is
updating microcode or swapping a hard drive or swapping memory or
whatever, that invalidates the errors reported, pretty much.

You can't put it all in the trace record, you just can't. 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette