Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
From: Naveen N. Rao <hidden>
Date: 2013-08-13 17:18:01
Also in:
linux-pci, lkml
On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
> On Tue, 13 Aug 2013 17:11:18 +0530, "Naveen N. Rao" [off-list ref] wrote:
>> On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
>>>> But, this only seems to expose the APEI data as a string and doesn't
>>>> look to really make all the fields available to user-space in a raw
>>>> manner. Not sure how well this can be utilised by a user-space tool.
>>>> Do you have suggestions on how we can do this?
>>>
>>> There's already a userspace tool that handles it:
>>>     https://git.fedorahosted.org/cgit/rasdaemon.git/
>>>
>>> What is missing in the current version are the bits that would allow
>>> translating from the APEI way of reporting an error (memory node,
>>> card, module, bank, device) into a DIMM label[1].
>>
>> If I'm reading this right, all APEI data seems to be squashed into a
>> string in mc_event.
>
> Yes. We had lots of discussion about how to map memory errors over the
> last couple years. Basically, it was decided that the information that
> could be decoded into a DIMM would be mapped as integers, and all other
> driver-specific data would be added as strings. In the tests I did,
> different machines/vendors fill the APEI data in different ways, which
> makes it harder to associate them with a DIMM.

Ok, so it looks like ghes_edac isn't quite useful yet. In the meantime,
like Boris suggests, I think we can have a different trace event for raw
APEI reports - userspace can use it as it pleases.

Once ghes_edac gets better, users can decide whether they want raw APEI
reports or the EDAC-processed version and choose one or the other trace
event.

Regards,
Naveen