Thread (8 messages) 8 messages, 2 authors, 2021-01-19

RE: [RFC PATCH 1/2] EDAC/ghes: Add EDAC device for the CPU caches

From: Shiju Jose <hidden>
Date: 2021-01-19 10:26:06
Also in: linux-edac, lkml

Hi Boris,
-----Original Message-----
From: Borislav Petkov [mailto:bp@alien8.de]
Sent: 18 January 2021 18:37
To: Shiju Jose <redacted>
Cc: linux-edac@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
kernel@vger.kernel.org; james.morse@arm.com;
mchehab+huawei@kernel.org; tony.luck@intel.com; rjw@rjwysocki.net;
lenb@kernel.org; rrichter@marvell.com; Jonathan Cameron
[off-list ref]; tanxiaofei [off-list ref];
linuxarm@openeuler.org
Subject: Re: [RFC PATCH 1/2] EDAC/ghes: Add EDAC device for the CPU
caches

On Fri, Jan 15, 2021 at 11:06:30AM +0000, Shiju Jose wrote:
quoted
L2 cache corrected errors are detected occasionally on few of our
ARM64 hardware boards. Though it is rare, the probability of the CPU
cache errors frequently occurring can't be avoided.
The earlier failure detection by monitoring the cache corrected errors
for the frequent occurrences and taking preventive action could
prevent more serious hardware faults.

On Intel architectures, cache corrected errors are reported and the
affected cores are offline in the architecture specific method.
http://www.mcelog.org/cache.html

However for the firmware-first error reporting, specifically on
ARM64 architectures, there is no provision present for reporting the
cache corrected error count to the user-space and taking preventive
action such as offline the affected cores.
How hard was it to write that in your first submission? What do you think
would be the best way to persuade a patch reviewer/maintainer to take a
look at your submission?
quoted
quoted
Why a separate Kconfig item?
CONFIG_EDAC_GHES_CPU_CACHE_ERROR is added to make this feature
optional only for the platforms which need this and supported.
quoted
quoted
+	depends on EDAC_GHES
depends on EDAC_GHES hardly expresses which platforms need it/support it.

If anything, depends on ARM64.
Sure. I will add dependency on ARM64.
This EDAC code for the cache errors is  architecture independent for the
firmware-first error reporting and  could be used for other architectures,
though now we need it for the ARM64. 
quoted
quoted
Init stuff belongs into ghes_scan_system().
Did you mean calling  ghes_edac_create_cpu_device() in the
ghes_scan_system()?

I mean, all hardware discovery needs to happen in ghes_scan_system
- you don't need to call those from outside the driver, in
ghes_edac_register().
sure. Will modify.
--
Regards/Gruss,
   Boris.
Thanks,
Shiju
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help