Thread: 25 messages, 5 authors, 2024-10-30

RE: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info

From: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
Date: 2024-10-24 02:21:10
Also in: linux-edac, lkml

From: Avadhut Naik <avadhut.naik@amd.com>
[...]
Subject: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export
vendor specific info

Currently, exporting new additional machine check error information involves
adding new fields for the same at the end of the struct mce.
This additional information can then be consumed through mcelog or
tracepoint.

However, as new MSRs are being added (and will be added in the future) by
CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some CPUs,
unnecessarily, since those fields are vendor-specific. Moreover, different CPU
vendors may export the additional information in varying sizes.

The problem particularly intensifies since struct mce is exposed to userspace
as part of UAPI. Its bloating through vendor-specific data should be avoided
to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. The same
will prevent its ballooning since vendor-specific data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper struct
without impacting the user space API.
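
As a rough illustration of the wrapper idea described above, the sketch below shows a reduced struct mce embedded in a wrapper that carries vendor data in a union. It is not the patch's actual code: the three-field struct mce is a stand-in for the real UAPI structure, and the vendor.amd.synd1/synd2 fields and the mce_hw_err_init() helper are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the UAPI record; the real struct mce has many more fields. */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

/*
 * Kernel-internal wrapper. Vendor-specific data sits in a union next to
 * the embedded struct mce, so struct mce itself does not grow.
 * The amd member and its field names are assumptions for illustration.
 */
struct mce_hw_err {
	struct mce m;

	union {
		struct {
			uint64_t synd1;
			uint64_t synd2;
		} amd;
	} vendor;
};

/* Hypothetical helper: zero the whole wrapper, then record the status. */
static inline void mce_hw_err_init(struct mce_hw_err *err, uint64_t status)
{
	memset(err, 0, sizeof(*err));
	err->m.status = status;
}
```

With this layout, only the embedded m would need to be copied out to userspace, while the union and any later internal fields stay private to the kernel.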

[Yazen: Add last commit message paragraph.]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Move wrapper changes required in mce_read_aux() and
mce_no_way_out() to this patch from the second patch.
2. Fix SoB chain to properly reflect the patch path.

Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.
2. Resolve kernel test robot's warning on the use of memset() in
do_machine_check().

Changes in v5:
1. No changes except rebasing on top of tip/master.

Changes in v6:
1. Rebase on top of tip/master.
2. Introduce to_mce_hw_err macro to eliminate changes required in notifier
chain callback functions, especially callback functions of EDAC drivers.
3. Change third parameter of __mc_scan_banks() to a pointer to the new
wrapper structure and make the required changes accordingly.
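
The to_mce_hw_err idea from item 2 can be sketched as follows, assuming the macro behaves like container_of() on the embedded struct mce. The reduced structures, the internal_flags field, and example_callback() are hypothetical and exist only to make the sketch self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for illustration; not the kernel definitions. */
struct mce {
	uint64_t status;
};

struct mce_hw_err {
	uint64_t internal_flags;	/* hypothetical kernel-internal field */
	struct mce m;
};

/*
 * container_of-style conversion: step back from the embedded struct mce
 * to the wrapper that contains it.
 */
#define to_mce_hw_err(mce_ptr) \
	((struct mce_hw_err *)((char *)(mce_ptr) - offsetof(struct mce_hw_err, m)))

/*
 * Hypothetical notifier-style callback. It still receives a struct mce
 * pointer as opaque data, so its signature and callers stay unchanged,
 * yet it can reach the wrapper when it needs the extra fields.
 */
static uint64_t example_callback(void *data)
{
	struct mce *m = data;
	struct mce_hw_err *err = to_mce_hw_err(m);

	return err->internal_flags;
}
```

Because callbacks keep taking a struct mce pointer, existing notifier chain users such as EDAC drivers would need no changes unless they want the wrapper's additional data.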

Changes in v7:
1. Rebase on top of tip/master.
2. Fix initialization of struct mce_hw_err *final in do_machine_check().
As my comments were resolved in v6 and v7,

    Reviewed-by: Qiuxu Zhuo [off-list ref]

-Qiuxu