Re: [PATCH v5 1/3] powerpc: make fadump resilient with memory add/remove events
From: Aneesh Kumar K V <aneesh.kumar@kernel.org>
Date: 2023-11-22 12:21:33
On 11/22/23 4:05 PM, Sourabh Jain wrote:
Hello Michael, On 22/11/23 10:47, Michael Ellerman wrote:quoted
Aneesh Kumar K V [off-list ref] writes: ...quoted
I am not sure whether we need to add all the complexity to enable supporting different fadump kernel version. Is that even a possible use case with fadump? Can't we always assume that with fadump the crash kernel and fadump kernel will be same version?How sure are we of that? Don't we go through grub when we boot into the 2nd kernel. And so couldn't it choose to boot a different kernel, for whatever reason. I don't think we need to support different pt_reg / cpumask sizes, but requiring the exact same kernel version is too strict I think.Agree.quoted
But maybe I'm wrong. Would be good to hear what distro folks think.How about checking fadump crash info header compatibility in the following way? static bool is_fadump_header_compatible(struct fadump_crash_info_header *fdh) { if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) { pr_err("Old magic number, can't process the dump."); return false; } if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { pr_err("Fadump header is corrupted."); return false; } /* * If the kernel version of the first/crashed kernel and the second/fadump * kernel is not same, then only collect the dump if the size of all * non-primitive type members of the fadump header is the same across kernels. */ if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) { if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) { pr_err("Fadump header size mismatch.\n") return false; } else pr_warn("Kernel version mismatch; dump data is unreliable.\n"); }
You also want a fdh->version check? I am still not sure you need the kernel_version check. IMHO that is too strict and can hit that check where you have kexec kernel which is not installed in /boot crashing?
return true;
}
And the new fadump crash info header will be: As suggested by Hari.
/* fadump crash info structure */
struct fadump_crash_info_header {
u64 magic_number;
+ u32 version;
u32 crashing_cpu;
u64 elfcorehdr_addr;
+ u64 elfcorehdr_size;
+ u64 vmcoreinfo_raddr;
+ u64 vmcoreinfo_size;
+ u8 kernel_version[__NEW_UTS_LEN + 1];
+ u32 pt_regs_sz;
struct pt_regs regs;
+ u32 cpu_mask_sz;
struct cpumask cpu_mask;
};
Thanks,
Sourabh Jain