Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
From: "Chen, Yu C" <yu.c.chen@intel.com>
Date: 2026-05-26 04:08:29
Also in:
lkml
Hi Venkat, On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
* Chen, Yu C [off-list ref] [2026-05-25 23:35:45]:
Hi Venkat, On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
Greetings!!! I am seeing an early boot kernel panic due to NULL pointer dereference on a POWER9 (pSeries) system when testing linux-next (next-20260522).
It seems that cpumask_first(llc_mask(i)) is accessing NULL cpu_coregroup_mask():
has_coregroup_support() is false, thus cpu_coregroup_map is never allocated in smp_prepare_cpus(). This machine is a "shared system" VM. We should probably let the LLC id generation fall back to using L2 id if cpu_coregroup_mask is unavailable (which restores the behavior before this patch). I'm wondering if the following change would help (need IBM friends' help on this):
Power9 and older systems don't have coregroup. It's not because of shared LPAR; it's true for dedicated LPARs too. Only Power10 and above systems have hemispheres, where we add MC/coregroup support.
OK, thanks for the correction. Are you saying coregroup_enabled is false on Power9 and older hardware, and set to true on Power10? Power10 has a corresponding device-tree property, which is parsed to enable hemisphere support in find_possible_nodes(). This is why has_coregroup_support() returns true for Power10.
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..cf6c2e4190ab 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
 }
 #endif
 
-struct cpumask *cpu_coregroup_mask(int cpu)
-{
-	return per_cpu(cpu_coregroup_map, cpu);
-}
-
 static bool has_coregroup_support(void)
 {
 	/* Coregroup identification not available on shared systems */
@@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }
 
+struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	if (!has_coregroup_support())
+		return cpu_l2_cache_mask(cpu);
+
+	return per_cpu(cpu_coregroup_map, cpu);
+}
+

While this is a work-around for the problem on Power9, it will hurt Power10 and Power11 systems. As has been alluded to by Prateek, MC is not LLC on Power.
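For readers following along, the failure mode and the proposed fallback can be modeled in userspace roughly as below. This is only a sketch under stated assumptions, not the kernel code: mask_t, coregroup_map, l2_map, init_l2_pairs() and the *_guarded/*_unguarded names are hypothetical stand-ins for struct cpumask, the per-CPU cpu_coregroup_map, the L2 mask, and the two shapes of cpu_coregroup_mask() in the diff. The pairing of CPUs into L2 groups is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

typedef uint64_t mask_t;	/* one bit per CPU; stand-in for struct cpumask */

static bool coregroup_enabled;		/* false models a system without coregroup support */
static mask_t *coregroup_map[NR_CPUS];	/* never allocated when unsupported, so all NULL */
static mask_t l2_map[NR_CPUS];		/* always populated in this model */

/* Hypothetical setup: CPUs 2n and 2n+1 share an L2. */
static void init_l2_pairs(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		l2_map[cpu] = (mask_t)3 << (cpu & ~1);
}

/* Pre-patch shape: returns NULL if the map was never allocated. */
static const mask_t *coregroup_mask_unguarded(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return coregroup_map[cpu];
}

/* Proposed shape: fall back to the L2 mask without coregroup support. */
static const mask_t *coregroup_mask_guarded(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!coregroup_enabled)
		return &l2_map[cpu];
	return coregroup_map[cpu];
}

/* Lowest set bit, or NR_CPUS if the mask is empty. Dereferences m. */
static int mask_first(const mask_t *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (*m & ((mask_t)1 << cpu))
			return cpu;
	return NR_CPUS;
}
```

In this model, passing the result of coregroup_mask_unguarded() to mask_first() would dereference NULL, which mirrors the reported panic, while the guarded variant yields the first CPU of the L2 group. Note that the sketch says nothing about whether MC matches the LLC on Power10/11, which is the objection raised above.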
Could you please elaborate on the cache topology? Specifically, could you clarify what the LLC is for Power9 and Power10 respectively? Is it always the L2 cache? I have checked the IBM documentation available at: https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf According to the document, a hemisphere corresponds to a 64MB L3 cache shared by 8 cores. Since the MC domain spans a single hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled for the MC domain?
So by using cpu_coregroup_mask() as llc_mask, we run into the trouble of assuming MC to be similar to LLC, which will impact Power10/11 systems. In commit b5ea300a17e3 ("sched/cache: Make LLC id continuous"), we define
#define llc_mask(cpu) cpu_coregroup_mask(cpu)
Defining llc_mask as cpu_coregroup_mask means MC should be LLC. This is not true for some architectures, at least on Power.
OK.
So shouldn't it be using
#define llc_mask(cpu) per_cpu(sd_llc, cpu)
This should work for systems where LLC is sub-coregroup, coregroup, or super-coregroup (let's say some archs want LLC at PKG and cluster at coregroup). If we do that, I don't think we even need the else case where we say
#define llc_mask(cpu) cpumask_of(cpu)
I suppose you are referring to sched_domain_span(per_cpu(sd_llc, cpu)). Indeed, deriving the LLC from the SD_SHARE_LLC level offers better scalability. However, this approach would involve scheduler domains, which can be truncated by cpuset partitions - a scenario we prefer to avoid.

thanks,
Chenyu
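To make the truncation concern concrete, here is a small userspace sketch. It assumes a made-up topology in which all 8 CPUs share one LLC and two hypothetical cpuset partitions (CPUs 0-3 and CPUs 4-7). All names (hw_llc_mask, partition_of, domain_llc_span, llc_id_from_*) are illustrative and not kernel APIs, and taking the first CPU of the mask as the id is only a stand-in for whatever id scheme is actually used.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

typedef uint64_t mask_t;	/* one bit per CPU */

/* Hypothetical cpuset partitions: CPUs 0-3 and CPUs 4-7. */
static const mask_t partitions[2] = { 0x0f, 0xf0 };

/* Hypothetical hardware topology: every CPU shares a single LLC. */
static mask_t hw_llc_mask(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return 0xff;
}

/* Partition containing this CPU, or 0 if it belongs to none. */
static mask_t partition_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	for (int i = 0; i < 2; i++)
		if (partitions[i] & ((mask_t)1 << cpu))
			return partitions[i];
	return 0;
}

/* Models a domain span: the hardware mask clipped to the partition. */
static mask_t domain_llc_span(int cpu)
{
	return hw_llc_mask(cpu) & partition_of(cpu);
}

/* Lowest set bit, or NR_CPUS if the mask is empty. */
static int mask_first(mask_t m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (m & ((mask_t)1 << cpu))
			return cpu;
	return NR_CPUS;
}

/* Id derived from the topology mask: unaffected by partitions. */
static int llc_id_from_topology(int cpu)
{
	return mask_first(hw_llc_mask(cpu));
}

/* Id derived from the clipped domain span: changes with partitions. */
static int llc_id_from_domain(int cpu)
{
	return mask_first(domain_llc_span(cpu));
}
```

In this model CPU 1 and CPU 5 get the same id from the topology mask, but different ids once the span is clipped by the partitions, even though they share the same physical LLC.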