Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)
From: Christophe Leroy <hidden>
Date: 2023-09-14 04:55:36
On 12/09/2023 at 19:39, Christophe Leroy wrote:
On 12/09/2023 at 17:59, Erhard Furtner wrote:
printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Tue Sep 12 16:50:47 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
kasan_init_region: loop f8000000 fe000000

So I get no "kasan_init_region: setbat" line and don't reach "KASAN init done".

Ah ok, maybe your CPU only has 4 BATs and they are all used; the following change would tell us.

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 850783cfa9c7..bd26767edce7 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -86,6 +86,7 @@ int __init find_free_bat(void)
 		if (!(bat[1].batu & 3))
 			return b;
 	}
+	pr_err("NO FREE BAT (%d)\n", n);
 	return -1;
 }

Or you have 8 BATs, in which case it's an alignment problem and you need to increase CONFIG_DATA_SHIFT to 23. For that you need CONFIG_ADVANCED_OPTIONS and CONFIG_DATA_SHIFT_BOOL.

But regardless of that, there is a problem we need to find, because it should work without BATs.
As the BAT allocation fails, it falls back to:

		phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
						 MEMBLOCK_ALLOC_ANYWHERE);
		if (!phys)
			return -ENOMEM;
	}

	ret = kasan_init_shadow_page_tables(k_start, k_end);
	if (ret)
		return ret;

	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
		pmd_t *pmd = pmd_off_k(k_cur);
		pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), PAGE_KERNEL);

		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
	}
	flush_tlb_kernel_range(k_start, k_end);
	memset(kasan_mem_to_shadow(start), 0, k_end - k_start);

While the __weak function that you confirmed working is:

	ret = kasan_init_shadow_page_tables(k_start, k_end);
	if (ret)
		return ret;

	block = memblock_alloc(k_end - k_start, PAGE_SIZE);
	if (!block)
		return -ENOMEM;

	for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
		pmd_t *pmd = pmd_off_k(k_cur);
		void *va = block + k_cur - k_start;
		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);

		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
	}
	flush_tlb_kernel_range(k_start, k_end);

I'm having a hard time understanding what could be wrong in the first place.
Could you try the following change:

diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
index 9954b7a3b7ae..e04f21908c6a 100644
--- a/arch/powerpc/mm/kasan/book3s_32.c
+++ b/arch/powerpc/mm/kasan/book3s_32.c
@@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
 	if (k_nobat < k_end) {
 		phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
-						 MEMBLOCK_ALLOC_ANYWHERE);
+						 MEMBLOCK_ALLOC_ACCESSIBLE);
 		if (!phys)
 			return -ENOMEM;
 	}

And also that one:

diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
index a70828a6d935..bc1c075489f4 100644
--- a/arch/powerpc/mm/kasan/init_32.c
+++ b/arch/powerpc/mm/kasan/init_32.c
@@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t pte)
 {
 	unsigned long k_cur;
+	if (k_start == k_end)
+		return;
+
 	for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
 		pmd_t *pmd = pmd_off_k(k_cur);
 		pte_t *ptep = pte_offset_kernel(pmd, k_cur);
I tested the two vmlinux you sent me off-list; they both start without problems on QEMU.

Regarding the use of BATs, in fact a shift of 23 is still not enough to get free BATs for KASAN. But at least it allows you to map all linear memory with BATs, whereas a shift of 22 would require 9 BATs:

With shift 22 you have BATs with sizes (MB): 4+4+8+16+32+64+128+256+256
With shift 23 you have BATs with sizes (MB): 8+8+16+32+64+128+256+256

So let's forget that for the moment, although you may try with CONFIG_STRICT_KERNEL_RWX; in that case you should have enough BATs.

But let's refocus on the real problem. In your last mail you say you tried with all patches. Did that include the two changes above? If not, can you run the tests with those two changes in addition, first one by one, then both together depending on the result?

Many thanks for your help and perseverance
Christophe
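
For reference, the BAT counts for the two shift values can be reproduced with a small user-space sketch. This is not the kernel code: block_size(), count_blocks() and bats_needed() are hypothetical helpers that only mimic picking the largest naturally aligned power-of-two block (capped at 256MB) at each step. The 20MB border comes from the 1400000 value in the boot log; the 24MB border for shift 23 is an assumption chosen to match the sizes listed for that case.

```c
#define MB		(1UL << 20)
#define MAX_BAT_SIZE	(256 * MB)	/* largest block a single BAT can map */

/*
 * Hypothetical stand-in for the kernel's block-size choice: the largest
 * power-of-two block that is naturally aligned at 'base', does not run
 * past 'top', and does not exceed MAX_BAT_SIZE.
 */
static unsigned long block_size(unsigned long base, unsigned long top)
{
	unsigned long size = MAX_BAT_SIZE;

	while (size > 1 && ((base & (size - 1)) || size > top - base))
		size >>= 1;
	return size;
}

/* Greedily cover [base, top) and return the number of blocks used. */
static int count_blocks(unsigned long base, unsigned long top)
{
	int n = 0;

	while (base < top) {
		base += block_size(base, top);
		n++;
	}
	return n;
}

/*
 * Blocks needed when [0, border) and [border, top) are mapped
 * separately, as the two __mmu_mapin_ram lines in the log suggest.
 */
static int bats_needed(unsigned long border, unsigned long top)
{
	return count_blocks(0, border) + count_blocks(border, top);
}
```

With 768MB of linear memory (the 30000000 in the log), bats_needed(20 * MB, 768 * MB) gives 9 and bats_needed(24 * MB, 768 * MB) gives 8, under the assumptions stated above.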