Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)
From: Erhard Furtner <hidden>
Date: 2023-09-14 12:34:56
On Thu, 14 Sep 2023 04:54:17 +0000 Christophe Leroy [off-list ref] wrote:
On 12/09/2023 at 19:39, Christophe Leroy wrote:
On 12/09/2023 at 17:59, Erhard Furtner wrote:
printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Tue Sep 12 16:50:47 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
kasan_init_region: loop f8000000 fe000000

So I get no "kasan_init_region: setbat" line and don't reach "KASAN init done".

Ah OK, maybe your CPU only has 4 BATs and they are all used; the following change would tell us.

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 850783cfa9c7..bd26767edce7 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -86,6 +86,7 @@ int __init find_free_bat(void)
 		if (!(bat[1].batu & 3))
 			return b;
 	}
+	pr_err("NO FREE BAT (%d)\n", n);
 	return -1;
 }

Or you have 8 BATs, in which case it's an alignment problem: you need to increase CONFIG_DATA_SHIFT to 23, and for that you need CONFIG_ADVANCED and CONFIG_DATA_SHIFT_BOOL.

But regardless of that, there is a problem we need to find, because it should work without BATs.
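On the alignment point: a BAT can only map a power-of-two block that is aligned to its own size, so a range starting on a poorly aligned boundary consumes several BATs. A minimal sketch of that effect, assuming the classic rules (blocks of 128 KB to 256 MB, naturally aligned; some 745x cores allow larger blocks, which is not modelled) and a simple greedy split. bat_block_size() and count_bats() are hypothetical helpers, not the kernel's mmu_mapin_ram() code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed classic 32-bit PowerPC BAT limits: power-of-two blocks from
 * 128 KB to 256 MB, aligned to their own size. Larger blocks supported
 * by some newer cores are deliberately not modelled. */
#define BAT_MIN_SIZE 0x00020000u /* 128 KB */
#define BAT_MAX_SIZE 0x10000000u /* 256 MB */

/* Largest power of two that is <= x (x must be non-zero). */
static uint32_t rounddown_pow2(uint32_t x)
{
	uint32_t p = 1;

	assert(x != 0);
	while (p <= x / 2)
		p <<= 1;
	return p;
}

/* Hypothetical helper: size of the largest BAT block that can start at
 * 'base' without crossing 'top', or 0 if even the minimum block won't fit. */
static uint32_t bat_block_size(uint32_t base, uint32_t top)
{
	uint32_t size = BAT_MAX_SIZE;
	uint32_t align = base & -base; /* lowest set bit; 0 when base == 0 */

	assert(base < top);
	if (align != 0 && align < size)
		size = align;             /* block must be aligned to its size */
	if (top - base < size)
		size = rounddown_pow2(top - base); /* must not cross 'top' */
	return size >= BAT_MIN_SIZE ? size : 0;
}

/* Hypothetical helper: number of BAT entries a greedy mapping of
 * [base, top) consumes before it would have to fall back to page tables. */
static int count_bats(uint32_t base, uint32_t top)
{
	int n = 0;

	while (base < top) {
		uint32_t size = bat_block_size(base, top);

		if (size == 0)
			break; /* remainder would need page-table mappings */
		base += size;
		n++;
	}
	return n;
}
```

For the two ranges in the log (0 to 0x1400000 and 0x1400000 to 0x30000000) this model gives 2 + 7 = 9 entries, largely because 0x1400000 is only 4 MB aligned. If the split moved to an 8 MB boundary such as 0x1800000 (an assumption about what CONFIG_DATA_SHIFT=23 would do here), it gives 2 + 6 = 8. The real kernel may split ranges differently, so treat these counts as illustrative only.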
As the BAT allocation fails, it falls back to:

		phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
						 MEMBLOCK_ALLOC_ANYWHERE);
		if (!phys)
			return -ENOMEM;
	}

	ret = kasan_init_shadow_page_tables(k_start, k_end);
	if (ret)
		return ret;

	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
		pmd_t *pmd = pmd_off_k(k_cur);
		pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), PAGE_KERNEL);

		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
	}
	flush_tlb_kernel_range(k_start, k_end);
	memset(kasan_mem_to_shadow(start), 0, k_end - k_start);

whereas the __weak function that you confirmed working is:

	ret = kasan_init_shadow_page_tables(k_start, k_end);
	if (ret)
		return ret;

	block = memblock_alloc(k_end - k_start, PAGE_SIZE);
	if (!block)
		return -ENOMEM;

	for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
		pmd_t *pmd = pmd_off_k(k_cur);
		void *va = block + k_cur - k_start;
		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);

		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
	}
	flush_tlb_kernel_range(k_start, k_end);

I'm having a hard time understanding what could be wrong in the first place.
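The addresses in the log line "kasan_init_region: c0000000 30000000 f8000000 fe000000" (presumably start, size, k_start and k_end) are consistent with the generic KASAN shadow mapping of one shadow byte per 8 bytes of memory. A minimal sketch that checks this arithmetic, assuming 4 KB pages and a shadow offset of 0xe0000000 that is inferred from those numbers rather than read from the kernel config. mem_to_shadow() and shadow_pages() are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: shadow = (addr >> KASAN_SHADOW_SCALE_SHIFT) + offset,
 * i.e. one shadow byte per 8 bytes of memory. The offset is an
 * assumption inferred from the logged addresses. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define ASSUMED_SHADOW_OFFSET    0xe0000000u
#define ASSUMED_PAGE_SIZE        0x1000u /* 4 KB pages assumed */

/* Hypothetical stand-in for kasan_mem_to_shadow() on a 32-bit kernel. */
static uint32_t mem_to_shadow(uint32_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + ASSUMED_SHADOW_OFFSET;
}

/* Number of PAGE_SIZE steps the k_cur loop takes over [k_start, k_end). */
static uint32_t shadow_pages(uint32_t k_start, uint32_t k_end)
{
	assert(k_start <= k_end);
	return (k_end - k_start) / ASSUMED_PAGE_SIZE;
}
```

With those assumptions, mem_to_shadow(0xc0000000) is 0xf8000000, mem_to_shadow(0xf0000000) is 0xfe000000, and the k_cur loop in both variants above walks 0x6000 pages, i.e. 96 MB of shadow for 768 MB of lowmem.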
Could you try the following change:

diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
index 9954b7a3b7ae..e04f21908c6a 100644
--- a/arch/powerpc/mm/kasan/book3s_32.c
+++ b/arch/powerpc/mm/kasan/book3s_32.c
@@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
 	if (k_nobat < k_end) {
 		phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
-						 MEMBLOCK_ALLOC_ANYWHERE);
+						 MEMBLOCK_ALLOC_ACCESSIBLE);
 		if (!phys)
 			return -ENOMEM;
 	}

And also this one:

diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
index a70828a6d935..bc1c075489f4 100644
--- a/arch/powerpc/mm/kasan/init_32.c
+++ b/arch/powerpc/mm/kasan/init_32.c
@@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t pte)
 {
 	unsigned long k_cur;
 
+	if (k_start == k_end)
+		return;
+
 	for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
 		pmd_t *pmd = pmd_off_k(k_cur);
 		pte_t *ptep = pte_offset_kernel(pmd, k_cur);

I tested the two vmlinux images you sent me off-list; they both boot without problems on QEMU.
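The first change swaps MEMBLOCK_ALLOC_ANYWHERE for MEMBLOCK_ALLOC_ACCESSIBLE in memblock_phys_alloc_range(). As far as I understand memblock, ANYWHERE (~0) puts no upper bound on the returned physical address, ACCESSIBLE (0) caps it at memblock's current limit, and allocations are made top-down by default. A toy model of that difference, assuming a single, fully free 2048 MB bank (from "Total memory = 2048MB") and a current limit of 0x30000000 taken from the lowmem range in the logs. toy_alloc_top_down() and the TOY_* constants are hypothetical and ignore reserved regions:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_phys_t;

/* Same sentinel values as MEMBLOCK_ALLOC_ANYWHERE / MEMBLOCK_ALLOC_ACCESSIBLE. */
#define TOY_ALLOC_ANYWHERE   (~(toy_phys_t)0)
#define TOY_ALLOC_ACCESSIBLE ((toy_phys_t)0)

/*
 * Toy top-down allocator over one completely free RAM bank [0, ram_end).
 * 'end' is the caller's upper bound; ACCESSIBLE means "use current_limit".
 * Returns 0 on failure, mirroring memblock_phys_alloc_range().
 */
static toy_phys_t toy_alloc_top_down(toy_phys_t ram_end, toy_phys_t size,
				     toy_phys_t align, toy_phys_t end,
				     toy_phys_t current_limit)
{
	toy_phys_t top;

	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	if (end == TOY_ALLOC_ACCESSIBLE)
		end = current_limit;
	top = end < ram_end ? end : ram_end;  /* highest usable address */
	if (size > top)
		return 0;
	return (top - size) & ~(align - 1);   /* highest aligned fit */
}
```

In this model the 96 MB shadow block (0xfe000000 - 0xf8000000) lands at 0x7a000000, beyond lowmem, with ANYWHERE and at 0x2a000000 with ACCESSIBLE. It only illustrates what the change alters; it does not show that the placement is what causes the hang.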
No problems show up for me on QEMU either. But QEMU does not seem able to mimic my G4 DP's configuration, which is a dual-CPU G4 running an SMP kernel config.
So let's forget that for the moment, although you may try with CONFIG_STRICT_KERNEL_RWX; in that case you should have enough BATs.
CONFIG_STRICT_KERNEL_RWX=y was enabled all along in my kernel .config, but for comparison I disabled it. With STRICT_KERNEL_RWX disabled I get no output about BATs whatsoever. Details below.
In your last mail you say you tried with all patches. Did that include the two changes above? If not, can you perform the tests with those two changes in addition, first one by one, then both together depending on the result?
I think I did apply both, but I redid the checks just to be sure. For my 'all patches applied' config, please check the attached git diff.

dmesg with patch 1 "MEMBLOCK_ALLOC_ACCESSIBLE);" applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
NO FREE BAT (8)
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #23 SMP Thu Sep 14 13:05:23 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
NO FREE BAT (8)
kasan_init_region: loop f8000000 fe000000

dmesg with patch 2 "if (k_start == k_end) return;" applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
NO FREE BAT (8)
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:132
kasan_mmu_init:135 0
kasan_mmu_init:140
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #23 SMP Thu Sep 14 13:05:23 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
NO FREE BAT (8)
kasan_init_region: loop f8000000 fe000000

dmesg with both KASAN patches applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
NO FREE BAT (8)
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:132
kasan_mmu_init:135 0
kasan_mmu_init:140
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #23 SMP Thu Sep 14 13:05:23 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
NO FREE BAT (8)
kasan_init_region: loop f8000000 fe000000

dmesg with both KASAN patches and STRICT_KERNEL_RWX=n applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:132
kasan_mmu_init:135 0
kasan_mmu_init:140
Many thanks for your help and perseverance.
Christophe
You're welcome! Same to you! :)

Regards,
Erhard
Attachments
- all_patches.patch [text/x-patch] 7187 bytes