Re: [PATCH v7 16/23] powerpc/e500: Switch to 64 bits PGD on 85xx (32 bits)
From: Christophe Leroy <hidden>
Date: 2024-08-07 10:11:55
Also in:
lkml
Hi,

On 31/07/2024 at 18:35, Guenter Roeck wrote:
On 7/31/24 08:36, LEROY Christophe wrote:
Hi Guenter,

Thanks for this report. I'm afk this week; I'll have a look at it in more detail next week. But to be sure, does that Oops match the bisected commit? pmd_leaf() for e500 doesn't exist yet, so pmd_write() shouldn't be called.

I did validate all my changes with mpc8544 on qemu when I implemented this series, using the map_hugetlb mm selftest. What test tool are you using?

Nothing special; it is just a qemu boot test with various module test and debug options enabled, using a root file system generated with buildroot.

I still don't get anything with mpc85xx_defconfig. Can you tell me which debug options you use and which module tests?

Thanks
Christophe
As mentioned, I can not just revert the offending commit, and the crash signature changes while running bisect. If I run a test on v6.10-rc6-396-g6b0e82791bd0, I get the following.

...
Btrfs loaded, zoned=no, fsverity=no
------------[ cut here ]------------
WARNING: CPU: 0 PID: 61 at mm/gup.c:685 follow_hugepd.constprop.0+0x138/0x170
Modules linked in:
CPU: 0 PID: 61 Comm: kworker/u4:1 Not tainted 6.10.0-rc6-00396-g6b0e82791bd0 #1
Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS
NIP: c01f5af8 LR: c01f60f4 CTR: 00000000
REGS: c7147be0 TRAP: 0700 Not tainted (6.10.0-rc6-00396-g6b0e82791bd0)
MSR: 00029000 <CE,EE,ME> CR: 28228202 XER: 20000000
GPR00: c01f66b8 c7147cd0 c5c08020 c5f2fb88 c7147d1c bfffffed 00050003 c7147d74
GPR08: 00000001 00000000 00000000 ffffffff 28228202 00000000 c0071dcc c5c98968
GPR16: 00000000 00000000 00000001 00050003 28228202 28228202 00000000 00000095
GPR24: c135ce70 c7147dd8 c7147e38 5a5a5a5a 00050003 bfffffed c7147d74 c5f2fb88
NIP [c01f5af8] follow_hugepd.constprop.0+0x138/0x170
LR [c01f60f4] follow_page_mask+0xac/0x518
Call Trace:
[c7147d10] [c02097a4] find_vma+0x44/0x8c
[c7147d60] [c01f66b8] __get_user_pages+0x158/0x5d8
[c7147dc0] [c01f6c98] get_user_pages_remote+0x160/0x560
[c7147e20] [c0266e94] get_arg_page+0xb0/0x25c
[c7147e60] [c02679b8] copy_string_kernel+0xf0/0x200
[c7147ea0] [c0268dbc] kernel_execve+0xfc/0x1dc
[c7147ed0] [c0071ed8] call_usermodehelper_exec_async+0x10c/0x198
[c7147f00] [c0016224] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
Code: 0fe00000 4bffff6c 81210018 2c090001 4082002c 81590050 39200001 80610014
 7d295030 3929ffff 913a0004 4bffff94 <0fe00000> 3860fff2 4bffffa8 0fe00000
irq event stamp: 78
hardirqs last enabled at (77): [<c10391e4>] _raw_spin_unlock_irqrestore+0x70/0xa8
hardirqs last disabled at (78): [<c000ed0c>] program_check_exception+0x78/0x12c
softirqs last enabled at (0): [<c005047c>] copy_process+0x7dc/0x1e70
softirqs last disabled at (0): [<00000000>] 0x0
---[ end trace 0000000000000000 ]---
mm/pgtable-generic.c:54: bad pgd 5a5a5a5a.
=============================================================================
BUG pgtable-2^11 (Tainted: G W ): Object padding overwritten
-----------------------------------------------------------------------------
0xc5fdcff8-0xc5fdcfff @offset=20472. First byte 0x0 instead of 0x5a
Allocated in mm_init.constprop.0+0x260/0x2b4 age=2 cpu=0 pid=61
 mm_init.constprop.0+0x260/0x2b4
 alloc_bprm+0xd0/0x38c
 kernel_execve+0x58/0x1dc
 call_usermodehelper_exec_async+0x10c/0x198
 ret_from_kernel_user_thread+0x10/0x18
Slab 0xcfeb1b00 objects=1 used=1 fp=0x00000000 flags=0x40(head|zone=0)
Object 0xc5fda000 @offset=8192 fp=0x00000000
Redzone c5fd8000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
...
Padding c5fddff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
CPU: 0 PID: 61 Comm: kworker/u4:1 Tainted: G W 6.10.0-rc6-00396-g6b0e82791bd0 #1
Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS
Call Trace:
[c7147d60] [c0ff7a5c] dump_stack_lvl+0x4c/0xc8 (unreliable)
[c7147d80] [c02336b4] check_bytes_and_report+0x17c/0x200
[c7147dc0] [c02318b8] check_object+0x108/0x418
[c7147df0] [c0231ddc] free_to_partial_list+0x214/0x764
[c7147e50] [c004e258] __mmdrop+0x6c/0x140
[c7147e80] [c0267070] free_bprm+0x30/0xbc
[c7147ea0] [c0268e10] kernel_execve+0x150/0x1dc
[c7147ed0] [c0071ed8] call_usermodehelper_exec_async+0x10c/0x198
[c7147f00] [c0016224] ret_from_kernel_user_thread+0x10/0x18
--- interrupt: 0 at 0x0
Disabling lock debugging due to kernel taint
FIX pgtable-2^11: Restoring Object padding 0xc5fdcff8-0xc5fdcfff=0x5a
mm/pgtable-generic.c:54: bad pgd 5a5a5a5a.
=============================================================================
BUG pgtable-2^11 (Tainted: G B W ): Object padding overwritten
-----------------------------------------------------------------------------
0xc5fdcff8-0xc5fdcfff @offset=20472. First byte 0x0 instead of 0x5a
Allocated in mm_init.constprop.0+0x260/0x2b4 age=0 cpu=0 pid=62
 mm_init.constprop.0+0x260/0x2b4
 alloc_bprm+0xd0/0x38c
 kernel_execve+0x58/0x1dc
 call_usermodehelper_exec_async+0x10c/0x198
 ret_from_kernel_user_thread+0x10/0x18
Freed in __mmdrop+0x6c/0x140 age=46 cpu=0 pid=61
 free_bprm+0x30/0xbc
 kernel_execve+0x150/0x1dc
 call_usermodehelper_exec_async+0x10c/0x198
 ret_from_kernel_user_thread+0x10/0x18
Slab 0xcfeb1b00 objects=1 used=1 fp=0x00000000 flags=0x40(head|zone=0)
Object 0xc5fda000 @offset=8192 fp=0x00000000
Redzone c5fd8000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................

and so on. The same boot test passes with v6.10-rc6-395-ge081c14744f4.

Context:

Build reference: v6.10-rc6-396-g6b0e82791bd0
Compiler version: powerpc64-linux-gcc (GCC) 11.5.0
Qemu version: 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.21)

I also tried with qemu 9.0.2 and gcc 13.3, with the same result.

Build reference: v6.10-rc6-396-g6b0e82791bd0
Compiler version: powerpc64-linux-gcc (GCC) 13.3.0
Qemu version: 9.0.2 (v9.0.2-34-gc332443796-dirty)

Guenter
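For readers following the numbers in the SLUB report above, here is a small consistency check, written as a Python sketch. It only redoes the address arithmetic from values printed in the log; the slab base address is an assumption, inferred from the object address minus its reported offset (it coincides with the first Redzone line). The variable and function names are illustrative and do not come from the kernel. It makes no claim about the root cause.

```python
# Addresses copied from the SLUB report quoted in this thread.
OBJECT_ADDR = 0xC5FDA000    # "Object 0xc5fda000 @offset=8192"
OBJECT_OFFSET = 8192        # offset reported for the object
CORRUPT_START = 0xC5FDCFF8  # "0xc5fdcff8-0xc5fdcfff @offset=20472"
CORRUPT_END = 0xC5FDCFFF

# Assumption: offsets are relative to the start of the slab, so the base
# can be inferred from the object address and its reported offset.
SLAB_BASE = OBJECT_ADDR - OBJECT_OFFSET


def offset_in_slab(addr: int) -> int:
    """Return the byte offset of addr from the inferred slab base."""
    return addr - SLAB_BASE


corrupt_offset = offset_in_slab(CORRUPT_START)
corrupt_len = CORRUPT_END - CORRUPT_START + 1
distance_from_object = CORRUPT_START - OBJECT_ADDR

print(f"slab base:                 {SLAB_BASE:#x}")
print(f"corrupted range offset:    {corrupt_offset}")
print(f"corrupted range length:    {corrupt_len} bytes")
print(f"distance from object start: {distance_from_object} bytes")
```

The inferred base is 0xc5fd8000, the computed offset of the overwritten range agrees with the "@offset=20472" printed by SLUB, and the overwritten range is 8 bytes long, starting 12280 bytes past the start of the object.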
Thanks
Christophe

----------------------------------------
From: Linuxppc-dev on behalf of Guenter Roeck
Sent: Tuesday, July 30, 2024 12:10:51 AM
To: Christophe Leroy
Cc: linux-kernel@vger.kernel.org; Nicholas Piggin; linux-mm@kvack.org; Peter Xu; Jason Gunthorpe; Andrew Morton; linuxppc-dev@lists.ozlabs.org; Oscar Salvador
Subject: Re: [PATCH v7 16/23] powerpc/e500: Switch to 64 bits PGD on 85xx (32 bits)

Hi,

On Tue, Jul 02, 2024 at 03:51:28PM +0200, Christophe Leroy wrote:
At the time being when CONFIG_PTE_64BIT is selected, PTE entries are 64 bits but PGD entries are still 32 bits.

In order to allow leaf PMD entries, switch the PGD to 64 bits entries.

Signed-off-by: Christophe Leroy <redacted>

With this patch in the mainline kernel, all my boot tests based on the mpc8544ds qemu emulation start crashing. Example crash log:

kernel BUG at include/linux/pgtable.h:1599!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MPC8544 DS
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper Tainted: G N 6.11.0-rc1 #1
Tainted: [N]=TEST
Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS
NIP: c01f51b8 LR: c01f6fec CTR: 00000000
REGS: c4135c40 TRAP: 0700 Tainted: G N (6.11.0-rc1)
MSR: 00029000 <CE,EE,ME> CR: 24228288 XER: 20000000
GPR00: c01f6fc0 c4135d30 c415bf20 c762e3f0 c29c9318 c7624ff8 0000026b b5fc2ea1
GPR08: 00000000 00000000 5a5a5000 b7f4dd55 44228282 00000000 c0005014 00000000
GPR16: 00000000 00000000 00000001 00050003 24228282 24228282 00000000 00000095
GPR24: c1375b30 c4135de8 c4135e48 00050003 c762e3a0 c762e3f0 bffffff1 c7676a08
NIP [c01f51b8] pmd_write.constprop.0.isra.0+0x4/0x8
LR [c01f6fec] follow_page_mask+0x150/0x17c
Call Trace:
[c4135d30] [c4135de8] 0xc4135de8 (unreliable)
[c4135d40] [c01f6fc0] follow_page_mask+0x124/0x17c
[c4135d70] [c01f7170] __get_user_pages+0x158/0x5d8
[c4135dd0] [c01f7750] get_user_pages_remote+0x160/0x560
[c4135e30] [c026838c] get_arg_page+0xb0/0x25c
[c4135e70] [c0268dd4] copy_string_kernel+0xf0/0x200
[c4135eb0] [c026a0e4] kernel_execve+0xbc/0x190
[c4135ee0] [c0005108] kernel_init+0xf4/0x1d4
[c4135f00] [c0016224] ret_from_kernel_user_thread+0x10/0x18

This is with v6.11-rc1; the actually observed crash differs from test to test while running bisect. I can't just revert the patch because subsequent patches depend on it.

Is this confirmed to work on real hardware? If so, do you have a suggestion how I could continue to use the mpc8544ds emulation for testing, or is it just dead?
For reference, the configuration file is mpc85xx_defconfig. Bisect log is attached.

Thanks,
Guenter

---
# bad: [8400291e289ee6b2bf9779ff1c83a291501f017b] Linux 6.11-rc1
# good: [2c9b3512402ed192d1f43f4531fb5da947e72bd0] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect start 'v6.11-rc1' '2c9b3512402e'
# bad: [6dc2e98d5f1de162d1777aee97e59d75d70d07c5] s390: Remove protvirt and kvm config guards for uv code
git bisect bad 6dc2e98d5f1de162d1777aee97e59d75d70d07c5
# bad: [30d77b7eef019fa4422980806e8b7cdc8674493e] mm/mglru: fix ineffective protection calculation
git bisect bad 30d77b7eef019fa4422980806e8b7cdc8674493e
# good: [c02525a33969000fa7b595b743deb4d79804916b] ftrace: unpoison ftrace_regs in ftrace_ops_list_func()
git bisect good c02525a33969000fa7b595b743deb4d79804916b
# good: [8ef6fd0e9ea83a792ba53882ddc6e0d38ce0d636] Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix crashes from deferred split racing folio migration", needed by "mm: migrate: split folio_migrate_mapping()".
git bisect good 8ef6fd0e9ea83a792ba53882ddc6e0d38ce0d636
# good: [a898530eea3d0ba08c17a60865995a3bb468d1bc] powerpc/64e: split out nohash Book3E 64-bit code
git bisect good a898530eea3d0ba08c17a60865995a3bb468d1bc
# bad: [00f58104202c472e487f0866fbd38832523fd4f9] mm: fix khugepaged activation policy
git bisect bad 00f58104202c472e487f0866fbd38832523fd4f9
# good: [e081c14744f4a93514069e1af1a7273d5451b909] powerpc/e500: remove enc and ind fields from struct mmu_psize_def
git bisect good e081c14744f4a93514069e1af1a7273d5451b909
# bad: [57fb15c32f4f6a4f1a58f1fbc58a799c3f975ed8] powerpc/64s: use contiguous PMD/PUD instead of HUGEPD
git bisect bad 57fb15c32f4f6a4f1a58f1fbc58a799c3f975ed8
# bad: [276d5affbbaea4d369d1e5b9711cb2951037f6ee] powerpc/e500: don't pre-check write access on data TLB error
git bisect bad 276d5affbbaea4d369d1e5b9711cb2951037f6ee
# bad: [84319905ca5f3759c42082e20ed978c81f4dead0] powerpc/e500: encode hugepage size in PTE bits
git bisect bad 84319905ca5f3759c42082e20ed978c81f4dead0
# bad: [6b0e82791bd03b2326c7f7d8c1124c825742f2a4] powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)
git bisect bad 6b0e82791bd03b2326c7f7d8c1124c825742f2a4
# first bad commit: [6b0e82791bd03b2326c7f7d8c1124c825742f2a4] powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)