Thread (4 messages), 3 authors, 2018-06-27

Re: [powerpc/powervm]kernel BUG at mm/memory_hotplug.c:1864!

From: vrbagal1 <hidden>
Date: 2018-06-27 13:37:14
Also in: linux-mm, linuxppc-dev

On 2018-06-26 20:24, Nathan Fontenot wrote:
On 06/12/2018 05:28 AM, Balbir Singh wrote:

On 11/06/18 17:41, vrbagal1 wrote:
On 2018-06-08 17:45, Oscar Salvador wrote:
On Fri, Jun 08, 2018 at 05:11:24PM +0530, vrbagal1 wrote:
On 2018-06-08 16:58, Oscar Salvador wrote:
On Fri, Jun 08, 2018 at 04:44:24PM +0530, vrbagal1 wrote:
Greetings!!!

I am seeing a kernel BUG followed by an oops message, and the system
reboots, while running the dlpar memory hotplug test.

Machine Details: Power6 PowerVM Platform
GCC version: (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC))
Test case: dlpar memory hotplug test (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py)
Kernel Version: Linux version 4.17.0-autotest

I am seeing this bug on rc7 as well.
I also observe similar traces on the linux-next kernel: 4.17.0-next-20180608-autotest

 Block size [0x4000000] unaligned hotplug range: start 0x220000000, size 0x1000000
size < block_size in this case. Why? How? Could you confirm that the
block size is 64MB and you're trying to remove 16MB?
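
The numbers in the log line above can be checked with a little arithmetic. The following is a minimal userspace sketch, not kernel code: the helper names `bytes_to_mib` and `range_is_block_aligned` are hypothetical, and the alignment rule (start and size must both be multiples of the block size) is an assumption about what the "unaligned hotplug range" message is checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the "unaligned hotplug range" log line. */
#define LOG_BLOCK_SIZE 0x4000000ULL   /* 64 MiB */
#define LOG_START      0x220000000ULL
#define LOG_SIZE       0x1000000ULL   /* 16 MiB */

/* Hypothetical helper: convert a byte count to whole MiB. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}

/*
 * Hypothetical helper, not the kernel's function: accept a range only
 * if it is non-empty and both its start and its size are multiples of
 * the memory block size (an assumed rule, see the lead-in).
 */
static bool range_is_block_aligned(uint64_t start, uint64_t size,
                                   uint64_t block_size)
{
    assert(block_size != 0);   /* guard the modulo operations below */
    return size != 0 &&
           start % block_size == 0 &&
           size % block_size == 0;
}
```

Under that assumed rule, `range_is_block_aligned(LOG_START, LOG_SIZE, LOG_BLOCK_SIZE)` is false: the start address is a multiple of 64 MiB, but the 16 MiB size is not, which matches the question of why the requested size is smaller than the block size.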
I was not able to re-create this failure exactly (I don't have a
Power6 system), but I was able to get a similar failure on a Power9
with a few modifications.

I think the issue you're seeing is due to a change in the validation of
memory done in remove_memory to ensure that the amount of memory being
removed spans an entire memory block. The pseries memory remove code
(see pseries_remove_memblock) tries to remove each section of a memory
block individually instead of removing the entire memory block.
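
The mismatch described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `model_check_range`, `model_remove_by_section` and `model_remove_whole_block` are hypothetical names, the 64 MiB block size comes from the log, and the 16 MiB section size is assumed from the size reported in the same log line.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BLOCK_SIZE   0x4000000ULL  /* 64 MiB, from the log */
#define MODEL_SECTION_SIZE 0x1000000ULL  /* 16 MiB, assumed section size */

/*
 * Hypothetical stand-in for the validation: returns 0 when the range
 * covers whole memory blocks, -1 otherwise.
 */
static int model_check_range(uint64_t start, uint64_t size)
{
    if (size == 0 || start % MODEL_BLOCK_SIZE || size % MODEL_BLOCK_SIZE)
        return -1;
    return 0;
}

/* Old pattern: one removal request per section of the block. */
static int model_remove_by_section(uint64_t base)
{
    int rejected = 0;

    assert(base % MODEL_BLOCK_SIZE == 0);
    for (uint64_t off = 0; off < MODEL_BLOCK_SIZE; off += MODEL_SECTION_SIZE)
        if (model_check_range(base + off, MODEL_SECTION_SIZE) != 0)
            rejected++;   /* every 16 MiB request fails the check */
    return rejected;
}

/* New pattern: a single removal request covering the whole block. */
static int model_remove_whole_block(uint64_t base)
{
    return model_check_range(base, MODEL_BLOCK_SIZE) != 0 ? 1 : 0;
}
```

In this model, `model_remove_by_section(0x220000000ULL)` counts four rejected requests, while `model_remove_whole_block(0x220000000ULL)` counts none, which is the motivation for issuing one request per memory block.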

Could you try the patch below, which updates the pseries code to remove
the entire memory block instead of doing it one section at a time?

-Nathan

Hi Nathan,

With the patch below applied on 4.18.0-rc2, I am seeing the following oops message.

------------[ cut here ]------------
kernel BUG at mm/memory_hotplug.c:150!
Oops: Exception in kernel mode, sig: 5 [#1]
BE SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT cfg80211 nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 rfkill xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables ses osst enclosure scsi_transport_sas ehea st uio_pdrv_genirq uio nfsd auth_rpcgss nfs_acl lockd grace sunrpc ipv6 crc_ccitt ext4 mbcache jbd2 sd_mod sr_mod cdrom dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 5 PID: 2925 Comm: drmgr Tainted: G        W         4.18.0-rc2-00045-g671afc8 #2
NIP:  c0000000002cf278 LR: c0000000002c0c38 CTR: 0000000000000400
REGS: c0000002ac4ab150 TRAP: 0700   Tainted: G        W          (4.18.0-rc2-00045-g671afc8)
MSR:  8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28002884  XER: 00000000
CFAR: c0000000002c0c00 IRQMASK: 0
GPR00: c0000000002c0c38 c0000002ac4ab3d0 c000000001159b00 c0000002b1091810
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000002b10
GPR08: c0000002b3fd0600 0000000000000001 0000000000000000 0000000000000220
GPR12: 0000000088002884 c00000000eeaa000 000000000002b400 0000000000024d00
GPR16: c0000002b3f8ca00 0000000000024c00 c0000000d3fc89c0 0000000000024d00
GPR20: 0000000000000003 0000000000000004 c0000002b3f7ca8c 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: c0000002b3fd0600 c0000002b1f7c6c0 c0000002b3f86224 c0000002b1091810
NIP [c0000000002cf278] .put_page_bootmem+0x28/0xf0
LR [c0000000002c0c38] .sparse_remove_one_section+0x228/0x2c0
Call Trace:
[c0000002ac4ab3d0] [c0000002ac4ab450] 0xc0000002ac4ab450 (unreliable)
[c0000002ac4ab450] [c0000000002c0c38] .sparse_remove_one_section+0x228/0x2c0
[c0000002ac4ab4f0] [c0000000002cf6f8] .__remove_pages+0x3b8/0x550
[c0000002ac4ab600] [c0000000008d32a4] .arch_remove_memory+0xb4/0x128
[c0000002ac4ab680] [c0000000002d1cd0] .remove_memory+0xb0/0x100
[c0000002ac4ab710] [c0000000000bc7b4] .pseries_remove_memblock+0x94/0xe0
[c0000002ac4ab790] [c0000000000bd3f8] .pseries_memory_notifier+0x248/0x260
[c0000002ac4ab820] [c000000000116ee8] .notifier_call_chain+0x78/0xf0
[c0000002ac4ab8c0] [c000000000117358] .__blocking_notifier_call_chain+0x58/0x90
[c0000002ac4ab960] [c000000000743e30] .of_property_notify+0x90/0xd0
[c0000002ac4aba10] [c00000000073ed04] .of_update_property+0x104/0x150
[c0000002ac4abac0] [c0000000000b045c] .ofdt_write+0x3bc/0x6f0
[c0000002ac4abb90] [c0000000003735b8] .proc_reg_write+0x78/0xc0
[c0000002ac4abc10] [c0000000002deaac] .__vfs_write+0x3c/0x200
[c0000002ac4abcf0] [c0000000002deeb0] .vfs_write+0xc0/0x230
[c0000002ac4abd90] [c0000000002df214] .ksys_write+0x54/0x100
[c0000002ac4abe30] [c00000000000b9dc] system_call+0x5c/0x70
Instruction dump:
60000000 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ff81 e9230020
3929fff4 21290002 7d294910 7d2900d0 <0b090000> 7c0004ac 39230034 7d404828
---[ end trace 85b846899f1bdbb7 ]---


Regards,
Venkat.

---

 arch/powerpc/platforms/pseries/hotplug-memory.c |   18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..6072efc793e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -316,11 +316,11 @@ static int dlpar_offline_lmb(struct drmem_lmb *lmb)
 	return dlpar_change_lmb_state(lmb, false);
 }

-static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
+static int pseries_remove_memblock(unsigned long base,
+				   unsigned int memblock_sz)
 {
-	unsigned long block_sz, start_pfn;
-	int sections_per_block;
-	int i, nid;
+	unsigned long start_pfn;
+	int nid;

 	start_pfn = base >> PAGE_SHIFT;
@@ -329,18 +329,12 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	if (!pfn_valid(start_pfn))
 		goto out;

-	block_sz = pseries_memory_block_size();
-	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
 	nid = memory_add_physaddr_to_nid(base);
-
-	for (i = 0; i < sections_per_block; i++) {
-		remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
-		base += MIN_MEMORY_BLOCK_SIZE;
-	}
+	remove_memory(nid, base, memblock_sz);

 out:
 	/* Update memory regions for memory remove */
-	memblock_remove(base, memblock_size);
+	memblock_remove(base, memblock_sz);
 	unlock_device_hotplug();
 	return 0;
 }