On Wed, 2017-09-20 at 12:54 -0700, Kees Cook wrote:
On Wed, Sep 20, 2017 at 12:40 AM, Abdul Haleem
[off-list ref] wrote:
On Tue, 2017-09-12 at 12:11 +0530, abdul wrote:
Hi,
Memory hot-unplug on PowerVM LPAR running next-20170911 results in
Faulting instruction address: 0xc0000000002b56c4
which maps to the below code path:
0xc0000000002b56c4 is in __rmqueue (./include/linux/list.h:104).
99 * This is only for internal list manipulation where we know
100 * the prev/next entries already!
101 */
102 static inline void __list_del(struct list_head * prev, struct list_head * next)
103 {
104 next->prev = prev;
105 WRITE_ONCE(prev->next, next);
106 }
107
108 /**
I see another kernel Oops when running the transparent hugepages
defragmentation test.
The faulting instruction address again points to the same code line:
0xc00000000026f9f4 is in compaction_alloc (./include/linux/list.h:104)
steps to recreate:
-----------------
1. Enable transparent hugepages ("always")
2. Turn off defrag: $ echo 0 > khugepaged/defrag
3. Write random to memory path
4. Set the number of huge pages
5. Turn on defrag: $ echo 1 > khugepaged/defrag
new trace:
----------
Unable to handle kernel paging request for data at address 0x5deadbeef0000108
This looks like use-after-list-removal; that value appears to be LIST_POISON1.
Try enabling CONFIG_DEBUG_LIST to see if you get better details?
With the above config enabled, I see the messages below along with call
traces, but no kernel Oops.
BUG: Bad page state in process drmgr pfn:770c7
page:f000000001dc31c0 count:0 mapcount:0 mapping:f000000001dc31c8 index:0x1
flags: 0x33ffff800000000()
raw: 033ffff800000000 f000000001dc31c8 0000000000000001 00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
page dumped because: non-NULL mapping
--
Regards,
Abdul Haleem
IBM Linux Technology Centre