Re: kerneloops.org report for the week
From: Ingo Molnar <hidden>
Date: 2009-06-29 03:18:47
Also in:
lkml
* Arjan van de Ven [off-list ref] wrote:
Few "highlights" this week
* mem_cgroup_add_lru_list (rank 2) is a high rising issue; it's list corruption, question is why this is new
* rank 13 (memcmp in the raid code) is also new
* the warning in get_free_pages that has been discussed on lkml is dropping from the ranks again

This week, a total of 15273 oopses and warnings have been reported, compared to 13384 reports in the previous week.

Rank 2: mem_cgroup_add_lru_list (warn)
Reported 1554 times (1622 total reports)
List corruption in the VM code
This oops was last seen in version 2.6.30-git19, and first seen in 2.6.29.
More info: http://www.kerneloops.org/searchweek.php?search=mem_cgroup_add_lru_list
At least one list corruption bug was fixed by:

  cb4cbcf: mm: fix incorrect page removal from LRU
Rank 3: getnstimeofday (warning)
Reported 1319 times (4893 total reports)
[suspend resume] getnstimeofday() is called before timekeeping is resumed
This oops was last seen in version 2.6.30, and first seen in 2.6.24.
More info: http://www.kerneloops.org/searchweek.php?search=getnstimeofday
Probably caused by some buggy driver callback?
Rank 7: hres_timers_resume (warning)
Reported 763 times (2368 total reports)
[suspend resume] hres_timers_resume() is incorrectly called with interrupts on
This warning was last seen in version 2.6.30, and first seen in 2.6.24.7.
More info: http://www.kerneloops.org/searchweek.php?search=hres_timers_resume
This is probably a driver incorrectly enabling irqs in a resume callback. This should be easier and more specific to debug with the lockdep-based annotation i suggested for the suspend code in various mails.
Rank 8: generic_get_mtrr (warning)
Reported 544 times (2061 total reports)
BIOS bug where the MTRRs are not set up correctly
This warning was last seen in version 2.6.30, and first seen in 2.6.25.3.
More info: http://www.kerneloops.org/searchweek.php?search=generic_get_mtrr
I think this calls for enabling the x86 MTRR sanitizer by default -
500 out of 15000 reports suggests a significant proportion of Linux
systems is affected by MTRR setup problems.
I.e. we should change:
  config MTRR_SANITIZER_ENABLE_DEFAULT
	int "MTRR cleanup enable value (0-1)"
	range 0 1
	default "0"
To 'default "1"'. Any objections?
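For anyone who wants to see what their own kernel was built with, here is a minimal sketch that reads the value out of a .config. It assumes the usual Kconfig convention that an int symbol shows up as CONFIG_<NAME>=<value>; the sample config text and the helper name sanitizer_default() are made up for illustration.

```python
import re

# Name of the Kconfig symbol as it would appear in a generated .config file.
SYMBOL = "CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT"


def sanitizer_default(config_text):
    """Return the integer value of SYMBOL in .config text, or None if absent.

    Commented-out lines such as '# CONFIG_FOO is not set' do not match,
    because the pattern is anchored to the start of the line.
    """
    pattern = re.compile(r"^" + re.escape(SYMBOL) + r"=(\d+)\s*$", re.MULTILINE)
    match = pattern.search(config_text)
    return int(match.group(1)) if match else None


# Hypothetical .config excerpt, not taken from any real build.
sample = """\
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
"""

print(sanitizer_default(sample))
```

With the proposed change, a freshly generated config would be expected to carry the value 1 here unless the user overrides it.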
If the MTRR sanitizer is enabled then i think the above warning in
generic_get_mtrr() should never trigger.
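To put the "500 out of 15000" estimate in exact terms, the short sketch below computes each quoted rank's share of this week's 15273 reports. The counts are the ones from the report above; the function and variable names are only illustrative.

```python
# Weekly totals as quoted in the kerneloops.org report.
TOTAL_REPORTS = 15273

# Per-signature counts for this week (ranks 2, 3, 7 and 8).
WEEKLY_REPORTS = {
    "mem_cgroup_add_lru_list": 1554,
    "getnstimeofday": 1319,
    "hres_timers_resume": 763,
    "generic_get_mtrr": 544,
}


def share_percent(count, total=TOTAL_REPORTS):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for name, count in WEEKLY_REPORTS.items():
    print(f"{name:28s} {count:5d}  {share_percent(count):5.2f}%")
```

generic_get_mtrr comes out at roughly 3.6% of the week's reports. Note that this is a share of reports, not of machines, so it is only a rough proxy for how many systems are affected.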
Ingo