Thread (24 messages), 8 authors, 2011-01-18

Re: hunting an IO hang

From: Mel Gorman <hidden>
Date: 2011-01-17 23:04:24

On Mon, Jan 17, 2011 at 04:23:56PM -0500, Chris Mason wrote:
> Excerpts from Linus Torvalds's message of 2011-01-17 13:24:55 -0500:
> > On Mon, Jan 17, 2011 at 9:40 AM, Chris Mason [off-list ref] wrote:
> > > > > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> > > > > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> > > > > than any runs in the past.
> > > >
> > > > Confirmed that reverting these patches makes the problem unreproducible
> > > > for the many_dd's + fsmark for at least an hour here.
> > >
> > > After 2+ hours I'm still running with those two commits gone.  I'm
> > > confident they are the cause of the crashes.  I also haven't triggered
> > > the cfq stalls without them.
> >
> > Ok, so the question is how to proceed from here.
> >
> > I can easily revert them, and since I was planning on doing -rc1
> > tonight, I probably will. But I promised Chris to delay until tomorrow
> > if he needed time to chase this down, and while it's now apparently
> > chased down, I'll certainly also be open to delaying until tomorrow if
> > somebody has a patch to fix it.
> >
> > So right now my plan is:
> >  - I will revert those two later today and then release -rc1 in the evening
> > UNLESS
> >  - somebody posts a patch for the problem in the next few hours and
> > Chris/others are willing to give it a good test overnight (or whatever
> > people feel is "sufficient" based on how easily they can trigger the
> > issue), in which case I'd do -rc1 tomorrow (either with the reverts or
> > the patch, depending on how testing works out)
>
> If a patch does come in, I'm happy to test it.  Mel had a test that
> triggered within 1-2 minutes, mine took 30 or so, which means I'd want a
> 2 hour run to convince myself it was really fixed.  But, I'll give Mel's
> fs_mark + dd workload a try on the buggy kernel.

I spent a while seeing if there was a simple patch but it's not trivially
fixable. __activate_page() is getting called in too many different situations
to be fully sure the function is doing the right thing in all cases. I also
couldn't convince myself that the accounting was correct in all cases. I
think batching updates from mark_page_accessed() in particular is a good
idea, but the patch needs a do-over.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
