Thread (25 messages) 25 messages, 11 authors, 2006-11-29

Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

From: David Chinner <hidden>
Date: 2006-11-21 23:32:00
Also in: linux-scsi, lkml

On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
On 21/11/06, David Chatterton [off-list ref] wrote:
quoted
Jesper,

In the short term, the best workaround is to use 8K stacks.
Yeah, that's what I'm currently doing and the box seems more stable
(at least it has not crashed yet, but with 4K stacks it usually would
have by now).
quoted
We do not see stack
overflow problems with NFS + XFS + volume managers + disk devices.
Could the size of my devices be part of the cause? some of the logical
volumes I have mounted are multiple TB in size?
No.
quoted
Audits have been done in the past and will again be done in the future to 
try to
identify areas where XFS could use less stack space by reducing/avoid large
local variables. Reducing the code path is far more difficult.
I realize that fixing the problem may be difficult. I just wanted to
make sure that people were informed that there is an actual problem
and provide as much info as possible so that perhaps in the future it
can be fixed... :)
I've got one that prevents gcc from inlining single use functions in XFS
that I need to finish off, and that results in some significant stack
usage reductions in some XFS functions.

However, XFS is only one part of the picture - when you put NFS on top,
DM+md then scsi/FC below and then you nest a soft irq that might go
20 functions deep as well - then 4k stacks simply aren't big enough.
I'm reading through the XFS code myself at the moment and I'll be sure
to submit patches if I spot something that could help reduce stack
usage.
Most of the low hanging fruit is already gone. The problem we are
facing now for further reductions in stack usage is the fact that we
need to factor code. That is a major undertaking and has a _lot_ of
risk associated with it....
quoted
There is active discussion about reducing inlining:
http://bugzilla.kernel.org/show_bug.cgi?id=7364
Thanks, I'll check that out.
That's one of the few remaining low hanging fruit, and that's fixed
in the patches I already have.
quoted
Thanks for traces, I've captured this information.
You are welcome. If you want/need more traces then I've got ~2.1G
worth of traces that you can have :)
Well, we don't need that many, but it would be nice to have a
set of unique traces that lead to overflows - could you process
them in some way just to extract just the unique XFS traces that
occur?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help