Thread (47 messages), 9 authors, 2008-01-25

Re: 2.6.24-rc6-mm1

From: Jarek Poplawski <hidden>
Date: 2008-01-05 00:05:19
Also in: lkml

On Fri, Jan 04, 2008 at 04:21:26PM +0100, Torsten Kaiser wrote:
On Jan 4, 2008 2:30 PM, Jarek Poplawski [off-list ref] wrote:
...
I'm open for any suggestions and will try to answer any questions.
I'm very glad, thanks!
The only thing that is sadly not practical is bisecting the broken-out
mm-patches, as triggering this error is too unreliable /
time-consuming.
Right, but it seems there are these 2 main suspects here...
quoted
- is it still vanilla -rc6-mm1; I've seen on kernel list you tried
some fixes around raid?
Yes, without these fixes I can't boot.
But they should only be run while starting the arrays, so I doubt
that this is the cause.
(Also -rc3-mm2 did not need this fix)
You've written vanilla -rc6 is OK. Does it mean -rc6 with these fixes?
I think it would be easier just to start with this working -rc6 and
simply check if we have 'right' suspects, so: git-net.patch and
git-nfsd.patch from -mm1-broken-out, as suggested by Herbert (I hope
they compile on their own; otherwise you could try the other way: add
the whole -mm and revert these two). Using current gits could
complicate this "investigation".
My skbuff-double-free-detector is still in there, but was never triggered.
quoted
- could you remind me of this lockdep warning: is it always the same,
always before the crash, or is there no pattern?
???
I see no lockdep warning before the crashes.
I have seen a warning about the dst->__refcnt in dst_release and
different warnings about list operations.

I think I have always posted everything I have seen before the
crashes. (captured via serial console)
So, you mean there are no more of these?:
 
"looked into the log in question and the only other warning was a
 circular locking dependency that lockdep detected around 1.5 hour
 before this warning."
...
"[ 7620.845168] INFO: lockdep is turned off."
(If you mean the lockdep problem in -rc6: that is more or less a
missing annotation during early bootup. The only problem with that is
that it causes lockdep to be turned off, so it cannot be used
to find any real problem. A fix for that is in -mm, so I do have
lockdep on the mm-kernels)
quoted
- I've seen you looked after double freeing, but this last debug list
warning could suggest locking problems during list modification too.
Yes, but Herbert explicitly mentioned double freeing a skb, and so I
tried to catch this.
I do not know enough about the network core to verify the locking of
the involved lists.
Right, the list corruption could be because of use after freeing too.
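
For readers unfamiliar with the list debug warning discussed here, the following is a minimal userspace sketch of the kind of sanity check that produces a "list_add corruption. prev->next should be next" message. It is only loosely modelled on the kernel's list debugging; the names `list_add_valid` and `checked_list_add_tail` are hypothetical and this is not the kernel's actual code. It illustrates why such a warning cannot by itself distinguish a locking problem from a use after free: both leave a neighbour's pointers inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Circular doubly linked list node, same shape as the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper: verify that prev and next still point at each other
 * before a new entry is linked in between them. */
static bool list_add_valid(const struct list_head *prev,
                           const struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be prev\n");
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be next\n");
        return false;
    }
    return true;
}

/* Hypothetical helper: append entry at the tail, refusing to touch the
 * list if the neighbours are already inconsistent. */
static bool checked_list_add_tail(struct list_head *entry,
                                  struct list_head *head)
{
    struct list_head *prev = head->prev;
    struct list_head *next = head;

    assert(entry != NULL);
    if (!list_add_valid(prev, next))
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

If the current tail entry has been freed and its memory reused (or another CPU modified it without the lock), its `next` pointer no longer points back at the list head, and the next append reports the corruption instead of silently linking into garbage.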
quoted
- above git-nfsd and git-net tests should be probably repeated with
-rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
only, and if bug triggers, with one reversed; btw., since in previous
message you mentioned that 50 packages could be not enough to trigger
this, these 54 above could make too little margin yet.
Yes, I think I really need to redo the git-nfsd-test.
With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages; only a second
run of kde-packages triggered it after only 5 packages.
I don't know what this bug hates about kdeartwork-wallpaper (triggered
it this time) or kdeartwork-styles.
I haven't read this whole thread, so I've probably missed many points, but
are you sure there are no problems with filesystem corruption around these
packages or where you compile(?) them (e.g. after these raid problems)?
Output from the crash with IOMMU_DEBUG (lockdep was enabled, but did
not trigger):
[15593.236374] Unable to handle kernel NULL pointer
dereference<3>list_add corruption. prev->next should be next
Fine! I'll try to look at this. BTW, I guess/hope DEBUG_SLAB etc. are
also on...

Thanks,
Jarek P.