Thread (3 messages) 3 messages, 2 authors, 2003-07-18

RE: kernel BUG at sched.c:784!

From: Jack Miller <hidden>
Date: 2003-07-18 18:48:55

Possibly related (same subject, not in this thread)

  Jun,

    Thanks alot.  ' Will implement the prescribed debugging code and see
what it yields...

  Jack
-----Original Message-----
From: Jun Sun [mailto:jsun@mvista.com]
Sent: Friday, July 18, 2003 11:30 AM
To: Jack Miller
Cc: Linux-Mips; jsun@mvista.com
Subject: Re: kernel BUG at sched.c:784!


On Fri, Jul 18, 2003 at 10:48:22AM -0700, Jack Miller wrote:
quoted
  Jun,
   Here is the excerpt from sched.c:

	prepare_to_switch();
	{
		struct mm_struct *mm = next->mm;
		struct mm_struct *oldmm = prev->active_mm;
		if (!mm) {
784			if (next->active_mm) BUG();
			next->active_mm = oldmm;
			atomic_inc(&oldmm->mm_count);
			enter_lazy_tlb(oldmm, next, this_cpu);

  Jack
Jack,

Plese change that line to
			if (next->active_mm) {
				printk("active_mm = %p\n", next->active_mm);
				BUG();
			}

If you see next->active_mm to be NULL, you are seeing the CPU bug.

However, given the frequency you are seeing the problem, I suspect it
is something else.

Whenever CPU runs out active processes to run, it will switch to idle
process, which does not have a mm.  The BUG() basically says "last time
when idle process was switched off, its active_mm pointer should be set
NULL".  The dropping active_mm code is shortly after the above chunk.

Jun
quoted
quoted
-----Original Message-----
From: Jun Sun [mailto:jsun@mvista.com]
Sent: Friday, July 18, 2003 10:45 AM
To: Jack Miller
Cc: Linux-Mips; jsun@mvista.com
Subject: Re: kernel BUG at sched.c:784!


On Fri, Jul 18, 2003 at 10:26:59AM -0700, Jack Miller wrote:
quoted
  Jun,
   Thanks for your response.  Our kernel is actually based upon the
MontaVista kernel and that workaround is in place.

  Jack
Which line is 784?  There is another cpu bug which might cause this.

I checked our sherman source tree and did not find the corresponding
line.

Jun
quoted
quoted
-----Original Message-----
From: linux-mips-bounce@linux-mips.org
[mailto:linux-mips-bounce@linux-mips.org]On Behalf Of Jun Sun
Sent: Friday, July 18, 2003 10:23 AM
To: Jack Miller
Cc: Linux-Mips; jsun@mvista.com
Subject: Re: kernel BUG at sched.c:784!



Your kernel looks old, and probably don't have the CPU
bug workaround
quoted
quoted
quoted
quoted
code at the beginning of vec3 exception handler.

NESTED(except_vec3_generic, 0, sp)
#if R5432_CP0_INTERRUPT_WAR
                mfc0    k0, CP0_INDEX
#endif

Try this.

Jun

On Fri, Jul 18, 2003 at 09:57:01AM -0700, Jack Miller wrote:
quoted
  We are developing a system based around a NEC VR5432 CPU
and Broadcom
quoted
quoted
quoted
BCM703X System Controller. When the system is running with
the intended
quoted
quoted
quoted
application and drivers we intermittently experience a kernel
OOPS in the
quoted
scheduler.  Would someone please provide some insight to the
following OOPS
quoted
?  It appears (with my limited understanding of the
scheduler) that the
quoted
quoted
quoted
scheduler is trying to schedule the 'idle' task.  What
condition prevails to
quoted
cause this to happen ?

  Using a J-TAG Debugger, I "walked" the task list (in both
directions) and
quoted
everthing appears to be in order.

  Thanks in advance for your help.

  Regards,
    Jack


Linux version 2.4.17 (jack@saturn) (gcc version 3.2.2
20030322 (Pioneer
quoted
quoted
quoted
Voyager)) #1 Fri May 30 14:55:32 PDT 2003
ksymoops 2.4.6 on mips 2.4.17.  Options used
     -v vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.17/ (default)
     -m System.map (specified)
     -T 32

root@stb2073:~# kernel BUG at sched.c:784!
Unable to handle kernel paging request at virtual address
00000000, epc ==
quoted
8001524c, ra == 8001524c
$0 : 00000000 b001f800 0000001b 00000000 ffffff9d 80008000
0000001f 828f4a20
quoted
$8 : 00000001 ffffd890 00001890 801cb119 00000000 00000000
fffffff9 ffffffff
quoted
$16: 00000000 00000000 809ae000 828f4a20 80008000 00000000
80008000 1001ccf8
quoted
$24: 0000000a 00000002                   809ae000 809afe90
809afe90 8001524c
quoted
epc  : 8001524c    Tainted: P
Using defaults from ksymoops -t elf32-tradbigmips -a mips:3000
Status: b001f803
Cause : 8000c40c
Process pvrd (pid: 331, stackpage=809ae000)
Stack: 8016eda8 8016edc0 00000310 fffffc18 00138f80
00000002 809afed8
quoted
quoted
quoted
00000070
       00000000 1001cd00 1001ccfc 809afec8 80014e74
80014e6c 00000400
quoted
quoted
quoted
00000200
       c008422b 80bd4160 00000000 00000000 00138f80
809ae000 80014dd4
quoted
quoted
quoted
2aac2000
       00000000 809aff18 00001807 7edffa50 8002242c
00000070 00000000
quoted
quoted
quoted
8016c290
       00000000 00000000 00000000 00989680 7edffa40
00000000 8000f7c4
quoted
quoted
quoted
8000f7c4
       00000000 ...
Call Trace: [<8016eda8>] [<8016edc0>] [<80014e74>] [<80014e6c>]
[<c008422b>]
quoted
[<80014dd4>]
 [<8002242c>] [<8016c290>] [<8000f7c4>] [<8000f7c4>]
Code: 24a5edc0  0c0062f7  24060310 <08005485> ae200000
40016000  00000000
quoted
3421001f  3821001e

quoted
quoted
RA;  8001524c <schedule+33c/47c>
$1; b001f800 <_end+2fe2aea0/3fe2a6a0>
$5; 80008000 <init_task_union+0/0>
$7; 828f4a20 <_end+27000c0/3fe2a6a0>
$11; 801cb119 <printk_buf.4+19/400>
$18; 809ae000 <_end+7b96a0/3fe2a6a0>
$19; 828f4a20 <_end+27000c0/3fe2a6a0>
$20; 80008000 <init_task_union+0/0>
$22; 80008000 <init_task_union+0/0>
$23; 1001ccf8 <_binary_ramdisk_gz_size+1001a6da/7fffe9e2>
$28; 809ae000 <_end+7b96a0/3fe2a6a0>
$29; 809afe90 <_end+7bb530/3fe2a6a0>
$30; 809afe90 <_end+7bb530/3fe2a6a0>
$31; 8001524c <schedule+33c/47c>
quoted
quoted
PC;  8001524c <schedule+33c/47c>   <=====
Trace; 8016eda8 <mips_io_port_base+d08/1c30>
Trace; 8016edc0 <mips_io_port_base+d20/1c30>
Trace; 80014e74 <schedule_timeout+74/e4>
Trace; 80014e6c <schedule_timeout+6c/e4>
Trace; c008422b <[bcm7030]scard_interrupt+f/340>
Trace; 80014dd4 <process_timeout+0/2c>
Trace; 8002242c <sys_nanosleep+170/1fc>
Trace; 8016c290 <mips_hwi4_dispatch+70/78>
Trace; 8000f7c4 <stack_done+1c/38>
Trace; 8000f7c4 <stack_done+1c/38>

Code;  80015240 <schedule+330/47c>
00000000 <_PC>:
Code;  80015240 <schedule+330/47c>
   0:   24a5edc0  addiu   a1,a1,-4672
Code;  80015244 <schedule+334/47c>
   4:   0c0062f7  jal     18bdc <_PC+0x18bdc> 8002de1c
<generic_file_direct_IO+294/2d8>
Code;  80015248 <schedule+338/47c>
   8:   24060310  li      a2,784
Code;  8001524c <schedule+33c/47c>   <=====
   c:   08005485  j       15214 <_PC+0x15214> 8002a454
<__vma_link+9c/e0>
quoted
<=====
Code;  80015250 <schedule+340/47c>
  10:   ae200000  sw      zero,0(s1)
Code;  80015254 <schedule+344/47c>
  14:   40016000  mfc0    at,$12
Code;  80015258 <schedule+348/47c>
  18:   00000000  nop
Code;  8001525c <schedule+34c/47c>
  1c:   3421001f  ori     at,at,0x1f
Code;  80015260 <schedule+350/47c>
  20:   3821001e  xori    at,at,0x1e


Jack Miller [off-list ref]
Pioneer Digital Technologies, Inc.
6170 Cornerstone Court East
Suite 330
San Diego, CA 92121-3767
vox: (858)824-0790 x356
fax: (858)824-0796
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help