Re: [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze
From: Yong Zhang <hidden>
Date: 2011-07-04 02:23:27
Also in:
lkml
On Fri, Jul 1, 2011 at 6:03 PM, tiejun.chen [off-list ref] wrote:
> root@unknown:/root> insmod kprobe_example.ko func=show_interrupts
> Planted kprobe at c009be18
> root@unknown:/root> cat /proc/interrupts
> pre_handler: p->addr = 0xc009be18, nip = 0xc009be18, msr = 0x29000
> post_handler: p->addr = 0xc009be18, msr = 0x29000,boostable = 1
> Oops: Exception in kernel mode, sig: 11 [#1]
> PREEMPT MPC8536 DS
> Modules linked in: kprobe_example
> NIP: df159e74 LR: c0106f40 CTR: c009be18
> REGS: df159d90 TRAP: 0700   Not tainted  (3.0.0-rc4-00001-ge8ffcca-dirty)
> MSR: 00029000 <EE,ME,CE>  CR: 20202688  XER: 00000000
> TASK = dfaa5340[613] 'cat' THREAD: df158000
> GPR00: fffff000 df159e40 dfaa5340 df024a00 df159e78 00000000 df159f20 00000001
> GPR08: c10060d0 c009be18 00029000 df159e70 00000000 1001ca74 1ffb5f00 100a01cc
> GPR16: 00000000 00000000 00000000 00000000 df024a28 df159f20 00000000 dfbff080
> GPR24: 10016000 00001000 df159f20 df159e78 dfbff080 df159e78 df024a00 df159e70
> NIP [df159e74] 0xdf159e74
> LR [c0106f40] seq_read+0x2a4/0x568
> Call Trace:
> [df159e40] [00029000] 0x29000 (unreliable)
> [df159e74] [00000000]   (null)
> Instruction dump:
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> ---[ end trace 60026bfc1fe79aed ]---
> Segmentation fault
>
> Maybe I can understand this problem.
>
> When we kprobe an operation such as store-word-with-update on SP (r1),
>
>         stwu r1, -A(r1)
>
> the program exception is triggered, and PPC always allocates an exception
> frame as follows:
>
> old r1 --------
>         ...
>         nip
>         gpr[2]~gpr[31]
>         gpr[1] <--------- old r1 is stored here.
>         gpr[0]
>        -------- <-- pt_regs @offset 16 bytes
>        padding
>        STACK_FRAME_REGS_MARKER
>        LR
>        back chain
> new r1 --------
>
> Here emulate_step() is called to emulate 'stwu'. Actually this is
> equivalent to:
>
> 1> update pt_regs->gpr[1] = old r1 + (-A)
> 2> 'stw <old r1>, mem<(old r1 + (-A))>'
>
> You should notice that the exception frame based on new r1 would be
> overwritten at mem<old r1 + (-A)>. So when the kernel exits from
> post_kprobe after this, something would be broken. What gets broken
> depends on the size of A.
>
> For the kprobe on show_interrupts, you can see pregs->nip is overwritten,
> so the kernel oopses.

Yeah, my debugging also shows this, so this is the root cause. Thanks for your explanation.
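
To make the overlap described above concrete, here is a minimal user-space sketch. It is only a toy model under stated assumptions: the names (toy_stack, toy_exception_entry, toy_emulate_stwu_r1, toy_saved_nip), the 1024-byte stack and the frame size are all hypothetical, and the layout is simplified from the diagram in this thread rather than taken from the kernel's real definitions. It builds an exception frame directly below old r1, then emulates 'stwu r1, -A(r1)' against the saved registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy layout, simplified from the diagram in this thread (assumption,
 * not the kernel's real definitions). Addresses are byte offsets. */
#define STACK_BYTES 1024u
#define FRAME_HDR   16u                     /* back chain, LR, marker, padding */
#define NGPR        32u
#define NIP_OFF     (FRAME_HDR + 4u * NGPR) /* nip follows gpr[0..31] */
#define FRAME_BYTES (NIP_OFF + 4u * 12u)    /* nip plus 11 more words (illustrative) */

struct toy_stack {
    uint32_t word[STACK_BYTES / 4]; /* memory, indexed by byte address / 4 */
    uint32_t frame;                 /* new r1: base of the exception frame */
};

/* Return a pointer to the 32-bit word at byte address addr. */
uint32_t *toy_slot(struct toy_stack *s, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &s->word[addr / 4];
}

/* Exception entry: the frame is built directly below old_r1 on the same
 * stack, and the interrupted r1 and nip are saved into it. */
void toy_exception_entry(struct toy_stack *s, uint32_t old_r1, uint32_t nip)
{
    memset(s, 0, sizeof(*s));
    s->frame = old_r1 - FRAME_BYTES;
    *toy_slot(s, s->frame + FRAME_HDR + 4u) = old_r1; /* saved gpr[1] */
    *toy_slot(s, s->frame + NIP_OFF) = nip;           /* saved nip */
}

/* Emulate "stwu r1, -A(r1)" against the saved registers. */
void toy_emulate_stwu_r1(struct toy_stack *s, uint32_t a)
{
    uint32_t *gpr1 = toy_slot(s, s->frame + FRAME_HDR + 4u);
    uint32_t old_r1 = *gpr1;
    uint32_t ea = old_r1 - a;

    *toy_slot(s, ea) = old_r1; /* the store: may land inside the frame */
    *gpr1 = ea;                /* the update: saved r1 becomes old r1 - A */
}

/* Read back the nip that would be restored on exception return. */
uint32_t toy_saved_nip(struct toy_stack *s)
{
    return *toy_slot(s, s->frame + NIP_OFF);
}
```

In this toy layout the saved nip sits 48 bytes below old r1, so calling toy_exception_entry(&s, 768, 0xc009be18) followed by toy_emulate_stwu_r1(&s, 48) replaces the saved nip with the old stack pointer, while A = 256 lands below the frame and leaves nip intact. The real offsets depend on the kernel configuration, which would fit the observation that different options/toolchains change the symptom.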

> But sometimes we may only overwrite some volatile registers, and the
> kernel stays alive. This is just why the kernel works well for some
> kprobed points after you change some kernel options/toolchains.
>
> If I'm correct, it's difficult to kprobe these stwu sp operations since
> the size of A is undetermined for the kernel. So we have to implement an
> independent interrupt stack like PPC64.

Hmmm, a dedicated exception stack will smooth the concern IMHO,
Ben, Kuma?

Thanks,
Yong

-- 
Only stand for myself