Re: PowerPC ftrace function trace optimisation
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: 2010-04-29 01:08:28
On Thu, 2010-04-29 at 11:02 +1000, Benjamin Herrenschmidt wrote:
> > The option Alan added reduces the footprint to 3 instructions which can be noped out completely. The rest of the function does not rely on the first three instructions. No stack spill is forced either:
> >
> > # gcc -pg -mprofile-kernel
> From a quick test it appears that this only works with -m64, not -m32. Alan, is that correct? Any chance you can fix that in future gcc versions? Also, should we implement support for both types of mcount, or only allow enabling ftrace with gccs that support this?
Also, Anton noticed :
Cheers, Ben.

> > 0000000000000000 <.foo>:
> >    0: 7c 08 02 a6  mflr r0
> >    4: f8 01 00 10  std r0,16(r1)
The std is not useful here. We can do it inside mcount.
> >    8: 48 00 00 01  bl 8 <.foo+0x8>    <--- call to mcount
And I noticed:
> >    c: 7c 08 02 a6  mflr r0
I'm happy to guarantee that mcount does the above.
> >   10: f8 01 00 10  std r0,16(r1)
And maybe that one too. However, I understand if it's easier not to change the prolog codegen (the 2 insns above) and just stick to adding a 2- or 3-instruction boilerplate at the top.

Cheers,
Ben.
> >   14: f8 21 ff d1  stdu r1,-48(r1)
> >   18: e9 22 00 00  ld r9,0(r2)
> >   1c: e8 69 00 02  lwa r3,0(r9)
> >   20: 38 21 00 30  addi r1,r1,48
> >   24: e8 01 00 10  ld r0,16(r1)
> >   28: 7c 08 03 a6  mtlr r0
> >   2c: 4e 80 00 20  blr
> >
> > This means we could support ftrace function trace with very little overhead. In fact, if we are careful when switching to the new mcount ABI and don't rely on the store of r0, we could probably optimise this even further in a future gcc and remove the store completely. mcount would be 2 instructions:
> >
> >   mflr r0
> >   bl 8 <.foo+0x8>
> >
> > Anton
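
For illustration only, here is a minimal user-space sketch in C of what "noped out completely" could look like for the 3-instruction sequence in the listing above (mflr r0; std r0,16(r1); bl mcount). It is not the kernel's ftrace code. The helper name nop_out_mcount_site and the macro names are hypothetical. The instruction words are taken from the disassembly, and 0x60000000 is the standard PowerPC nop (ori r0,r0,0). The sketch only rewrites words in an ordinary array; patching live kernel text would also need instruction cache synchronisation and care about concurrent execution, which are left out here.

```c
#include <stdint.h>

/* Instruction words as they appear in the disassembly of .foo */
#define PPC_MFLR_R0 0x7c0802a6u /* mflr r0 */
#define PPC_STD_R0  0xf8010010u /* std r0,16(r1) */
#define PPC_NOP     0x60000000u /* ori r0,r0,0 */

/* A bl has primary opcode 18 with AA=0 and LK=1; the LI field
 * (branch displacement) in between is ignored by the mask. */
#define PPC_BL_MASK 0xfc000003u
#define PPC_BL      0x48000001u

/*
 * Hypothetical helper: if insn[0..2] is the assumed profiling
 * boilerplate (mflr r0; std r0,16(r1); bl <anything>), overwrite all
 * three words with nops and return 1. Otherwise leave the words
 * untouched and return 0.
 */
int nop_out_mcount_site(uint32_t *insn)
{
    if (insn[0] != PPC_MFLR_R0 ||
        insn[1] != PPC_STD_R0 ||
        (insn[2] & PPC_BL_MASK) != PPC_BL)
        return 0;

    insn[0] = PPC_NOP;
    insn[1] = PPC_NOP;
    insn[2] = PPC_NOP;
    return 1;
}
```

Because the rest of the function does not depend on those three instructions, the instructions that follow (starting at offset c in the listing) are left as they are. With the 2-instruction variant discussed above, the same check would simply drop the std comparison.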