--- v1
+++ vrfc
@@ -1,15 +1,11 @@
-This is the patchset of the kprobes jump optimization
+Here is the RFC patchset of the kprobes jump optimization
(a.k.a. OPTPROBES) for powerpc. Kprobe being an indispensable tool
-for kernel developers, enhancing the performance of kprobe has
+for kernel developers,enhancing the performance of kprobe has
become very important.
Currently kprobes inserts a trap instruction to probe a running kernel.
-Jump optimization allows kprobes to replace the trap with a branch,
-reducing the probe overhead drastically.
-
-In this series, conditional branch instructions are not considered for
-optimization as they have to be assessed carefully in SMP systems.
-
+Jump optimization allows kprobes to replace the trap with a branch, reducing
+the probe overhead drastically.
Performance:
=============
@@ -18,12 +14,13 @@
Example:
Placed a probe at an offset 0x50 in _do_fork().
-*Time Diff here is the difference between the time before hitting the probe
-and the time after the probed instruction. mftb() is employed in
-kernel/fork.c for this purpose.
+*Time Diff here is the difference between the time before hitting the probe and the time after the probed instruction.
+mftb() is employed in kernel/fork.c for this purpose.
-# echo 0 > /proc/sys/debug/kprobes-optimization
+
+# echo 0 > /proc/sys/debug/kprobes-optimization
Kprobes globally unoptimized
+
[ 233.607120] Time Diff = 0x1f0
[ 233.608273] Time Diff = 0x1ee
[ 233.609228] Time Diff = 0x203
@@ -45,8 +42,9 @@
[ 233.626358] Time Diff = 0x200
[ 233.627572] Time Diff = 0x1ed
-# echo 1 > /proc/sys/debug/kprobes-optimization
+# echo 1 > /proc/sys/debug/kprobes-optimization
Kprobes globally optimized
+
[ 70.797075] Time Diff = 0x103
[ 70.799102] Time Diff = 0x181
[ 70.801861] Time Diff = 0x15e
@@ -71,61 +69,56 @@
Implementation:
===================
-The trap instruction is replaced by a branch to a detour buffer. To address
-the limited range of the branch instruction on the Power architecture, the
-detour buffer slot is allocated from a reserved area. This ensures that the
-branch is within the ± 32 MB range. Patch 2/3 furnishes this. The current
-kprobes insn caches allocate memory for insn slots with module_alloc(),
-which will always be beyond the ± 32 MB range.
+The trap instruction is replaced by a branch to a detour buffer.
+To address the limited range of the branch instruction on the Power
+architecture, the detour buffer slot is allocated from a reserved area. This
+ensures that the branch is within the +/- 32 MB range. Patch 2/3 furnishes this.
+The current kprobes insn caches allocate memory for insn slots with
+module_alloc(), which will always be beyond the +/- 32 MB range.
+Hence, for allocating and freeing slots from this reserved area,
+ppc_get_optinsn_slot() and ppc_free_optinsns_slot() are introduced.
The detour buffer contains a call to optimized_callback() which in turn
-calls the pre_handler(). Once the pre-handler is run, the original
-instruction is emulated from the detour buffer itself. The detour
-buffer also has a branch back to the normal workflow after the
-probed instruction is emulated. Before preparing optimization, kprobes
-inserts the original (breakpoint instruction) kprobe at the specified address.
-So, even if the kprobe cannot be optimized, it just uses a
-normal kprobe.
+calls the pre_handler(). Once the pre-handler is run, the original instruction
+is emulated from the detour buffer itself. The detour buffer also has
+a branch back to the normal workflow after the probed instruction is emulated.
+Before preparing optimization, kprobes inserts the original (user-defined) kprobe at the
+specified address. So, even if the kprobe cannot be optimized, it just uses
+a normal kprobe.
Limitations:
==============
-- Number of probes which can be optimized is limited by the size of the
- area reserved.
-- Currently instructions which can be emulated are the only candidates for
- optimization.
-- Conditional branch instructions are not optimized.
-- Probes in the kernel module region are not considered for optimization for now.
-RFC patchset for optprobes: https://lkml.org/lkml/2016/5/31/375
- https://lkml.org/lkml/2016/5/31/376
- https://lkml.org/lkml/2016/5/31/377
- https://lkml.org/lkml/2016/5/31/378
+- Number of probes which can be optimized is limited by the size of the area reserved.
+
+ * TODO: Have a template-based implementation that eases the limit on the probe count by
+ using less space from the reserved area for each optimization.
-Changes from RFC-v3:
+- Currently instructions which can be emulated are the only candidates for optimization.
-- Optimization for kprobe (in case of branch instructions) is limited to
- unconditional branch instructions only, since the conditional
- branches are to be assessed carefully in SMP systems.
-- create_return_branch() is omitted.
-- Comments by Masami are addressed.
-
-Anju T Sudhakar (3):
+
+
+Kindly let me know your suggestions and comments.
+
+Thanks
+-Anju
+
+
+Anju T (3):
arch/powerpc : Add detour buffer support for optprobes
arch/powerpc : optprobes for powerpc core
arch/powerpc : Enable optprobes support in powerpc
.../features/debug/optprobes/arch-support.txt | 2 +-
arch/powerpc/Kconfig | 1 +
- arch/powerpc/include/asm/kprobes.h | 24 ++
- arch/powerpc/include/asm/sstep.h | 1 +
+ arch/powerpc/include/asm/kprobes.h | 25 ++
arch/powerpc/kernel/Makefile | 1 +
- arch/powerpc/kernel/optprobes.c | 329 +++++++++++++++++++++
- arch/powerpc/kernel/optprobes_head.S | 119 ++++++++
- arch/powerpc/lib/sstep.c | 21 ++
- 8 files changed, 497 insertions(+), 1 deletion(-)
+ arch/powerpc/kernel/optprobes.c | 463 +++++++++++++++++++++
+ arch/powerpc/kernel/optprobes_head.S | 104 +++++
+ 6 files changed, 595 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/kernel/optprobes.c
create mode 100644 arch/powerpc/kernel/optprobes_head.S
--
-2.7.4
+2.1.0