Thread (18 messages) 18 messages, 4 authors, 2020-06-23

Re: kprobe: __blkdev_put probe is missed

From: Ming Lei <hidden>
Date: 2020-06-22 00:28:12
Also in: lkml

Hi Masami,

On Sat, Jun 20, 2020 at 10:37:47AM +0900, Masami Hiramatsu wrote:
Hi Ming,

On Sat, 20 Jun 2020 07:28:20 +0800
Ming Lei [off-list ref] wrote:
quoted
quoted
Ah, after all it is as expected. With your kconfig, the kernel is
very agressively optimized.

$ objdump -dS vmlinux | less
...
ffffffff81256dc3 <__blkdev_put>:
{
ffffffff81256dc3:       e8 98 85 df ff          callq  ffffffff8104f360 <__fentry__>
ffffffff81256dc8:       41 57                   push   %r15
ffffffff81256dca:       41 56                   push   %r14
ffffffff81256dcc:       41 55                   push   %r13
...
ffffffff81256f05:       75 02                   jne    ffffffff81256f09 <__blkdev_put+0x146>
        struct block_device *victim = NULL;
ffffffff81256f07:       31 db                   xor    %ebx,%ebx
                bdev->bd_contains = NULL;
ffffffff81256f09:       48 c7 45 60 00 00 00    movq   $0x0,0x60(%rbp)
ffffffff81256f10:       00 
                put_disk_and_module(disk);
ffffffff81256f11:       4c 89 f7                mov    %r14,%rdi
ffffffff81256f14:       e8 c6 3d 11 00          callq  ffffffff8136acdf <put_disk_and_module>
        mutex_unlock(&bdev->bd_mutex);
ffffffff81256f19:       4c 89 ff                mov    %r15,%rdi
                __blkdev_put(victim, mode, 1);
ffffffff81256f1c:       41 bc 01 00 00 00       mov    $0x1,%r12d
        mutex_unlock(&bdev->bd_mutex);
ffffffff81256f22:       e8 8d d7 48 00          callq  ffffffff816e46b4 <mutex_unlock>
        bdput(bdev);
ffffffff81256f27:       48 89 ef                mov    %rbp,%rdi
ffffffff81256f2a:       e8 f0 e9 ff ff          callq  ffffffff8125591f <bdput>
        if (victim)
ffffffff81256f2f:       48 85 db                test   %rbx,%rbx
ffffffff81256f32:       74 08                   je     ffffffff81256f3c <__blkdev_put+0x179>
ffffffff81256f34:       48 89 dd                mov    %rbx,%rbp
ffffffff81256f37:       e9 b4 fe ff ff          jmpq   ffffffff81256df0 <__blkdev_put+0x2d> <<-----THIS!!
}
ffffffff81256f3c:       48 8b 44 24 28          mov    0x28(%rsp),%rax
ffffffff81256f41:       65 48 33 04 25 28 00    xor    %gs:0x28,%rax
ffffffff81256f48:       00 00 
ffffffff81256f4a:       74 05                   je     ffffffff81256f51 <__blkdev_put+0x18e>
ffffffff81256f4c:       e8 5a 4e 48 00          callq  ffffffff816dbdab <__stack_chk_fail>
ffffffff81256f51:       48 83 c4 30             add    $0x30,%rsp
ffffffff81256f55:       5b                      pop    %rbx
ffffffff81256f56:       5d                      pop    %rbp
ffffffff81256f57:       41 5c                   pop    %r12
ffffffff81256f59:       41 5d                   pop    %r13
ffffffff81256f5b:       41 5e                   pop    %r14
ffffffff81256f5d:       41 5f                   pop    %r15
ffffffff81256f5f:       c3                      retq   


As you can see, the nested __blkdev_put() is coverted to a loop.
If you put kprobe on __blkdev_put+0x2d, you'll see the event twice.
Thanks for your investigation.

Some trace tools can just trace on function entry, such as bcc, and some
user script always trace on function entry.

I guess the issue should belong to kprobe implementation:

1) __blkdev_put() is capable of being kprobed, so from user view, the
probe on entry of __blkdev_put() should be triggered
Yes, it is correctly triggered.
I mean it isn't from user's viewpoint, and the binary code is usually a
black box for final kprobe user.

IMO, all your and Steven's input are just from kprobe/trace developer's viewpoint.
Can you think about the issue from kprobe real/final user?

Trace is very useful tools to observe system internal, and people often
relies on trace to understand system. However, missed probe often causes
trouble for us to understand the system correctly.
quoted
2) from implementation view, I understand exception should be trapped
on the entry of __blkdev_put(), looks it isn't done.
No, it is correctly trapped the function entry address. The problem is
that the gcc optimized the nested function call into jump to the
beginning of function body (skip prologue).

Usually, a function is compiled as below

func()     (1) the entry address (func:)
{          (2) the function prologue (setup stackframe)  
  int a    (3) the beginning of function body 
   ...
  func()   (4) the nested function call

And in this case, the gcc optimized (4) into jump to (3) instead of
actual function call instruction. Thus, for the nested case (1) and
(2) are skipped.
 IOW, the code flow becomes
  (1)->(2)->(3)->(4)->(3)
 instead of 
  (1)->(2)->(3)->(4)->(1)->(2)->(3)

In this case, if we put a probe on (1) or (2), those are disappeared
in the nested call. Thus if you put a probe on (3) ('perf probe __blkdev_put:2')
you'll see the event twice.
Thanks for your explanation.

Can you kprobe guys improve the implementation for covering this case?
For example, put probe on 3) in case the above situation is recognized.



Thanks,
Ming
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help