Thread (17 messages) 17 messages, 3 authors, 2020-05-31

Re: [arm64, debug] PTRACE_SINGLESTEP does not single-step a valid instruction

From: Luis Machado <hidden>
Date: 2020-01-13 18:13:36

Ping?

On 11/26/19 1:35 PM, Luis Machado wrote:
ping?

On 11/18/19 11:54 AM, Luis Machado wrote:
quoted
Hi Will,

Thanks for the thorough explanation.

On 11/18/19 10:15 AM, Will Deacon wrote:
quoted
Hi Luis,

[+Mark for the valid_user_regs() part]

On Tue, Nov 12, 2019 at 08:22:10PM -0300, Luis Machado wrote:
quoted
I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
request by GDB won't execute the underlying instruction. As a 
consequence,
the PC doesn't move, but we return a SIGTRAP just like we would for a
regular successful PTRACE_SINGLESTEP request.

Since there are no software breakpoints inserted at PC (we are actually
stepping over a breakpoint, so GDB removes the breakpoint at PC before
issuing a PTRACE_SINGLESTEP request), this is an odd behavior.

Though not too harmful, i see this manifesting in the GDB testsuite
(gdb.reverse/insn-reverse.exp), which throws the test off by making GDB
think it is further in the instruction stream than it really is. In 
fact, we
get lucky here and no FAIL's show up, only many more spurious PASSes.
I managed to reproduce this locally and I think I've figured out what's
going on, although I'm not sure that the kernel is the best place to fix
it.

Looking at the specific reproducer:
quoted
Execute gdb like so:

gdb -ex "set displaced-stepping off" -ex "b load" -ex "run" -ex 
"record" -ex
"si" -ex "rsi" -ex "record stop" insn-reverse
So we've got a couple of instructions as follows (it doesn't actually 
matter
what they are, so I've changed the LD1 in your binary for a NOP in 
order to
avoid confusion with the "load" label not actually pointing at a load):

    0x7b8:        mov    // "load"
    0x7bc:        nop

"b load" places a breakpoint at 0x7b8:

    ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0

We run to a software breakpoint on "load" (the mov instruction). We take
the trap and try to execute the "si", which means we need to remove the
breakpoint while we step over it:

    ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0
    [...]
    ptrace(PTRACE_SINGLESTEP, 662, 0x1, 0)  = 0

This causes the kernel to arm the single-step state machine so that
MDSCR_EL1.SS == SPSR_EL1.SS == 1 (known as "active-not-pending"). 
Running
an instruction in userspace will transition to MDSCR_EL1.SS ==1 and
SPSR_EL1.SS == 0 (known as "active-pending"), which will cause the 
trap to
trigger, at which point gdb puts the breakpoint instruction back 
since the
step is complete:
So, just to confirm my understanding, we have a couple bits 
controlling single-stepping in the kernel, one in MDSCR_EL1 and 
another in SPSR_EL1. GDB doesn't have direct access to any of those, 
correct?

Instead, GDB has access to a SS bit in the reserved 21~22 range of CPSR.

The transition from active-not-pending to active-pending takes place 
via a single PTRACE_SINGLESTEP request? Is that correct?
quoted
    ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201fd4200000) = 0

This is where things start to go wrong. The "rsi" command attempts to
perform a reverse step, which means restoring the old state when we were
previously executing at 0x7b8. It starts by removing the breakpoint 
again,
since we've already hit that:

    ptrace(PTRACE_POKEDATA, 662, 0xaaaaaaaaa7b8, 0xd503201f910003e0) = 0

and then resets the CPU registers to their old values:

    (I don't know why it does this three times)
    ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
[{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
    ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
[{iov_base=0xffffff64b3c8, iov_len=272}]) = 0
    ptrace(PTRACE_SETREGSET, 662, NT_PRSTATUS, 
[{iov_base=0xffffff64b3c8, iov_len=272}]) = 0

The problem with this is that we have moved the PC back to 0x7b8 but 
we have
also cleared SPSR_EL1.SS to 0. Internally, the kernel hasn't seen 
stepping
get disabled (this usually happens by PTRACE_CONT calling
user_disable_single_step()) which means that MDSCR_EL1.SS remains set 
to 1
and we're in the active-pending state! Consequently, we immediately 
take a
step exception if a step operation is attempted >
While trying to reproduce this, i was paying attention to the SS bit 
coming and going. But in the particular sequence of si/rsi, within the 
record boundaries, i see GDB just restored the original CPSR value to 
what it was before we processed the si command.

 From GDB's POV all state was restore to the way it was before and 
we're good to go.

Is this not enough to restore state kernel-wise?
quoted
Now, we *could* consider hacking the TIF_SINGLESTEP check in
valid_user_regs() so that SPSR_EL1.SS is preserved when stepping is 
active
but this is a user-visible change and may break things like stepping 
out of
signal handlers. I would prefer that GDB manages the SS bit 
explicitly in
this scenario, by setting it to 1 when restoring the old state in the
reverse step, a bit like when it disables the old breakpoint. You can
emulate this by doing:
I think we could let GDB control this when required, but I'm trying to 
understand the ramifications of letting GDB do so.

For example, what if the user decides to alter the PC here and there, 
for debugging purposes. That is a use case that happens often, in 
order to go back or skip some parts of the code.

Would we need to pay attention to the SS bit in those cases as well?
quoted
    (gdb) set $cpsr |= (1<<21)
In particular, what does the switching of this bit accomplishes in the 
kernel? Would we be better off forcing the SS bit every time we do a 
single-step operation, for example?
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help