Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

[RFC PATCH v2 00/18] livepatch: hybrid consistency model · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 02/18] x86/asm/head: use a common function for starting CPUs · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Jiri Kosina <jikos@kernel.org> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-04-30
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-04-30
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Ingo Molnar <mingo@kernel.org> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Jiri Kosina <jikos@kernel.org> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Jiri Kosina <jikos@kernel.org> · 2016-05-02
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-03
RE: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · David Laight <hidden> · 2016-05-04
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-19
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-19
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-20
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-20
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-20
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-20
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-20
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Jiri Kosina <jikos@kernel.org> · 2016-05-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-24
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-05-24
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-05-24
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-22
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-06-22
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-22
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-06-22
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-22
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-06-22
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-06-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-06-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Andy Lutomirski <luto@amacapital.net> · 2016-06-23
Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking · Josh Poimboeuf <hidden> · 2016-06-23
[RFC PATCH v2 08/18] livepatch: temporary stubs for klp_patch_pending() and klp_patch_task() · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 10/18] livepatch/powerpc: add TIF_PATCH_PENDING thread flag · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 10/18] livepatch/powerpc: add TIF_PATCH_PENDING thread flag · Petr Mladek <pmladek@suse.com> · 2016-05-03
Re: [RFC PATCH v2 10/18] livepatch/powerpc: add TIF_PATCH_PENDING thread flag · Miroslav Benes <mbenes@suse.cz> · 2016-05-03
[RFC PATCH v2 12/18] livepatch/s390: add TIF_PATCH_PENDING thread flag · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 14/18] livepatch: remove unnecessary object loaded check · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 18/18] livepatch: add /proc/<pid>/patch_state · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 13/18] livepatch: separate enabled and patched states · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 13/18] livepatch: separate enabled and patched states · Petr Mladek <pmladek@suse.com> · 2016-05-03
Re: [RFC PATCH v2 13/18] livepatch: separate enabled and patched states · Josh Poimboeuf <hidden> · 2016-05-03
[RFC PATCH v2 16/18] livepatch: store function sizes · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-04
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-04
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Miroslav Benes <mbenes@suse.cz> · 2016-05-05
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-05
barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-04
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Peter Zijlstra <peterz@infradead.org> · 2016-05-04
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-04
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-04
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-04
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-05
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Miroslav Benes <mbenes@suse.cz> · 2016-05-09
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-04
Re: barriers: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-05
klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-04
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Jiri Kosina <jikos@kernel.org> · 2016-05-04
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-04
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-05
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-06
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-09
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-16
Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-18
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-05-06
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-06
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Miroslav Benes <mbenes@suse.cz> · 2016-05-09
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-16
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Miroslav Benes <mbenes@suse.cz> · 2016-05-10
Re: livepatch: change to a per-task consistency model · Jessica Yu <hidden> · 2016-05-17
Re: livepatch: change to a per-task consistency model · Jiri Kosina <jikos@kernel.org> · 2016-05-18
Re: livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-05-18
Re: livepatch: change to a per-task consistency model · Jiri Kosina <jikos@kernel.org> · 2016-05-18
RE: livepatch: change to a per-task consistency model · David Laight <hidden> · 2016-05-23
RE: livepatch: change to a per-task consistency model · Jiri Kosina <jikos@kernel.org> · 2016-05-23
RE: livepatch: change to a per-task consistency model · David Laight <hidden> · 2016-05-24
RE: livepatch: change to a per-task consistency model · Jiri Kosina <jikos@kernel.org> · 2016-05-24
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Petr Mladek <pmladek@suse.com> · 2016-06-06
Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model · Josh Poimboeuf <hidden> · 2016-06-06
[RFC PATCH v2 15/18] livepatch: move patching functions into patch.c · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 15/18] livepatch: move patching functions into patch.c · Petr Mladek <pmladek@suse.com> · 2016-05-03
[RFC PATCH v2 11/18] livepatch/s390: reorganize TIF thread flag bits · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 09/18] livepatch/x86: add TIF_PATCH_PENDING thread flag · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 09/18] livepatch/x86: add TIF_PATCH_PENDING thread flag · Andy Lutomirski <luto@amacapital.net> · 2016-04-29
Re: [RFC PATCH v2 09/18] livepatch/x86: add TIF_PATCH_PENDING thread flag · Josh Poimboeuf <hidden> · 2016-04-29
[RFC PATCH v2 07/18] stacktrace/x86: function for detecting reliable stack traces · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 06/18] x86: dump_trace() error handling · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 06/18] x86: dump_trace() error handling · Minfei Huang <hidden> · 2016-04-29
Re: [RFC PATCH v2 06/18] x86: dump_trace() error handling · Josh Poimboeuf <hidden> · 2016-04-29
[RFC PATCH v2 04/18] x86: move _stext marker before head code · Josh Poimboeuf <hidden> · 2016-04-28
[RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Josh Poimboeuf <hidden> · 2016-04-28
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Brian Gerst <hidden> · 2016-04-29
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Andy Lutomirski <luto@kernel.org> · 2016-04-29
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Andy Lutomirski <luto@amacapital.net> · 2016-04-29
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Josh Poimboeuf <hidden> · 2016-04-29
Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks · Andy Lutomirski <luto@amacapital.net> · 2016-04-30
[RFC PATCH v2 01/18] x86/asm/head: clean up initial stack variable · Josh Poimboeuf <hidden> · 2016-04-28

From: Andy Lutomirski <luto@amacapital.net>
Date: 2016-06-22 17:59:46
Also in: linux-s390, lkml

On Wed, Jun 22, 2016 at 9:30 AM, Josh Poimboeuf [off-list ref] wrote:

On Mon, May 23, 2016 at 08:52:12PM -0700, Andy Lutomirski wrote:

quoted

On May 23, 2016 7:28 PM, "Josh Poimboeuf" [off-list ref] wrote:

quoted

Maybe I'm coming around to liking this idea.

Ok, good :-)

quoted

In an ideal world (DWARF support, high-quality unwinder, nice pretty
printer, etc), unwinding across a kernel exception would look like:

 - some_func
 - some_other_func
 - do_page_fault
 - page_fault

After page_fault, the next unwind step takes us to the faulting RIP
(faulting_func) and reports that all GPRs are known.  It should
probably learn this fact from DWARF if DWARF is available, instead of
directly decoding pt_regs, due to a few funny cases in which pt_regs
may be incomplete.  A nice pretty printer could now print all the
regs.

 - faulting_func
 - etc.

For this to work, we don't actually need the unwinder to explicitly
know where pt_regs is.

That's true (but only for DWARF).

quoted

Food for thought, though: if user code does SYSENTER with TF set,
you'll end up with partial pt_regs.  There's nothing the kernel can do
about it.  DWARF will handle it without any fanfare, but I don't know
if it's going to cause trouble for you pre-DWARF.

In this case it should see the stack pointer is past the pt_regs offset,
so it would just report it as an empty stack.

OK

quoted

I'm also not sure it makes sense to apply this before the unwinder
that can consume it is ready.  Maybe, if it would be consistent with
your plans, it would make sense to rewrite the unwinder first, then
apply this and teach live patching to use the new unwinder, and *then*
add DWARF support?

For the purposes of livepatch, the reliable unwinder needs to detect
whether an interrupt/exception pt_regs frame exists on a sleeping task
(or current).  This patch would allow us to do that.

So my preferred order of doing things would be:

1) Brian Gerst's switch_to() cleanup and any related unwinder fixes
2) this patch for annotating pt_regs stack frames
3) reliable unwinder, similar to what I already posted, except it relies
   on this patch instead of PF_PREEMPT_IRQ, and knows how to deal with
   the new inactive task frame format of #1
4) livepatch consistency model which uses the reliable unwinder
5) rewrite unwinder, and port all users to the new interface
6) add DWARF unwinder

1-4 are pretty much already written, whereas 5 and 6 will take
considerably more work.

Fair enough.

quoted

+       /*
+        * Create a stack frame for the saved pt_regs.  This allows frame
+        * pointer based unwinders to find pt_regs on the stack.
+        */
+       .macro CREATE_PT_REGS_FRAME regs=%rsp
+#ifdef CONFIG_FRAME_POINTER
+       pushq   \regs
+       pushq   $pt_regs+1

Why the +1?

Some unwinders like gdb are smart enough to report the function which
contains the instruction *before* the return address.  Without the +1,
they would show the wrong function.

Lovely.  Want to add a comment?

quoted

+       pushq   %rbp
+       movq    %rsp, %rbp
+#endif
+       .endm

I keep wanting this to be only two pushes and to fudge rbp to make it
work, but I don't see a good way.  But let's call it
CREATE_NESTED_ENTRY_FRAME or something, and let's rename pt_regs to
nested_frame or similar.

Or, if we aren't going to annotate syscall pt_regs, we could give it a
more specific name.  CREATE_INTERRUPT_FRAME and interrupt_frame()?

CREATE_INTERRUPT_FRAME is confusing because it applies to idtentry,
too.  CREATE_PT_REGS_FRAME is probably fine.

quoted

+
+/* fake function which allows stack unwinders to detect pt_regs frames */
+#ifdef CONFIG_FRAME_POINTER
+ENTRY(pt_regs)
+       nop
+       nop
+ENDPROC(pt_regs)
+#endif /* CONFIG_FRAME_POINTER */

Why is this two bytes long?  Is there some reason it has to be more
than one byte?

Similar to above, this is related to the need to support various
unwinders.  Whether the unwinder displays the ret_addr or the
instruction preceding it, either way the instruction needs to be inside
the function for the function to be reported.

OK.

Andy,

So I got a chance to look at this some more.  I'm thinking that to make
this feature more consistently useful, we shouldn't only annotate
pt_regs frames for calls to handlers; other calls should be annotated as
well: preempt_schedule_irq, CALL_enter_from_user_mode,
prepare_exit_to_usermode, SWAPGS, TRACE_IRQS_OFF, DISABLE_INTERRUPTS,
etc.  That way, the unwinder will always be able to find pt_regs from an
interrupt/exception, even if starting from one of these other calls.

But then, things get ugly.  You have to either setup and tear down the
frame for every possible call, or do a higher-level setup/teardown
across multiple calls, which invalidates several assumptions in the
entry code about the location of pt_regs on the stack.

Also problematic is that several of the macros (like TRACE_IRQS_IRETQ)
make assumptions about the location of pt_regs.  And they're used by
both syscall and interrupt code.  So if we didn't create a frame pointer
header for syscalls, we'd basically need two versions of the macros: one
for irqs/exceptions and one for syscalls.

So I think the cleanest way to handle this is to always allocate two
extra registers on the stack in ALLOC_PT_GPREGS_ON_STACK.  Then all
entry code can assume that pt_regs is at a constant location, and all
the above problems go away.  Another benefit is that we'd only need two
saves instead of three -- the pointer to pt_regs is no longer needed
since pt_regs is always immediately after the frame header.

I worked up a patch to implement this -- see below.  It writes the frame
pointer in all entry paths, including syscalls.  This helps keep the
code simple.

The downside is a small performance penalty: with getppid()-in-a-loop on
my laptop, the average syscall went from 52ns to 53ns, which is about a
2% slowdown.  But I doubt it would be measurable in a real-world
workload.

It looks like about half the slowdown is due to the extra stack
allocation (which presumably adds a little d-cache pressure on the stack
memory) and the other half is due to the stack writes.

I could remove the writes from the syscall path but it would only save
about half a ns, and it would make the code less robust.  Plus it's nice
to have the consistency of having *all* pt_regs frames annotated.

This is a bit messy, and I'm not really sure that the entry code
should be have to operate under constraints like this.  Also,
convincing myself this works for NMI sounds unpleasant.

Maybe we should go back to my idea of just listing the call sites in a table.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help