Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking
From: Andy Lutomirski <luto@amacapital.net>
Date: 2016-05-24 03:52:35
Also in:
linux-s390, lkml
On May 23, 2016 7:28 PM, "Josh Poimboeuf" [off-list ref] wrote:
quoted
Maybe I'm coming around to liking this idea.Ok, good :-)quoted
In an ideal world (DWARF support, high-quality unwinder, nice pretty printer, etc), unwinding across a kernel exception would look like: - some_func - some_other_func - do_page_fault - page_fault After page_fault, the next unwind step takes us to the faulting RIP (faulting_func) and reports that all GPRs are known. It should probably learn this fact from DWARF if DWARF is available, instead of directly decoding pt_regs, due to a few funny cases in which pt_regs may be incomplete. A nice pretty printer could now print all the regs. - faulting_func - etc. For this to work, we don't actually need the unwinder to explicitly know where pt_regs is.That's true (but only for DWARF).quoted
Food for thought, though: if user code does SYSENTER with TF set, you'll end up with partial pt_regs. There's nothing the kernel can do about it. DWARF will handle it without any fanfare, but I don't know if it's going to cause trouble for you pre-DWARF.In this case it should see the stack pointer is past the pt_regs offset, so it would just report it as an empty stack.
OK
quoted
I'm also not sure it makes sense to apply this before the unwinder that can consume it is ready. Maybe, if it would be consistent with your plans, it would make sense to rewrite the unwinder first, then apply this and teach live patching to use the new unwinder, and *then* add DWARF support?For the purposes of livepatch, the reliable unwinder needs to detect whether an interrupt/exception pt_regs frame exists on a sleeping task (or current). This patch would allow us to do that. So my preferred order of doing things would be: 1) Brian Gerst's switch_to() cleanup and any related unwinder fixes 2) this patch for annotating pt_regs stack frames 3) reliable unwinder, similar to what I already posted, except it relies on this patch instead of PF_PREEMPT_IRQ, and knows how to deal with the new inactive task frame format of #1 4) livepatch consistency model which uses the reliable unwinder 5) rewrite unwinder, and port all users to the new interface 6) add DWARF unwinder 1-4 are pretty much already written, whereas 5 and 6 will take considerably more work.
Fair enough.
quoted
quoted
+ /* + * Create a stack frame for the saved pt_regs. This allows frame + * pointer based unwinders to find pt_regs on the stack. + */ + .macro CREATE_PT_REGS_FRAME regs=%rsp +#ifdef CONFIG_FRAME_POINTER + pushq \regs + pushq $pt_regs+1Why the +1?Some unwinders like gdb are smart enough to report the function which contains the instruction *before* the return address. Without the +1, they would show the wrong function.
Lovely. Want to add a comment?
quoted
quoted
+ pushq %rbp + movq %rsp, %rbp +#endif + .endmI keep wanting this to be only two pushes and to fudge rbp to make it work, but I don't see a good way. But let's call it CREATE_NESTED_ENTRY_FRAME or something, and let's rename pt_regs to nested_frame or similar.Or, if we aren't going to annotate syscall pt_regs, we could give it a more specific name. CREATE_INTERRUPT_FRAME and interrupt_frame()?
CREATE_INTERRUPT_FRAME is confusing because it applies to idtentry, too. CREATE_PT_REGS_FRAME is probably fine.
quoted
quoted
+ +/* fake function which allows stack unwinders to detect pt_regs frames */ +#ifdef CONFIG_FRAME_POINTER +ENTRY(pt_regs) + nop + nop +ENDPROC(pt_regs) +#endif /* CONFIG_FRAME_POINTER */Why is this two bytes long? Is there some reason it has to be more than one byte?Similar to above, this is related to the need to support various unwinders. Whether the unwinder displays the ret_addr or the instruction preceding it, either way the instruction needs to be inside the function for the function to be reported.
OK. --Andy