Re: [RFC PATCH v2 0/4] arm64: Implement stack trace reliability checks
From: Josh Poimboeuf <hidden>
Date: 2021-04-09 22:32:41
Also in:
linux-arm-kernel, lkml
On Fri, Apr 09, 2021 at 05:05:58PM -0500, Madhavan T. Venkataraman wrote:
quoted
FWIW, over the years we've had zero issues with encoding the frame pointer on x86. After you save pt_regs, you encode the frame pointer to point to it. Ideally in the same macro so it's hard to overlook.I had the same opinion. In fact, in my encoding scheme, I have additional checks to make absolutely sure that it is a true encoding and not stack corruption. The chances of all of those values accidentally matching are, well, null.
Right, stack corruption -- which is already exceedingly rare -- would have to be combined with a miracle or two in order to come out of the whole thing marked as 'reliable' :-) And really, we already take a similar risk today by "trusting" the frame pointer value on the stack to a certain extent.
quoted
quoted
I think there's a lot more code that we cannot unwind, e.g. KVM exception code, or almost anything marked with SYM_CODE_END().Just a reminder that livepatch only unwinds blocked tasks (plus the 'current' task which calls into livepatch). So practically speaking, it doesn't matter whether the 'unreliable' detection has full coverage. The only exceptions which really matter are those which end up calling schedule(), e.g. preemption or page faults. Being able to consistently detect *all* possible unreliable paths would be nice in theory, but it's unnecessary and may not be worth the extra complexity.You do have a point. I tried to think of arch_stack_walk_reliable() as something that should be implemented independent of livepatching. But I could not really come up with a single example of where else it would really be useful. So, if we assume that the reliable stack trace is solely for the purpose of livepatching, I agree with your earlier comments as well.
One thought: if folks really view this as a problem, it might help to just rename things to reduce confusion. For example, instead of calling it 'reliable', we could call it something more precise, like 'klp_reliable', to indicate that its reliable enough for live patching. Then have a comment above 'klp_reliable' and/or stack_trace_save_tsk_klp_reliable() which describes what that means. Hm, for that matter, even without renaming things, a comment above stack_trace_save_tsk_reliable() describing the meaning of "reliable" would be a good idea. -- Josh