Thread (176 messages) 176 messages, 29 authors, 2022-05-24

Re: [PATCH 24/30] panic: Refactor the panic path

From: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Date: 2022-05-15 22:48:45
Also in: kexec, linux-alpha, linux-edac, linux-hyperv, linux-leds, linux-mips, linux-pm, linux-remoteproc, linux-s390, linux-tegra, linux-um, lkml, netdev, rcu, sparclinux, xen-devel

On 12/05/2022 11:03, Petr Mladek wrote:
Hello,

first, I am sorry for stepping into the discussion so late.
I was busy with some other stuff and this patchset is far
from trivial.

Second, thanks a lot for putting so much effort into it.
Most of the changes look pretty good, especially all
the fixes of particular notifiers and split into
four lists.

Though this patch will need some more love. See below
for more details.
Thanks a lot for your review Petr, it is much appreciated! No need for
apologies, there is no urgency here =)

[...] 
This talks only about kdump. The reality is much more complicated.
The level affect the order of:

    + notifiers vs. kdump
    + notifiers vs. crash_dump
    + crash_dump vs. kdump
First of all, I'd like to ask you please to clarify to me *exactly* what
are the differences between "crash_dump" and "kdump". I'm sorry if
that's a silly question, I need to be 100% sure I understand the
concepts the same way you do.

There might theoretically many variants of the ordering of kdump,
crash_dump, and the 4 notifier list. Some variants do not make
much sense. You choose 5 variants and tried to select them by
a level number.

The question is if we really could easily describe the meaning this
way. It is not only about a "level" of notifiers before kdump. It is
also about the ordering of crash_dump vs. kdump. IMHO, "level"
semantic does not fit there.

Maybe more parameters might be easier to understand the effect.
Anyway, we first need to agree on the chosen variants.
I am going to discuss it more in the code, see below.


[...] 
Here is the code using the above functions. It helps to discuss
the design and logic.

<kernel/panic.c>
	order_panic_notifiers_and_kdump();

	/* If no level, we should kdump ASAP. */
	if (!panic_notifiers_level)
		__crash_kexec(NULL);

	crash_smp_send_stop();
	panic_notifier_hypervisor_once(buf);

	if (panic_notifier_info_once(buf))
		kmsg_dump(KMSG_DUMP_PANIC);

	panic_notifier_pre_reboot_once(buf);

	__crash_kexec(NULL);

	panic_notifier_hypervisor_once(buf);

	if (panic_notifier_info_once(buf))
		kmsg_dump(KMSG_DUMP_PANIC);

	panic_notifier_pre_reboot_once(buf);
</kernel/panic.c>

I have to say that the logic is very unclear. Almost all
functions are called twice:

   + __crash_kexec()
   + kmsg_dump()
   + panic_notifier_hypervisor_once()
   + panic_notifier_pre_reboot_once()
   + panic_notifier_info_once()

It is pretty hard to find what functions are always called in the same
order and where the order can be inverted.

The really used code path is defined by order_panic_notifiers_and_kdump()
that encodes "level" into "bits". The bits are then flipped in
panic_notifier_*_once() calls that either do something or not.
kmsg_dump() is called according to the bit flip.

It is an interesting approach. I guess that you wanted to avoid too
many if/then/else levels in panic(). But honestly, it looks like
a black magic to me.

IMHO, it is always easier to follow if/then/else logic than using
a translation table that requires additional bit flips when
a value is used more times.

Also I guess that it is good proof that "level" abstraction does
not fit here. Normal levels would not need this kind of magic.
Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
disagree here - I'm much more fond to this kind of code than a bunch of
if/else blocks that almost give headaches. Encoding such "level" logic
in the if/else scheme is very convoluted, generates a very big code. And
the functions aren't so black magic - they map a level in bits, and the
functions _once() are called...once! Although we switch the position in
the code, so there are 2 calls, one of them is called and the other not.

But that's totally fine to change - especially if we're moving away from
the "level" logic. I see below you propose a much simpler approach - if
we follow that, definitely we won't need the "black magic" approach heheh

OK, the question is how to make it better. Let's start with
a clear picture of the problem:

1. panic() has basically two funtions:

      + show/store debug information (optional ways and amount)
      + do something with the system (reboot, stay hanged)


2. There are 4 ways how to show/store the information:

      + tell hypervisor to store what it is interested about
      + crash_dump
      + kmsg_dump()
      + consoles

  , where crash_dump and consoles are special:

     + crash_dump does not return. Instead it ends up with reboot.

     + Consoles work transparently. They just need an extra flush
       before reboot or staying hanged.


3. The various notifiers do things like:

     + tell hypervisor about the crash
     + print more information (also stop watchdogs)
     + prepare system for reboot (touch some interfaces)
     + prepare system for staying hanged (blinking)

   Note that it pretty nicely matches the 4 notifier lists.
I really appreciate the summary skill you have, to convert complex
problems in very clear and concise ideas. Thanks for that, very useful!
I agree with what was summarized above.

Now, we need to decide about the ordering. The main area is how
to store the debug information. Consoles are transparent so
the quesition is about:

     + hypervisor
     + crash_dump
     + kmsg_dump

Some people need none and some people want all. There is a
risk that system might hung at any stage. This why people want to
make the order configurable.

But crash_dump() does not return when it succeeds. And kmsg_dump()
users havn't complained about hypervisor problems yet. So, that
two variants might be enough:

    + crash_dump (hypervisor, kmsg_dump as fallback)
    + hypervisor, kmsg_dump, crash_dump

One option "panic_prefer_crash_dump" should be enough.
And the code might look like:

void panic()
{
[...]
	dump_stack();
	kgdb_panic(buf);

	< ---  here starts the reworked code --- >

	/* crash dump is enough when enabled and preferred. */
	if (panic_prefer_crash_dump)
		__crash_kexec(NULL);

	/* Stop other CPUs and focus on handling the panic state. */
	if (has_kexec_crash_image)
		crash_smp_send_stop();
	else
		smp_send_stop()
Here we have a very important point. Why do we need 2 variants of SMP
CPU stopping functions? I disagree with that - my understanding of this
after some study in architectures is that the crash_() variant is
"stronger", should work in all cases and if not, we should fix that -
that'd be a bug.

Such variant either maps to smp_send_stop() (in various architectures,
including XEN/x86) or overrides the basic function with more proper
handling for panic() case...I don't see why we still need such
distinction, if you / others have some insight about that, I'd like to
hear =)

	/* Notify hypervisor about the system panic. */
	atomic_notifier_call_chain(&panic_hypervisor_list, 0, NULL);

	/*
	 * No need to risk extra info when there is no kmsg dumper
	 * registered.
	 */
	if (!has_kmsg_dumper())
		__crash_kexec(NULL);

	/* Add extra info from different subsystems. */
	atomic_notifier_call_chain(&panic_info_list, 0, NULL);

	kmsg_dump(KMSG_DUMP_PANIC);
	__crash_kexec(NULL);

	/* Flush console */
	unblank_screen();
	console_unblank();
	debug_locks_off();
	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

	if (panic_timeout > 0) {
		delay()
	}

	/*
	 * Prepare system for eventual reboot and allow custom
	 * reboot handling.
	 */
	atomic_notifier_call_chain(&panic_reboot_list, 0, NULL);
You had the order of panic_reboot_list VS. consoles flushing inverted.
It might make sense, although I didn't do that in V1...
Are you OK in having a helper for console flushing, as I did in V1? It
makes code of panic() a bit less polluted / more focused I feel.

	if (panic_timeout != 0) {
		reboot();
	}

	/*
	 * Prepare system for the infinite waiting, for example,
	 * setup blinking.
	 */
	atomic_notifier_call_chain(&panic_loop_list, 0, NULL);

	infinite_loop();
}


__crash_kexec() is there 3 times but otherwise the code looks
quite straight forward.

Note 1: I renamed the two last notifier list. The name 'post-reboot'
	did sound strange from the logical POV ;-)

Note 2: We have to avoid the possibility to call "reboot" list
	before kmsg_dump(). All callbacks providing info
	have to be in the info list. It a callback combines
	info and reboot functionality then it should be split.

	There must be another way to calm down problematic
	info callbacks. And it has to be solved when such
	a problem is reported. Is there any known issue, please?

It is possible that I have missed something important.
But I would really like to make the logic as simple as possible.
OK, I agree with you! It's indeed simpler and if others agree, I can
happily change the logic to what you proposed. Although...currently the
"crash_kexec_post_notifiers" allows to call _all_ panic_reboot_list
callbacks _before kdump_.

We need to mention this change in the commit messages, but I really
would like to hear the opinions of heavy users of notifiers (as
Michael/Hyper-V) and the kdump interested parties (like Baoquan / Dave
Young / Hayatama). If we all agree on such approach, will change that
for V2 =)

Thanks again Petr, for the time spent in such detailed review!
Cheers,


Guilherme
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help