Thread: 21 messages, 7 authors, 2025-03-06

Re: [EXTERNAL] Re: [PATCH v4 0/2] Add stop_on_panic support for watchdog

From: Ahmad Fatoum <a.fatoum@pengutronix.de>
Date: 2025-03-05 11:40:24
Also in: chrome-platform, imx, linux-arm-kernel, linux-mips, linux-watchdog, lkml, openbmc

Hi George,

On 05.03.25 12:28, George Cherian wrote:
Hi Ahmad,
Hi George,
On 05.03.25 11:10, George Cherian wrote:
This series adds a new kernel command line option to watchdog core to
stop the watchdog on panic. This is useful on certain systems where a
watchdog reset prevents the kdump kernel from loading successfully.

The stop function of some watchdog drivers may sleep. For such
drivers stop_on_panic is not valid, as the notifier callback runs
in atomic context. Introduce a WDIOF_STOP_MAYSLEEP flag in the watchdog_info
options to indicate whether the stop function may sleep.
Did you consider having a reset_on_panic instead, which sets a user-specified
timeout on panic? This would make the mechanism useful also for watchdogs
/proc/sys/kernel/panic already provides that support. You can echo a non-zero value
and the system attempts a soft reboot after that many seconds. But this doesn't happen
when a kdump kernel is loaded after a panic.
The timeout passed to the watchdog reset_on_panic option would be programmed into
the active watchdogs rather than used to trigger a software-induced reboot.
that can't be disabled and would protect against system lock up: 
Consider a memory-corruption bug (perhaps externally via DMA), which partially
overwrites both main and kdump kernel. With a disabled watchdog, the system
may not be able to recover on its own.
Yes, that is why the kernel command-line option is optional and defaults to zero,
so that the watchdog still kicks in if you have a corrupted kdump kernel.
The existing option isn't enough for the kdump kernel use case.
If we (i.e. you) are going to do something about it, wouldn't it be
better to have a solution that's applicable to a wider number of
watchdog devices?
If you did consider it, what made you decide against it?
watchdog.stop_on_panic=1 is specifically for systems that can't boot a kdump kernel because
the kdump kernel gets a watchdog reset while booting, possibly due to a short watchdog timeout,
e.g. a 32-bit watchdog down counter running at 1 GHz.
reset_on_panic can guarantee only the largest watchdog timeout supported by the HW,
since there is no one to ping the watchdog.
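
To put a number on the 32-bit / 1 GHz example, here is a rough sketch of the
arithmetic. The helper name wdt_max_timeout_ms() is made up for illustration
and is not part of the series; it assumes the counter is loaded with its
maximum value and decrements once per clock cycle with no prescaler.

```c
#include <stdint.h>

/*
 * Illustrative helper, not part of the patch series: longest timeout (in
 * milliseconds) a down counter can hold when nobody pings it. Assumes the
 * counter is loaded with its maximum value (2^bits - 1) and decrements
 * once per clock cycle, with no prescaler in between.
 */
uint64_t wdt_max_timeout_ms(unsigned int counter_bits, uint64_t clk_hz)
{
	uint64_t max_ticks;

	/* Limit the width so the multiplication below cannot overflow. */
	if (counter_bits == 0 || counter_bits > 32 || clk_hz == 0)
		return 0;

	max_ticks = (UINT64_C(1) << counter_bits) - 1;

	/* ticks / Hz gives seconds; scale to milliseconds first. */
	return (max_ticks * 1000) / clk_hz;
}
```

With counter_bits = 32 and clk_hz = 1000000000 this returns 4294, i.e. roughly
4.3 seconds, which would be the ceiling for an unpinged timeout on such
hardware under the assumptions above.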
If you are serious about watchdog use, you'll want to use the watchdog to
monitor kernel startup as well. If the bootloader can set a watchdog timeout
just before starting the kernel and it doesn't expire before the kernel watchdog
driver takes over, why can't we do the same just before starting the dump kernel?

Thanks,
Ahmad

 
Thanks,
Ahmad
Changelog:
v1 -> v2
- Remove the per driver flag setting option
- Take the parameter via kernel command-line parameter to watchdog_core.

v2 -> v3
- Remove the helper function watchdog_stop_on_panic() from watchdog.h.
- There are no users for this. 

v3 -> v4
- Since the panic notifier is in atomic context, watchdog functions
  which sleep can't be called. 
- Add an options flag WDIOF_STOP_MAYSLEEP to indicate whether stop
  function sleeps.
- Simplify the stop_on_panic kernel command line parsing.
- Enable the panic notifier only if the watchdog stop function doesn't
  sleep

George Cherian (2):
  watchdog: Add a new flag WDIOF_STOP_MAYSLEEP
  drivers: watchdog: Add support for panic notifier callback
- George

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |