Re: [PATCH v5] power: suspend: Move dpm_watchdog to suspend.c and enhance it
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: 2021-01-27 15:24:23
Also in:
lkml
On Fri, Jan 08, 2021 at 01:51:11PM +0800, Joseph Jang wrote:
quoted hunk ↗ jump to hunk
Since dpm_watchdog just cover two functions __device_suspend() and device_resume(), we proposed to move it to core power suspend.c to extend its coverage and monitor more devices suspend hand issues. We propose to use new name suspend watchdog and new timeout handler to cover more sleep hang issues. The new timeout handler will dump disk sleep task call trace at first round timeout and trigger kernel panic at second round timeout. The default timer for each round is defined in CONFIG_PM_SUSPEND_WATCHDOG_TIMEOUT. Signed-off-by: Joseph Jang <redacted> --- Changes since v4: - Change #define PM_SUSPEND_WATCHDOG to CONFIG_PM_SUSPEND_WATCHDOG in suspend_watchdog.c to make sure we compile all suspend watchdog functions. - Add suspend watchdog functions declared in suspend_watchdog.h to prevent compile errors. Changes since v3: - Change the naming from sleep timer to suspend watchdog. - Remove console_is_suspended() from console.h and printk.c - Add static declaration for suspend_watchdog struct since we use it in suspend.c only. - Move suspend watchdog real logic to suspend_watchdog.c. - Put the function prototypes in suspend_watchdog.h - Use inline functions for function prototypes definition. Changes since v2: - Add kernel/power/suspend_timer.h in MAINTAINERS - Share some corner cases that dpm_watchdog cannot cover. Case#A: dpm_watchdog cannot cover suspend hang in suspend_enter(). File: kernel/power/suspend.c int suspend_devices_and_enter(suspend_state_t state) { ... <SNIP> suspend_test_start(); error = dpm_suspend_start(PMSG_SUSPEND); <== dpm_watchdog will be enabled/disabled in this fucntion ... if (error) { pr_err("Some devices failed to suspend, or early wake event detected\n"); goto Recover_platform; } suspend_test_finish("suspend devices"); if (suspend_test(TEST_DEVICES)) goto Recover_platform; do { error = suspend_enter(state, &wakeup); <== but we encounter hang inside function suspend_enter() ... } while (!error && !wakeup && platform_suspend_again(state)); Case#B: dpm_watchdog cannot cover resume hang in dpm_resume_early() because it enable/disable in device_resume(). Call trace: __switch_to+0x174/0x194 __schedule+0xa60/0x1464 __cancel_work_timer+0x120/0x234 chg_pm_resume+0x2c/0xd8 dpm_run_callback+0x27c/0x624 device_resume_early+0x1e4/0x1f8 dpm_resume_early+0x350/0x8f4 suspend_devices_and_enter+0xffc/0x168c pm_suspend+0xb54/0xdac File: drivers/base/power/main.c. static int device_resume(struct device *dev, pm_message_t state, bool async) if (!dpm_wait_for_superior(dev, async)) goto Complete; dpm_watchdog_set(&wd, dev); device_lock(dev); /* ... <SNIP> Unlock: device_unlock(dev); dpm_watchdog_clear(&wd); Case#C: dpm_watchdog cannot cover suspend hang in ksys_sync(). Call trace: __switch_to+0x1b0/0x1cc __schedule+0xac8/0x1314 io_schedule+0x94/0xc8 wait_on_page_bit+0x1f8/0x3a4 __filemap_fdatawait_range+0x134/0x150 sync_inodes_sb+0x368/0x584 sync_inodes_one_sb+0x18/0x24 iterate_supers+0x12c/0x2b8 ksys_sync+0x48/0x12c enter_state+0x294/0x7bc pm_suspend+0x164/0x2a8 - Explain some enhancement by following. Q1: Why not use the existing one? struct dpm_watchdog { struct device *dev; struct task_struct *tsk; struct timer_list timer; }; A1: In kernel/power/suspend.c, we don't have "struct device " because suspend.c is for core PM instead of device PM. So we propose to use sleep_timer struct. struct sleep_timer { struct task_struct *tsk; struct timer_list timer; }; Q2: Why not use stack memory for timer struct? static void dpm_watchdog_set(struct dpm_watchdog *wd, struct device *dev) { ... <SNIP> timer_setup_on_stack(timer, dpm_watchdog_handler, 0); <== dpm_watchdog use stack memory for timer struct. /* use same timeout value for both suspend and resume */ timer->expires = jiffies + HZ * CONFIG_DPM_WATCHDOG_TIMEOUT; add_timer(timer); } A2: dpm_watchdog use stack memory has limitation. It cannot support starting watchdog at end of function like s2idle_enter(). We cannot use stack memory for this case because the timer struct will be free when go back to caller. File: kernel/power/suspend.c static void s2idle_enter(void) { trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, true); + stop_sleep_timer(&st); + raw_spin_lock_irq(&s2idle_lock); if (pm_wakeup_pending()) goto out; ... <SNIP> s2idle_state = S2IDLE_STATE_NONE; raw_spin_unlock_irq(&s2idle_lock); + start_sleep_timer(&st); + trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, false); } So we propose to declare a global sleep timer struct in suspend.c like following. File: kernel/power/suspend.c static DEFINE_RAW_SPINLOCK(s2idle_lock); +DECLARE_SLEEP_TIMER(st); + /** * pm_suspend_default_s2idle - Check if suspend-to-idle is the default suspend. Using global timer has another advantage is we could support double calling add_timer() because add_timer() can easily to delete pending timer and add new timer by using the same timer struct. Q3: Why do you need to change the timeout handler? A3: dpm_watchdog_handler() need device struct to get device information, but suspend.c doesn't have it. In most of cases, we could know which device driver hang by tracing the suspend thread call stack logs without device struct information for debugging. So we propose to remove device struct and related information like "dev_driver_string(wd->dev)" and "dev_name(wd->dev)" from timeout handler. static void dpm_watchdog_handler(struct timer_list *t) { struct dpm_watchdog *wd = from_timer(wd, t, timer); dev_emerg(wd->dev, "**** DPM device timeout ****\n"); show_stack(wd->tsk, NULL, KERN_EMERG); panic("%s %s: unrecoverable failure\n", dev_driver_string(wd->dev), dev_name(wd->dev)); <== we don't need it. } We propose to dump all disk sleep tasks call stack log twice to double confirm suspend thread hang at the same function and make sure other sleep tasks were stuck at the same function for a long time. We also try to resume conole if the console is suspended. At the end of watchdog timeout handler, we propose to trigger kernel panic to prevent system hang like dpm_watchdog. static void sleep_timeout_handler(struct timer_list *t) { struct sleep_timer *st = from_timer(st, t, timer); static int timeout_count; pr_info("Sleep timeout (timer is %d seconds)\n", (CONFIG_PM_SLEEP_TIMER_TIMEOUT)); show_stack(st->tsk, NULL, KERN_EMERG); show_state_filter(TASK_UNINTERRUPTIBLE); if (timeout_count < 1) { timeout_count++; start_sleep_timer(st); return; } if (console_is_suspended()) resume_console(); panic("Sleep timeout and panic\n"); } Changes since v1: - Add commit message to explain why move dpm_watchdog to kernel/power/suspend.c. - Remove dpm_watchdog related functions in drivers/base/power/main.c. - Move suspend_timer.h from include/linux/ to kernel/power/. --- MAINTAINERS | 4 ++ drivers/base/power/main.c | 69 --------------------------- kernel/power/Kconfig | 27 +++++------ kernel/power/Makefile | 1 + kernel/power/suspend.c | 19 ++++++++ kernel/power/suspend_watchdog.c | 84 +++++++++++++++++++++++++++++++++ kernel/power/suspend_watchdog.h | 40 ++++++++++++++++ kernel/printk/printk.c | 2 +- 8 files changed, 162 insertions(+), 84 deletions(-) create mode 100644 kernel/power/suspend_watchdog.c create mode 100644 kernel/power/suspend_watchdog.hdiff --git a/MAINTAINERS b/MAINTAINERS index 867157311dc8..ecd988b4a838 100644 --- a/MAINTAINERS +++ b/MAINTAINERS@@ -7788,6 +7788,8 @@ F: include/linux/freezer.h F: include/linux/pm.h F: include/linux/suspend.h F: kernel/power/ +F: kernel/power/suspend_watchdog.c +F: kernel/power/suspend_watchdog.h HID CORE LAYER M: Jiri Kosina <jikos@kernel.org>@@ -16630,6 +16632,8 @@ F: include/linux/freezer.h F: include/linux/pm.h F: include/linux/suspend.h F: kernel/power/ +F: kernel/power/suspend_watchdog.c +F: kernel/power/suspend_watchdog.h
As you are now asking other people to maintain the code that you just added, you will need to get their signed-off-by on this patch before I can do anything with it. thanks, greg k-h