Re: [BUG RESEND] unsuspend failure under load
From: Miquel Raynal <miquel.raynal@bootlin.com>
Date: 2021-08-23 08:18:09
Hello,

I think we should involve Richard as well in the discussion.

Sean Nyekjaer [off-list ref] wrote on Tue, 6 Jul 2021 11:35:41 +0200:
On Mon, Jul 05, 2021 at 08:58:38AM +0200, Sean Nyekjaer wrote:

Hi Miquel and Sascha,

+ Richard
+ linux-pm
We are having some trouble when our i.MX6 unsuspends while writes to ubifs are in progress. In the log it looks like it is syncing the filesystem before suspend.

The SoC is an i.MX6ul/ull; the issue is (lucky for us) quite easy to reproduce.

The reproduce script: [0]
Kernel log when it happens: [1]

I have bisected the bug to:
ef347c0cfd61 ("mtd: rawnand: gpmi: Implement exec_op")

Any idea where I should start looking, or what happens?

Esben has posted two patches that relate to suspend/unsuspend, but they don't seem to be related to this issue:
5bc6bb603b4d ("mtd: rawnand: gpmi: Fix suspend/resume problem")
d70486668cdf ("mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume")

/Sean

nand_resume() is called some time after ubi_io_write() tries to write. That's why mtd_write() is returning -EBUSY.
Just to be sure:
- platform resumes
- your app started a write before being suspended
- the write gets refused because the suspended state has not been cleared yet

Am I understanding this issue correctly?

But I would expect "Filesystems sync" to actually give the lower layers time to flush the data to the storage devices; suspending without waiting for this to happen looks strange to me.

Sascha, Richard, what's your input?
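To make the sequence above concrete, here is a minimal user-space sketch in C. It is not the rawnand code: it only assumes that the driver keeps a "suspended" flag which is set on suspend and cleared by the resume callback, and that writes issued in between are refused with -EBUSY (error -16 in the log is -EBUSY on Linux). All model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a NAND chip; not the kernel's struct nand_chip. */
struct model_chip {
	bool suspended;   /* set on suspend, cleared on resume */
	size_t written;   /* bytes accepted so far */
};

/* Mark the device as suspended; later writes are refused. */
void model_suspend(struct model_chip *c)
{
	assert(c != NULL);
	c->suspended = true;
}

/* Clear the suspended state; writes are accepted again. */
void model_resume(struct model_chip *c)
{
	assert(c != NULL);
	c->suspended = false;
}

/* Refuse I/O with -EBUSY until the resume callback has run. */
int model_write(struct model_chip *c, size_t len)
{
	assert(c != NULL);
	if (c->suspended)
		return -EBUSY;
	c->written += len;
	return 0;
}
```

In this model, a writer that is woken before model_resume() has run sees -EBUSY with nothing written, which would match the "written 0 bytes" in the ubi_io_write error if the assumption about the flag holds.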
I have tried patch [3], and it seems to fix it.
I think it would be okay to add the retry option, but the mdelay is not obviously a nogo.
Any idea how we could wait here for nand_resume() to be called?

@linux-pm: I have noticed "Filesystems sync" happens before "Freezing user space processes". If I apply patch [4] (without [3]), it would also fix our issue. But I don't have any insight into what impact the change might have.

/Sean
[0]
#!/bin/sh

dd if=/dev/urandom of=/tmp/test50M bs=1M count=50
cp /tmp/test50M /data/ &
echo mem > /sys/power/state

[1]
root@iwg26-v2:/data/root# ./ubicrash.sh
50+0 records in
50+0 records out
PM: suspend entry (deep)
Filesystems sync: 33.642 seconds
Freezing user space processes ... (elapsed 0.004 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done.
printk: Suspending console(s) (use no_console_suspend to debug)

<SUSPEND/WAKE>

PM: suspend devices took 0.040 seconds
Disabling non-boot CPUs ...
ubi0 error: ubi_io_write: error -16 while writing 4096 bytes to PEB 544:53248, written 0 bytes
CPU: 0 PID: 69 Comm: kworker/u2:2 Not tainted 5.13.0 #3
Hardware name: Freescale i.MX6 Ultralite (Device Tree)
Workqueue: writeback wb_workfn (flush-ubifs_0_8)
[<c010d9b0>] (unwind_backtrace) from [<c010a28c>] (show_stack+0x10/0x14)
[<c010a28c>] (show_stack) from [<c0970798>] (dump_stack+0xc0/0xdc)
[<c0970798>] (dump_stack) from [<c05dfe10>] (ubi_io_write+0x510/0x6b0)
[<c05dfe10>] (ubi_io_write) from [<c05dcd90>] (ubi_eba_write_leb+0x388/0x910)
[<c05dcd90>] (ubi_eba_write_leb) from [<c05daf34>] (ubi_leb_write+0xd0/0xe8)
[<c05daf34>] (ubi_leb_write) from [<c03cfeb4>] (ubifs_leb_write+0x68/0x104)
[ ... ]
UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi0:8 pid 157): do_writepage: cannot write page 10962 of inode 821, error -30
UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi0:8 pid 157): do_writepage: cannot write page 10963 of inode 821, error -30
UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 696 bytes in jhead 2, error -30
UBIFS error (ubi0:8 pid 157): do_writepage: cannot write page 0 of inode 819, error -30
UBIFS error (ubi0:8 pid 157): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30

[3]:
diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index 14d890b00d2c..b24c571fa022 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -268,8 +269,18 @@ int ubi_io_write(struct ubi_device *ubi, const void *buf, int pnum, int offset,
 	}
 
 	addr = (loff_t)pnum * ubi->peb_size + offset;
+retry:
 	err = mtd_write(ubi->mtd, addr, len, &written, buf);
 	if (err) {
+		if (retries++ < UBI_IO_RETRIES) {
+			ubi_warn(ubi, "error %d while writing %d bytes to PEB %d:%d, written %zd bytes",
+				 err, len, pnum, offset, written);
+			mdelay(25); yield();
+			goto retry;
+		}
+
 		ubi_err(ubi, "error %d while writing %d bytes to PEB %d:%d, written %zd bytes",
 			err, len, pnum, offset, written);
 		dump_stack();

[4]:
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 32391acc806b..61a213ad5a13 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -563,18 +563,18 @@ static int enter_state(suspend_state_t state)
 	if (state == PM_SUSPEND_TO_IDLE)
 		s2idle_begin();
 
-	if (sync_on_suspend_enabled) {
-		trace_suspend_resume(TPS("sync_filesystems"), 0, true);
-		ksys_sync_helper();
-		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
-	}
-
 	pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
 	pm_suspend_clear_flags();
 	error = suspend_prepare(state);
 	if (error)
 		goto Unlock;
 
+	if (sync_on_suspend_enabled) {
+		trace_suspend_resume(TPS("sync_filesystems"), 0, true);
+		ksys_sync_helper();
+		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
+	}
+
 	if (suspend_test(TEST_FREEZER))
 		goto Finish;
Thanks,
Miquèl