Thread (2 messages) 2 messages, 2 authors, 2022-03-31

RE: [PATCH] fbdev: defio: fix the pagelist corruption

From: Liu, Chuansheng <hidden>
Date: 2022-03-31 08:40:04
Also in: bpf, dri-devel, linux-fbdev, linux-mm

Hi Paul,
-----Original Message-----
From: dri-devel <redacted> On Behalf Of Paul
Menzel
Sent: Thursday, March 31, 2022 4:22 PM
To: Liu, Chuansheng <redacted>
Cc: linux-fbdev@vger.kernel.org; Dave Hansen <dave.hansen@linux.intel.com>;
akpm@linux-foundation.org; daniel@iogearbox.net; linux-mm@kvack.org;
netdev@vger.kernel.org; deller@gmx.de; x86@kernel.org; ast@kernel.org; dri-
devel@lists.freedesktop.org; andrii@kernel.org; Song Liu [off-list ref];
Ingo Molnar [off-list ref]; Thomas Gleixner [off-list ref];
tzimmermann@suse.de; Borislav Petkov [off-list ref]; bpf@vger.kernel.org;
Edgecombe, Rick P [off-list ref]; kernel-team@fb.com
Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption

Dear Chuansheng,


Am 31.03.22 um 02:06 schrieb Liu, Chuansheng:
quoted
quoted
-----Original Message-----
From: Paul Menzel <redacted>
Sent: Thursday, March 31, 2022 12:47 AM
[…]
quoted
quoted
Am 29.03.22 um 01:58 schrieb Liu, Chuansheng:
quoted
quoted
-----Original Message-----
From: Paul Menzel
Sent: Monday, March 28, 2022 2:15 PM
quoted
quoted
Am 28.03.22 um 02:58 schrieb Liu, Chuansheng:
quoted
quoted
-----Original Message-----
quoted
quoted
Sent: Saturday, March 26, 2022 4:11 PM
quoted
quoted
Am 17.03.22 um 06:46 schrieb Chuansheng Liu:
quoted
Easily hit the below list corruption:
==
list_add corruption. prev->next should be next (ffffffffc0ceb090), but
was ffffec604507edc8. (prev=ffffec604507edc8).
WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
__list_add_valid+0x53/0x80
CPU: 65 PID: 3959 Comm: fbdev Tainted: G     U
RIP: 0010:__list_add_valid+0x53/0x80
Call Trace:
     <TASK>
     fb_deferred_io_mkwrite+0xea/0x150
     do_page_mkwrite+0x57/0xc0
     do_wp_page+0x278/0x2f0
     __handle_mm_fault+0xdc2/0x1590
     handle_mm_fault+0xdd/0x2c0
     do_user_addr_fault+0x1d3/0x650
     exc_page_fault+0x77/0x180
     ? asm_exc_page_fault+0x8/0x30
     asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fd98fc8fad1
==

Figure out the race happens when one process is adding &page->lru
into
quoted
quoted
quoted
quoted
quoted
quoted
quoted
the pagelist tail in fb_deferred_io_mkwrite(), another process is
re-initializing the same &page->lru in fb_deferred_io_fault(), which is
not protected by the lock.

This fix is to init all the page lists one time during initialization,
it not only fixes the list corruption, but also avoids INIT_LIST_HEAD()
redundantly.

Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already
enlisted")
quoted
quoted
quoted
quoted
quoted
quoted
quoted
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Chuansheng Liu <redacted>
---
     drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
     1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fb_defio.c
b/drivers/video/fbdev/core/fb_defio.c
quoted
quoted
quoted
quoted
quoted
quoted
quoted
index 98b0f23bf5e2..eafb66ca4f28 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct
vm_fault *vmf)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
     		printk(KERN_ERR "no mapping available\n");

     	BUG_ON(!page->mapping);
-	INIT_LIST_HEAD(&page->lru);
     	page->index = vmf->pgoff;

     	vmf->page = page;
@@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct
work_struct *work)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
     void fb_deferred_io_init(struct fb_info *info)
     {
     	struct fb_deferred_io *fbdefio = info->fbdefio;
+	struct page *page;
+	int i;

     	BUG_ON(!fbdefio);
     	mutex_init(&fbdefio->lock);
@@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
     	INIT_LIST_HEAD(&fbdefio->pagelist);
     	if (fbdefio->delay == 0) /* set a default of 1 s */
     		fbdefio->delay = HZ;
+
+	/* initialize all the page lists one time */
+	for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
+		page = fb_deferred_io_page(info, i);
+		INIT_LIST_HEAD(&page->lru);
+	}
     }
     EXPORT_SYMBOL_GPL(fb_deferred_io_init);
Applying your patch on top of current Linus’ master branch, tty0 is
unusable and looks frozen. Sometimes network card still works,
sometimes
quoted
quoted
quoted
quoted
quoted
quoted
not.
I don't see how the patch would cause below BUG call stack, need some
time to
quoted
quoted
quoted
quoted
quoted
debug. Just few comments:
1. Will the system work well without this patch?
Yes, the framebuffer works well without the patch.
quoted
2. When you are sure the patch causes the regression you saw, please get
free
quoted
quoted
quoted
quoted
to submit one reverted patch, thanks : )

I think you for patch wasn’t submitted yet – at least not pulled by Linus.
The patch has been in drm-tip, could you have a try with the latest drm-tip
to see
quoted
quoted
quoted
if the Framebuffer works well, in that case, we could revert it in drm-tip then.
With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration
manifest) everything works fine. (I had to disable amdgpu driver, as it
failed to build.) Is anyone able to explain that?
My patch is for fixing another patch which is in the drm-tip at least,
The referenced commit 105a940416fc in the Fixes tag is also in Linus’
master branch.
quoted
so I assume applying my patch into Linus tree directly is not
completely proper. That's my intention of asking your help for
retesting drm-tip.
If there were such a relation, that would need to be documented in the
commit message.
You should have seen it : )
quoted
You mean everything working fine means another issue you hit is also
gone?
No, I just mean the hang when applying your patch.

Anyway, after figuring out, that drm-tip, is actually not behind Linus’
master branch, I tried to figure out the differences, and it turns out
it’s also related to commit fac54e2bfb5b (x86/Kconfig: Select
HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP) [1], which is in
Linus’
master branch, but not drm-tip. Note, I am using a 32-bit user space and
a 64-bit Linux kernel. Reverting commit fac54e2bfb5b, and having your
patch a applied, the hang is gone.
Good to know you have figured it out, and the issue you hit is not related to
my patch : )
I am adding the people involved in the other discussion to make them
aware of this failure case.


Kind regards,

Paul


[1]: https://linux-regtracking.leemhuis.info/regzbot/mainline/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help