Thread: 12 messages, 5 authors, 2022-03-31

Re: [PATCH] fbdev: defio: fix the pagelist corruption

From: Paul Menzel <hidden>
Date: 2022-03-26 08:11:15
Also in: dri-devel

Dear Chuansheng,


On 17.03.22 at 06:46, Chuansheng Liu wrote:
The list corruption below is easy to hit:
==
list_add corruption. prev->next should be next (ffffffffc0ceb090), but
was ffffec604507edc8. (prev=ffffec604507edc8).
WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
__list_add_valid+0x53/0x80
CPU: 65 PID: 3959 Comm: fbdev Tainted: G     U
RIP: 0010:__list_add_valid+0x53/0x80
Call Trace:
  <TASK>
  fb_deferred_io_mkwrite+0xea/0x150
  do_page_mkwrite+0x57/0xc0
  do_wp_page+0x278/0x2f0
  __handle_mm_fault+0xdc2/0x1590
  handle_mm_fault+0xdd/0x2c0
  do_user_addr_fault+0x1d3/0x650
  exc_page_fault+0x77/0x180
  ? asm_exc_page_fault+0x8/0x30
  asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fd98fc8fad1
==

The race happens when one process is adding &page->lru to the pagelist
tail in fb_deferred_io_mkwrite() while another process is
re-initializing the same &page->lru in fb_deferred_io_fault(), which is
not protected by the lock.

This fix initializes all the page lists once during initialization.
It not only fixes the list corruption, but also avoids redundant
INIT_LIST_HEAD() calls.

Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Chuansheng Liu <redacted>
---
  drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 98b0f23bf5e2..eafb66ca4f28 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
  		printk(KERN_ERR "no mapping available\n");
  
  	BUG_ON(!page->mapping);
-	INIT_LIST_HEAD(&page->lru);
  	page->index = vmf->pgoff;
  
  	vmf->page = page;
@@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
  void fb_deferred_io_init(struct fb_info *info)
  {
  	struct fb_deferred_io *fbdefio = info->fbdefio;
+	struct page *page;
+	int i;
  
  	BUG_ON(!fbdefio);
  	mutex_init(&fbdefio->lock);
@@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
  	INIT_LIST_HEAD(&fbdefio->pagelist);
  	if (fbdefio->delay == 0) /* set a default of 1 s */
  		fbdefio->delay = HZ;
+
+	/* initialize all the page lists one time */
+	for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
+		page = fb_deferred_io_page(info, i);
+		INIT_LIST_HEAD(&page->lru);
+	}
  }
  EXPORT_SYMBOL_GPL(fb_deferred_io_init);
  
After applying your patch on top of Linus’ current master branch, tty0
is unusable and looks frozen. Sometimes the network card still works,
sometimes not.

     $ git log --oneline -nodecorate -2
     1b351a77ed33 (HEAD -> linus) fbdev: defio: fix the pagelist corruption
     52d543b5497c (origin/master, origin/HEAD) Merge tag 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi
[    5.256996] raw: 0000000000000000 0000000000000000 00000000ffffffff 
0000000000000000
[    5.269582] page dumped because: VM_BUG_ON_PAGE(compound && 
compound_order(page) != order)
[    5.279507] ------------[ cut here ]------------
[    5.286406] kernel BUG at mm/page_alloc.c:1326!
[    5.291814] invalid opcode: 0000 [#1] PREEMPT SMP
[    5.296350] CPU: 0 PID: 167 Comm: systemd-udevd Not tainted 
5.17.0-10753-g1b351a77ed33 #300
[    5.304670] Hardware name: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 
4.16-337-gb87986e67b 03/25/2022
[    5.313163] RIP: 0010:free_pcp_prepare+0x295/0x400
[    5.317930] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 
8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd 
ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
[    5.336650] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
[    5.341849] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 
0000000000000000
[    5.348957] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 
00000000ffffffff
[    5.356063] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 
00000000ffffdfff
[    5.363170] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 
0000000000000000
[    5.370277] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: 
ffffe4be840c0000
[    5.377384] FS:  0000000000000000(0000) GS:ffff91fd7b400000(0063) 
knlGS:00000000f7eea800
[    5.385443] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[    5.391164] CR2: 00000000f6f0e840 CR3: 0000000106b60000 CR4: 
00000000000406f0
[    5.398272] Call Trace:
[    5.400697]  <TASK>
[    5.402778]  free_unref_page+0x1b/0xf0
[    5.406505]  __vunmap+0x216/0x2c0
[    5.409798]  drm_fbdev_cleanup+0x5f/0xb0
[    5.413698]  drm_fbdev_fb_destroy+0x15/0x30
[    5.417857]  unregister_framebuffer+0x2c/0x40
[    5.422191]  drm_client_dev_unregister+0x69/0xe0
[    5.422962] usb usb4: New USB device found, idVendor=1d6b, 
idProduct=0003, bcdDevice= 5.17
[    5.426784]  drm_dev_unregister+0x2e/0x80
[    5.439005]  drm_dev_unplug+0x21/0x40
[    5.442645]  simpledrm_remove+0x11/0x20
[    5.446458]  platform_remove+0x1f/0x40
[    5.450185]  __device_release_driver+0x17a/0x250
[    5.454779]  device_release_driver+0x24/0x30
[    5.459024]  bus_remove_device+0xd8/0x140
[    5.463012]  device_del+0x18b/0x3f0
[    5.466478]  ? idr_alloc_cyclic+0x50/0xb0
[    5.470466]  platform_device_del.part.0+0x13/0x70
[    5.475146]  platform_device_unregister+0x1c/0x30
[    5.479824]  drm_aperture_detach_drivers+0xa1/0xd0
[    5.484593]  drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x60
[    5.491179]  radeon_pci_probe+0x54/0xf0 [radeon]
[    5.495773]  local_pci_probe+0x45/0x80
[    5.499499]  ? pci_match_device+0xd7/0x130
[    5.503572]  pci_device_probe+0xc2/0x1e0
[    5.507474]  really_probe+0x1f5/0x3d0
[    5.511112]  __driver_probe_device+0xfe/0x180
[    5.515446]  driver_probe_device+0x1e/0x90
[    5.519518]  __driver_attach+0xc0/0x1c0
[    5.523332]  ? __device_attach_driver+0xe0/0xe0
[    5.527839]  ? __device_attach_driver+0xe0/0xe0
[    5.532346]  bus_for_each_dev+0x78/0xc0
[    5.536159]  bus_add_driver+0x149/0x1e0
[    5.539973]  driver_register+0x8f/0xe0
[    5.543699]  ? 0xffffffffc0741000
[    5.546992]  do_one_initcall+0x44/0x200
[    5.550806]  ? kmem_cache_alloc_trace+0x170/0x2c0
[    5.555487]  do_init_module+0x4c/0x240
[    5.559213]  __do_sys_finit_module+0xb4/0x120
[    5.563547]  __do_fast_syscall_32+0x6b/0xe0
[    5.567706]  do_fast_syscall_32+0x2f/0x70
[    5.571693]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[    5.577067] RIP: 0023:0xf7efa549
[    5.580273] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 
07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 cd 0f 05 cd 
80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
[    5.582805] usb usb4: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[    5.598992] RSP: 002b:00000000ff831c0c EFLAGS: 00200296 ORIG_RAX: 
000000000000015e
[    5.598996] RAX: ffffffffffffffda RBX: 0000000000000011 RCX: 
00000000f7ed9e09
[    5.598998] RDX: 0000000000000000 RSI: 0000000056a5c940 RDI: 
0000000056a5c4c0
[    5.598999] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000000
[    5.635047] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000000
[    5.642154] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
[    5.649264]  </TASK>
[    5.651427] Modules linked in: crct10dif_pclmul crc32_pclmul 
crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi radeon(+) r8169 
xhci_pci(+) realtek snd_hda_intel drm_ttm_helper snd_intel_dspcfg 
k10temp snd_hda_codec ttm snd_hda_core xhci_hcd snd_pcm sg ohci_hcd 
ehci_pci(+) snd_timer drm_dp_helper snd ehci_hcd soundcore i2c_piix4 
acpi_cpufreq coreboot_table fuse ipv6 autofs4
[    5.690975] r8169 0000:04:00.0 enp4s0: renamed from eth0
[    5.691589] ---[ end trace 0000000000000000 ]---
[    5.704791] RIP: 0010:free_pcp_prepare+0x295/0x400
[    5.709784] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 
8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd 
ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
[    5.731535] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
[    5.752988] usb usb4: Product: xHCI Host Controller
[    5.758571] usb usb4: Manufacturer: Linux 5.17.0-10753-g1b351a77ed33 
xhci-hcd
[    5.767096] usb usb4: SerialNumber: 0000:03:00.0
[    5.772213] hub 4-0:1.0: USB hub found
[    5.782383] hub 4-0:1.0: 2 ports detected
[    5.799251] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 
0000000000000000
[    5.810470] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 
00000000ffffffff
[    5.817561] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 
00000000ffffdfff
[    5.824680] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 
0000000000000000
[    5.831739] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: 
ffffe4be840c0000
[    5.839445] FS:  0000000000000000(0000) GS:ffff91fd7b500000(0063) 
knlGS:00000000f7eea800
[    5.847905] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[    5.854025] CR2: 000000005664d26c CR3: 0000000106b60000 CR4: 
00000000000406e0

Kind regards,

Paul


PS: For some reason, lore.kernel.org lists most messages twice [1].

PPS: I actually wanted to analyze the new regression and thought your
patch might help, but it made things worse. ;-) (The log excerpt is from
Linux master.)
[    1.738965] BUG: Bad page state in process systemd-udevd  pfn:103003
[    1.738974] fbcon: Taking over console
[    1.740459] page:00000000c3b5c591 refcount:0 mapcount:0 mapping:0000000000000000 index:0x3 pfn:0x103003
[    1.740466] head:000000009b49a8e9 order:9 compound_mapcount:0 compound_pincount:0
[    1.740468] flags: 0x2fffc000010000(head|node=0|zone=2|lastcpupid=0x3fff)
[    1.740473] raw: 002fffc000000000 fffff139840c0001 fffff139840c00c8 0000000000000000
[    1.740475] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    1.740477] head: 002fffc000010000 0000000000000000 dead000000000122 0000000000000000
[    1.740479] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    1.740480] page dumped because: corrupted mapping in tail page
I am going to follow up on that regression in another thread.

[1]: https://lore.kernel.org/all/20220317054602.28846-1-chuansheng.liu@intel.com/
