RE: fbdev deferred I/O broken in some scenarios
From: Michael Kelley <hidden>
Date: 2025-03-19 20:38:03
Also in:
linux-fbdev, lkml
From: Thomas Zimmermann <tzimmermann@suse.de> Sent: Tuesday, March 18, 2025 1:26 AM
On 18.03.25 at 03:05, Michael Kelley wrote:
I've been trying to get mmap() working with the hyperv_fb.c fbdev driver, which is for Linux guests running on Microsoft's Hyper-V hypervisor. The hyperv_fb driver uses fbdev deferred I/O for performance reasons. But it looks to me like fbdev deferred I/O is fundamentally broken when the underlying framebuffer memory is allocated from kernel memory (alloc_pages or dma_alloc_coherent).

The hyperv_fb.c driver may allocate the framebuffer memory in several ways, depending on the size of the framebuffer specified by the Hyper-V host and the VM "Generation". For a Generation 2 VM, the framebuffer memory is allocated by the Hyper-V host and is assigned to guest MMIO space. The hyperv_fb driver does a vmalloc() allocation for deferred I/O to work against. This combination handles mmap() of /dev/fb<n> correctly and the performance benefits of deferred I/O are substantial.

But for a Generation 1 VM, the hyperv_fb driver allocates the framebuffer memory in contiguous guest physical memory using alloc_pages() or dma_alloc_coherent(), and informs the Hyper-V host of the location. In this case, mmap() with deferred I/O does not work. The mmap() succeeds, and user space updates to the mmap'ed memory are correctly reflected to the framebuffer. But when the user space program does munmap() or terminates, the Linux kernel free lists become scrambled and the kernel eventually panics.

The problem is that when munmap() is done, the PTEs in the VMA are cleaned up, and the corresponding struct page refcounts are decremented. If the refcount goes to zero (which it typically will), the page is immediately freed. In this way, some or all of the framebuffer memory gets erroneously freed.

From what I see, the VMA should be marked VM_PFNMAP when allocated kernel memory is being used as the framebuffer with deferred I/O, but that's not happening.
The handling of deferred I/O page faults would also need updating to make this work.

I cannot help much with HyperV, but there's a get_page callback in struct fb_deferred_io. [1] It'll allow you to provide a custom page on each page fault. We use it in DRM to mmap SHMEM-backed pages. [2] Maybe this helps with hyperv_fb as well.
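For readers who want to see the user-space side of the original report, the mmap()/write/munmap() sequence it describes can be sketched roughly as follows. This is a minimal illustration, not the program used in the report: the function name fb_map_touch_unmap is hypothetical, and the code assumes a Linux system with the fbdev UAPI header <linux/fb.h> available.

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a framebuffer device, write to every byte,
 * then unmap it. Returns 0 on success, -1 on any failure.
 */
int fb_map_touch_unmap(const char *path)
{
    struct fb_fix_screeninfo fix;
    unsigned char *fb;
    int fd;

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* smem_len is the framebuffer length reported by the driver. */
    if (ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0 || fix.smem_len == 0) {
        close(fd);
        return -1;
    }

    fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Touch every page so that each one is faulted into the mapping. */
    memset(fb, 0xff, fix.smem_len);

    /* Per the report, the damage shows up after this teardown step. */
    munmap(fb, fix.smem_len);
    close(fd);
    return 0;
}
```

Calling it with a path such as /dev/fb0 exercises the mmap() and munmap() path discussed above. On a device that does not exist it simply returns -1.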
Thanks for your input. See also my reply to Helge. Unfortunately, using a custom get_page() callback doesn't help. In the problematic case, the standard deferred I/O get_page() function works correctly for getting the struct page.

My current thinking is that the problem is in fb_deferred_io_mmap() where the vma needs to have the VM_PFNMAP flag set when the framebuffer memory is a direct kernel allocation and not through vmalloc(). And there may be some implications on the mkwrite function as well, but I'll need to sort that out once I start coding.

For the DRM code using SHMEM-backed pages, do you know where the shared memory comes from? Is that ultimately a kernel vmalloc() allocation?

Michael
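The refcount argument in this thread can be illustrated with a small user-space toy model. It is only a sketch and not kernel code: struct toy_page, struct toy_vma and the toy_* functions are hypothetical stand-ins, and the pfnmap field merely models the effect expected from VM_PFNMAP, namely that mapping and unmapping leave struct page refcounts alone. The starting refcounts used in the usage note are an assumption too; the thread only says that the refcount typically reaches zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: a refcount and a "freed" marker. */
struct toy_page {
    int refcount;
    bool freed;
};

/* Toy stand-in for a VMA covering the framebuffer pages. */
struct toy_vma {
    struct toy_page *pages;
    size_t npages;
    bool pfnmap; /* models VM_PFNMAP: the mapping never touches refcounts */
};

/* Fault in every page: a normal mapping takes one reference per page. */
void toy_fault_all(struct toy_vma *vma)
{
    for (size_t i = 0; i < vma->npages; i++)
        if (!vma->pfnmap)
            vma->pages[i].refcount++;
}

/* Unmap: a normal mapping drops its reference and frees the page at zero. */
void toy_unmap(struct toy_vma *vma)
{
    for (size_t i = 0; i < vma->npages; i++) {
        if (vma->pfnmap)
            continue;
        if (--vma->pages[i].refcount == 0)
            vma->pages[i].freed = true;
    }
}

/* Count the pages that were handed back to the toy allocator. */
size_t toy_count_freed(const struct toy_vma *vma)
{
    size_t n = 0;

    for (size_t i = 0; i < vma->npages; i++)
        if (vma->pages[i].freed)
            n++;
    return n;
}
```

With four pages whose assumed starting refcounts are 1, 0, 0, 0, a fault-then-unmap cycle on a normal mapping marks three pages as freed even though the driver still uses them as framebuffer memory. The same cycle with pfnmap set frees none.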