Re: [PATCH v2 00/11] Remove device private pages from physical address space
From: Jordan Niethe <hidden>
Date: 2026-01-14 05:41:57
Also in:
dri-devel, intel-xe, linux-mm, lkml
Hi, On 9/1/26 17:22, Matthew Brost wrote:
On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
Hi, On 9/1/26 11:31, Matthew Brost wrote:
On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
Hi, On 8/1/26 16:42, Jordan Niethe wrote:
Hi, On 8/1/26 13:25, Jordan Niethe wrote:
Hi, On 8/1/26 05:36, Matthew Brost wrote:
Thanks for the series. For some reason Intel's CI couldn't apply this series to drm-tip to get results [1]. I'll manually apply this, run all our SVM tests, and get back to you on results + review the changes here. For future reference, if you want to use our CI system, the series must apply to drm-tip; feel free to rebase this series and just send it to the intel-xe list if you want CI.

Thanks, I'll rebase on drm-tip and send to the intel-xe list.

For reference, the rebase on drm-tip on the intel-xe list: https://patchwork.freedesktop.org/series/159738/ Will watch the CI results.

The series causes some failures in the intel-xe tests: https://patchwork.freedesktop.org/series/159738/#rev4 Working through the failures now.

Yeah, I saw the failures. I haven't had time to look at the patches on my end quite yet. Scrambling to get a few things into the 6.20/7.0 PR, so I may not have bandwidth to look in depth until mid next week, but digging is on my TODO list.

Sure, that's completely fine. The failures seem pretty directly related to the series, so I think I'll be able to make good progress. For example: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html It looks like I missed that xe_pagemap_destroy_work() needs to be updated to remove the call to devm_release_mem_region(), now that we are no longer reserving a mem region.

+1

So this is the one I'd be most concerned about [1]. xe_exec_system_allocator is our SVM test, which does almost all the ridiculous things possible in user space to stress SVM. It's blowing up in the core MM, but the source of the bug could be anywhere (e.g., Xe SVM, GPU SVM, the migrate device layer, or core MM). I'll try to help when I have bandwidth.

Matt

[1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_allocator@threads-many-large-execqueues-free-nomemset.html
A similar fault in lruvec_stat_mod_folio() can be reproduced if
memremap_device_private_pagemap() is called with NUMA_NO_NODE instead of,
say, numa_node_id() for the nid parameter.

The xe_svm driver uses devm_memremap_device_private_pagemap(), which uses
dev_to_node() for the nid parameter. I suspect this is causing something
similar to happen.
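One plausible way a negative nid ends up as a bad node (an assumption on our part, not confirmed in this thread): the node id is packed into page->flags by set_page_node(), whose node argument is unsigned long, so -1 is masked to the highest node number rather than rejected. lruvec_stat_mod_folio() later derives the pgdat from that stored node. The userspace model below mimics that packing; NODES_SHIFT, NODES_PGSHIFT, struct model_page and the model_* helpers are hypothetical stand-ins, not the real kernel definitions.

```c
#include <stdint.h>

/*
 * Userspace model of how a node id is packed into page->flags, following the
 * shape of the kernel's set_page_node()/page_to_nid(). The NODES_SHIFT and
 * NODES_PGSHIFT values are illustrative assumptions, not a real config.
 */
#define NUMA_NO_NODE  (-1)            /* same value as include/linux/numa.h */
#define NODES_SHIFT   6               /* assumed: up to 64 nodes */
#define NODES_MASK    ((UINT64_C(1) << NODES_SHIFT) - 1)
#define NODES_PGSHIFT 50              /* assumed bit position of the node field */

struct model_page {
	uint64_t flags;
};

/* Like set_page_node(): the node argument is unsigned, so -1 is masked, not rejected. */
static void model_set_page_node(struct model_page *page, uint64_t node)
{
	page->flags &= ~(NODES_MASK << NODES_PGSHIFT);
	page->flags |= (node & NODES_MASK) << NODES_PGSHIFT;
}

/* Like page_to_nid(): read the node field back out of flags. */
static int model_page_to_nid(const struct model_page *page)
{
	return (int)((page->flags >> NODES_PGSHIFT) & NODES_MASK);
}

/* Store a nid the way memmap init would, then read it back. */
static int model_roundtrip_nid(int nid)
{
	struct model_page page = { .flags = 0 };

	model_set_page_node(&page, (uint64_t)nid);
	return model_page_to_nid(&page);
}
```

With these assumed constants, model_roundtrip_nid(NUMA_NO_NODE) yields 63, a node that would typically not be online, so a NODE_DATA()-style lookup for it would not give a usable pgdat.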
When memremap_pages() calls pagemap_range(), we have the following logic:

	if (nid < 0)
		nid = numa_mem_id();

I think we might need to add this to memremap_device_private_pagemap()
to handle the NUMA_NO_NODE case. Still confirming.
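A minimal userspace sketch of the fallback being proposed, assuming memremap_device_private_pagemap() would simply apply the same nid < 0 check that pagemap_range() uses before the nid is consumed. dev_to_node() can return NUMA_NO_NODE when a device has no NUMA affinity, which is the case this targets. model_numa_mem_id() and model_resolve_pagemap_nid() are hypothetical stand-ins; the stub pretends the caller runs on node 1.

```c
/* Same value as include/linux/numa.h. */
#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for the kernel's numa_mem_id(): pretend we run on node 1. */
static int model_numa_mem_id(void)
{
	return 1;
}

/*
 * Hypothetical helper: the fallback pagemap_range() applies, which the
 * device-private path may also need before the nid is used for memmap init.
 */
static int model_resolve_pagemap_nid(int nid)
{
	if (nid < 0)
		nid = model_numa_mem_id();
	return nid;
}
```

Valid node ids pass through unchanged; only NUMA_NO_NODE (or any negative value) is replaced by the local memory node.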
Thanks,
Jordan.
I was also wondering if Nvidia could help review one of our core MM patches [2], which is also gating the enabling of 2M device pages?

Matt

[1] https://patchwork.freedesktop.org/series/159738/
[2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1