Re: BAR resizing broken in 6.18 (PPC only?)
From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: 2025-10-23 17:43:11
Also in: linux-pci
On Wed, 22 Oct 2025, Simon Richter wrote:
> On 10/22/25 1:20 AM, Ilpo Järvinen wrote:
> > Could you please test if the patch below helps.
>
> Yes, this looks better.
>
> - "good" is the 6.17 reference
> - "shrink" is with this patch and the BAR0 release from Lucas
> - "bar0" is with this patch, with the bridge BAR0 still mapped
>   (i.e. without the patch from Lucas)
>
> If you compare "good" vs "bar0", the differences are now fairly
> minimal. The non-prefetchable window has shrunk, but assignments are
> otherwise the same.
If a window has extra size prior to any resource fitting operation, the
kernel will recalculate the size based on what it knows about the
downstream resource sizes and no more, so the extra size is removed.

I thought that old_size was there to prevent such shrinkage, but it is
problematic as we've seen here (and also in some other cases).

It would be possible to move the max for old_size outside of the
alignment, so something like this instead of the patch you tested:

-	return ALIGN(max(size, old_size), align);
+	return max(ALIGN(size, align), old_size);

That would not try to make the bridge window larger than old_size
because of alignment, so it should still fit into its old range while
keeping its old size.
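To make the difference between the two return expressions concrete, here is a minimal userspace sketch. The helpers align_up() and max_u64() are stand-ins for the kernel's ALIGN() and max(), and the function names and example numbers are made up for illustration; this is not the actual sizing code.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Userspace stand-in for the kernel's max(). */
static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Form being replaced: old_size gets rounded up together with size. */
static uint64_t window_size_current(uint64_t size, uint64_t old_size,
                                    uint64_t align)
{
    return align_up(max_u64(size, old_size), align);
}

/* Suggested form: only size is rounded up, old_size acts as a floor. */
static uint64_t window_size_suggested(uint64_t size, uint64_t old_size,
                                      uint64_t align)
{
    return max_u64(align_up(size, align), old_size);
}
```

With hypothetical values size = 1M, old_size = 3M and align = 2M, the first form yields 4M (larger than the old window), while the second yields 3M, i.e. exactly the old size.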
> I've added "lspci -v" output as well, which shows the bridge
> configuration. I'm still not sure that the address mappings between
> PCI and system bus are 1:1.
>
> So the BAR0 release patch from Lucas seems to be no longer required
> with this, although it does align the prefetchable area better, so in
> theory it would allow a 512G BAR to be mapped. In practice, there are
> no Intel dGPUs with 512G VRAM.
> > There's indeed something messy and odd going on here with the
> > resource and window mappings. In the bad case there's also this
> > line, which doesn't make much sense:
> >
> > +pci 0030:01:00.0: bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref]: can't claim; address conflict with 0030:01:00.0 [mem 0x6200020000000-0x62000207fffff 64bit pref]
> >
> > ...but that conflicting resource was not assigned in between
> > releasing this bridge window and trying to claim it back, so how
> > that conflicting resource got there is totally mysterious to me. It
> > doesn't seem directly related to the resize no longer working,
> > though.
>
> That is the upstream bridge's BAR0 mapping, which is not a bridge
> window, so presumably the window allocation algorithm is unaware of
> it.
The resource tree is independent of PCI's resource allocation algorithm.
Now that I look at the numbers and logs again, this doesn't look like a
valid resource tree state (from iomem.good!):
6200000000000-6203fbfffffff : pciex@620c3c0000000
  6200000000000-6203fbff0ffff : PCI Bus 0030:01
    6200020000000-62000207fffff : 0030:01:00.0
    6200000000000-6203fbff0ffff : PCI Bus 0030:02
      6200400000000-62007ffffffff : PCI Bus 0030:03
        6200400000000-62007ffffffff : 0030:03:00.0
6200020000000-62000207fffff and 6200000000000-6203fbff0ffff appear as
siblings and those addresses conflict. It seems this "good" kernel is
"cheating" by double counting addresses... ;-D
I've now found the cause, in part thanks to another reporter with
similarly impossible resource conflicts: an old bug in the resizing
algorithm that has been there since BAR resizing was introduced.

It will take me a few days to fix all this, as fixing the claim issue
will make other dominoes fall, so unfortunately I'll have to refactor
the pci_resize_resource() interface now.
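For reference, the sibling conflict in the iomem.good excerpt can be checked with a plain inclusive-range overlap test. This is only a sketch: struct range and ranges_overlap() are hypothetical names for illustration, not the kernel's resource API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive [start, end] resource range. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap when each starts no later than the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start <= b.end && b.start <= a.end;
}

/* The two sibling entries listed under "PCI Bus 0030:01" in iomem.good. */
static const struct range bridge_bar0 = {
    0x6200020000000ULL, 0x62000207fffffULL  /* 0030:01:00.0 */
};
static const struct range bus02_window = {
    0x6200000000000ULL, 0x6203fbff0ffffULL  /* PCI Bus 0030:02 */
};
```

ranges_overlap(bridge_bar0, bus02_window) is true, which is why the two entries should not be able to coexist as siblings in a consistent tree.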
> > > It's a bit weird that there is a log message that says "enabling
> > > device", then the BARs are reconfigured. I'd want the decoding
> > > logic to be inactive while addresses are assigned.
> > So no real issue here and only logging is not the way you'd want it?
>
> It works for the GPU, but I'm unsure about my FPGA designs now. For
> the most part, I would have expected that the "enable memory decoding"
> bit had to be 0 while BAR registers are being written, and I would
> have expected the driver to resize the BAR first, then enable the
> device.
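The ordering described in the quoted paragraph (clear the memory decode bit, write the BAR, restore the command register) could be sketched against a simulated config space as below. The register offsets are the standard PCI ones; struct fake_dev and all the helper functions are hypothetical and only model the sequence, they say nothing about what the kernel or the xe driver actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config space offsets and bits. */
#define PCI_COMMAND         0x04  /* 16-bit command register */
#define PCI_COMMAND_MEMORY  0x2   /* memory space decode enable */
#define PCI_BASE_ADDRESS_0  0x10  /* first BAR */

/* Hypothetical simulated device; not a kernel structure. */
struct fake_dev {
    uint8_t cfg[256];
    bool bar_written_while_decoding;
};

static uint16_t cfg_read16(const struct fake_dev *d, unsigned int off)
{
    uint16_t v;

    assert(off + sizeof(v) <= sizeof(d->cfg));
    memcpy(&v, &d->cfg[off], sizeof(v));
    return v;
}

static void cfg_write16(struct fake_dev *d, unsigned int off, uint16_t v)
{
    assert(off + sizeof(v) <= sizeof(d->cfg));
    memcpy(&d->cfg[off], &v, sizeof(v));
}

static uint32_t cfg_read32(const struct fake_dev *d, unsigned int off)
{
    uint32_t v;

    assert(off + sizeof(v) <= sizeof(d->cfg));
    memcpy(&v, &d->cfg[off], sizeof(v));
    return v;
}

static void cfg_write32(struct fake_dev *d, unsigned int off, uint32_t v)
{
    assert(off + sizeof(v) <= sizeof(d->cfg));
    /* Record any BAR0 write that happens while memory decoding is on. */
    if (off == PCI_BASE_ADDRESS_0 &&
        (cfg_read16(d, PCI_COMMAND) & PCI_COMMAND_MEMORY))
        d->bar_written_while_decoding = true;
    memcpy(&d->cfg[off], &v, sizeof(v));
}

/* Reprogram BAR0 with memory decoding off, then restore the old command value. */
static void fake_update_bar0(struct fake_dev *d, uint32_t new_base)
{
    uint16_t cmd = cfg_read16(d, PCI_COMMAND);

    cfg_write16(d, PCI_COMMAND, (uint16_t)(cmd & ~PCI_COMMAND_MEMORY));
    cfg_write32(d, PCI_BASE_ADDRESS_0, new_base);
    cfg_write16(d, PCI_COMMAND, cmd);
}
```

In this model, going through fake_update_bar0() never sets bar_written_while_decoding, whereas a direct cfg_write32() to BAR0 on an enabled device does.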
Lucas did move resizing earlier, but I guess it still occurs after
enabling the device. I don't know enough about the xe driver to say how
early BARs could be resized.

--
 i.