[RFC 2/4] PCI: generic: Add support for ARM64 and MSI(x)
From: Lorenzo Pieralisi <hidden>
Date: 2014-10-07 14:47:57
Also in:
linux-devicetree, linux-pci, lkml
On Tue, Oct 07, 2014 at 02:52:27PM +0100, Arnd Bergmann wrote:
On Tuesday 07 October 2014 13:06:59 Lorenzo Pieralisi wrote:
On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: [...]
pci_mmap_page_range could either get generalized some more in an attempt to have a __weak default implementation that works on ARM, or it could be changed to lose the dependency on pci_sys_data instead. In either case, the change would involve using the generic pci_host_bridge_window list.
On ARM pci_mmap_page_range requires pci_sys_data to retrieve its mem_offset parameter. I had a look, and I do not understand *why* it is required in that function, so I am asking. That function is basically used to map PCI resources to userspace, IIUC, through /proc or /sysfs file mappings. As far as I understand those mappings expect VMA pgoff to be the CPU address when files representing resources are mmapped from /proc and 0 when mmapped from /sys (I mean from userspace, then VMA pgoff should be updated by the kernel to map the resource).
Applying the mem_offset is certainly the more intuitive way, since that lets you read the PCI BAR values from a device and access the device with the appropriate offsets.
Ok, but I am referring to this snippet (drivers/pci/pci-sysfs.c):
/* pci_mmap_page_range() expects the same kind of entry as coming
* from /proc/bus/pci/ which is a "user visible" value. If this is
* different from the resource itself, arch will do necessary fixup.
*/
pci_resource_to_user(pdev, i, res, &start, &end);
--> Here start represents a CPU physical address, if pci_resource_to_user()
does not fix it up, correct ?
vma->vm_pgoff += start >> PAGE_SHIFT;
[...]
return pci_mmap_page_range(...);
pci_mmap_page_range() applies (mem_offset >> PAGE_SHIFT) to pgoff in the
ARM implementation.
Isn't there a mismatch here on platforms where mem_offset != 0?
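To make the suspected mismatch concrete, here is a small userspace model of the arithmetic only; it is not kernel code. It assumes that mem_offset is the CPU address minus the bus address, that pci_resource_to_user() performs no fixup, and that the ARM implementation adds mem_offset >> PAGE_SHIFT to pgoff. All model_* names and MODEL_PAGE_SHIFT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Models the default pci_resource_to_user(): no arch fixup, so the
 * "user visible" start is simply the CPU physical address. */
static inline uint64_t model_resource_to_user(uint64_t cpu_start)
{
	return cpu_start;
}

/* Models the sysfs path: pgoff is 0 on entry and the kernel adds the
 * page number of the user-visible start. */
static inline uint64_t model_sysfs_pgoff(uint64_t cpu_start)
{
	uint64_t pgoff = 0;

	pgoff += model_resource_to_user(cpu_start) >> MODEL_PAGE_SHIFT;
	return pgoff;
}

/* Models the ARM pci_mmap_page_range() step, ASSUMING it adds
 * mem_offset (CPU address minus bus address) to pgoff. */
static inline uint64_t model_arm_mmap_pfn(uint64_t pgoff, uint64_t mem_offset)
{
	/* The model only makes sense for page-aligned offsets. */
	assert((mem_offset & ((1ULL << MODEL_PAGE_SHIFT) - 1)) == 0);
	return pgoff + (mem_offset >> MODEL_PAGE_SHIFT);
}
```

With made-up values mem_offset = 0x40000000 and a BAR at bus address 0x10000000 (CPU address 0x50000000), the sysfs path in this model ends up at page 0x90000 rather than the expected 0x50000. Passing the bus page 0x10000 as pgoff instead lands on 0x50000, and with mem_offset == 0 the two cases coincide.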
Question is: why should pci_mmap_page_range() apply an additional shift to the VMA pgoff based on pci_sys_data.mem_offset, which represents the cpu->bus offset? I do not understand that. PowerPC does not seem to apply that fix-up (in PowerPC __pci_mmap_make_offset the code that applies the pci_mem_offset shift is commented out). I think it all boils down to what the userspace interface is expecting when the memory areas are mmapped; if anyone has comments on this, that is appreciated.
The important part is certainly that whatever transformation is done by pci_resource_to_user() gets undone by __pci_mmap_make_offset().
Exactly, it does not seem to be the case above, that's why I asked.
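The invariant under discussion (whatever is applied on the way out to userspace must be undone at mmap time) can be written down as a round-trip check. This is only a sketch: the sketch_* helpers are hypothetical, and the direction of the fixup (subtract mem_offset when reporting, add it back when mapping) is an assumption.

```c
#include <stdint.h>

/* Hypothetical fixup on the way out: report a bus address to userspace. */
static inline uint64_t sketch_to_user(uint64_t cpu_addr, uint64_t mem_offset)
{
	return cpu_addr - mem_offset;
}

/* Hypothetical inverse at mmap time: turn the user value back into a
 * CPU address. */
static inline uint64_t sketch_make_offset(uint64_t user_addr, uint64_t mem_offset)
{
	return user_addr + mem_offset;
}

/* Returns 1 when the two transformations cancel, 0 otherwise. */
static inline int sketch_round_trips(uint64_t cpu_addr, uint64_t mem_offset)
{
	return sketch_make_offset(sketch_to_user(cpu_addr, mem_offset),
				  mem_offset) == cpu_addr;
}
```

An implementation that applies the offset in only one of the two places fails this check whenever mem_offset != 0.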
In case of PowerPC and Microblaze, the mem_offset handling is commented
out in both, to work around X11 trying to use the same values on
/dev/mem. However, they do have the respective fixup for io_offset.
sparc applies the offset in both places for both io_offset and mem_offset.
xtensa applies only io_offset in __pci_mmap_make_offset and no offset
at all in pci_resource_to_user. This probably works because the
mem_offset is always zero there.
mips applies a different fixup (for 36-bit addressing), but not the
mem_offset.
Every other architecture applies no offset here, neither in __pci_mmap_make_offset/pci_mmap_page_range nor in pci_resource_to_user.
The only hint I could find for how the ARM version came to be is
from the historic kernel tree git log for linux-2.5.42, which added
the current code as
2002/10/13 11:05:47+01:00 rmk
[ARM] Update pcibios_enable_device, supply pci_mmap_page_range()
Update pcibios_enable_device to only enable requested resources,
mainly for IDE. Supply a pci_mmap_page_range() function to allow
user space to mmap PCI regions.
At that point, only two platforms had a nonzero mem_offset:
footbridge/dc21285 and integrator/pci_v3. Both were using VGA,
and presumably used this to make X work. (rmk might remember
details).I think that, as I mentioned, it boils down to what the userspace interface (proc/sys and they seem to differ) is supposed to be passed from userspace processes upon mmap.
The code at the time matched what powerpc and sparc did, but then both implemented pci_resource_to_user() in order for libpciaccess to work correctly (bcea1db16b for sparc, 463ce0e103f for powerpc), and later powerpc changed it again to not apply the offset in pci_resource_to_user or pci_mmap_page_range in 396a1a5832ae.
I will keep investigating. Thank you for your help; any further comments appreciated.
Lorenzo