Thread (27 messages) 27 messages, 5 authors, 2016-03-15

[RFC PATCH v3 3/3] PCI/ACPI: hisi: Add ACPI support for HiSilicon SoCs Host Controllers

From: Gabriele Paoloni <hidden>
Date: 2016-03-03 14:21:48
Also in: linux-acpi, linux-pci, lkml

Hi Lorenzo, many thanks for replying
-----Original Message-----
From: Lorenzo Pieralisi [mailto:lorenzo.pieralisi at arm.com]
Sent: 29 February 2016 11:38
To: Gabriele Paoloni
Cc: Bjorn Helgaas; 'Mark Rutland'; Guohanjun (Hanjun Guo); Wangzhou
(B); liudongdong (C); Linuxarm; qiujiang; 'bhelgaas at google.com';
'arnd at arndb.de'; 'tn at semihalf.com'; 'linux-pci at vger.kernel.org';
'linux-kernel at vger.kernel.org'; xuwei (O); 'linux-
acpi at vger.kernel.org'; 'jcm at redhat.com'; zhangjukuo; Liguozhu
(Kenneth); 'linux-arm-kernel at lists.infradead.org'
Subject: Re: [RFC PATCH v3 3/3] PCI/ACPI: hisi: Add ACPI support for
HiSilicon SoCs Host Controllers
quoted
quoted
quoted
quoted
quoted
I think the relevant spec is the PCI Firmware Spec, r3.0, sec
4.1.2.
quoted
quoted
quoted
Note 2 in that section says the address range of an MMCFG
region
quoted
quoted
quoted
quoted
quoted
must be reserved by declaring a motherboard resource, i.e.,
included
quoted
quoted
quoted
in the _CRS of a PNP0C02 or similar device.
I had a look a this. So yes the specs says that we should use
the
quoted
quoted
quoted
quoted
PNP0C02 device if MCFG is not supported.
AFAIK, PNP0C02 is a resource reservation mechanism, the spec says
that
quoted
MCFG regions must be reserved using PNP0C02, even though its
current usage on x86 is a bit unfathomable to me, in particular
in relation to MCFG resources retrieved for hotpluggable bridges
(ie
quoted
quoted
quoted
through _CBA, which I think consider the MCFG region as reserved
by default, regardless of PNP0c02):

see pci_mmcfg_check_reserved() arch/x86/pci/mmconfig-shared.c
Yes I checked this and it seems to check if an area of memory from
MCFG is overlapping with any area of memory specified by PNP0C02
_CRS...

However (maybe I am wrong) it looks to me that this part works
independently of the PNP0c02 driver. It seems that goes directly
to walk the ACPI namespace and look for PNP0C02 HID; as it finds it,
it checks the range of memory specified in the _CRS method and see
if it overlaps with the MCFG resource...am I missing something?
Well, if I understand the code correctly, the x86 MCFG code, for static
MCFG tables, check that the MCFG regions are part of motherboard
resources (by walking the ACPI namespace as you said). If that's the
case,
it does not insert resources into the resource tree till a
late_initcall,
(pci_mmcfg_late_insert_resources()) that should run after the PNP0C02
driver
was initialized (?) (I guess, that's a nightmare to understand these
initcall
ordering unwritten rules dependencies, if they exist). If the MCFG
region is
not part of motherboard resources it is discarded (ie
pci_mmconfig_insert()
in arch/x86/pci/mmconfig-shared.c)
Yes it looks like at boot time it parses the ACPI namespace, if a
cfg region is marked as reserved, then it is mapped (cfg->virt is
assigned) and is added to pci_mmcfg_list so that later on this list
can be used by pci_mmcfg_late_insert_resources() to insert the cfg->res
in the iomem_resource resource tree.

However there are some things that are not clear to me:
-  if an MCFG region is not marked as reserved, is it just discarded by
   the x86 framework?
   I see that in the Nowicki generic "pci_mmconfig_map_resource" we just
   map the mcfg config regions with no need for them to be reserved...?
-  I do not understand the role of the PNP system driver. It looks like
   this also inserts resources under iomem_resource resource tree (in
   Case of memory resources)...

Thanks

Gab
I really do not know why x86 code has been implemented like that
and whatever I say it is just speculation, honestly it is not easy
to understand.
quoted
If my interpretation is correct, couldn't we just modify
pci_mmconfig_map_resource() in the latest Nowicki patchset and add
a similar check before insert_resource_conflict() is called?
Yes, but I am not sure that would help much. We can prevent adding
MCFG resources to the resource tree if they are not declared as
motherboard
resources, but if they do it is totally unclear to me what we should
do.

Should we insert the MCFG region resources into the resource tree
before
PNP0C02 driver has a chance to be probed ? Or do what x86 does (ie it
does
not insert resources into the resource tree till late_initcall
pci_mmcfg_late_insert_resources()) ? I do not know. If the PNP0C02
driver stays as it is, whatever we do with PNP0C02 regions does not
make
much sense to me, that's the gist of what I am saying, I don't know why
x86 code was implemented like that, I will go through the commits
history
to find some light here.

Lorenzo
quoted
On the other side HiSilicon host bridge quirks could use the address
retrieved by the _CRS method of PNP0C02 for our root complex config
rd/wr...?
quoted
I don't know how _CBA-related resources would be reserved.  I
haven't
quoted
quoted
personally worked with any host bridges that supply _CBA, so I
don't
quoted
quoted
know whether or how they handled it.

I think the spec intent was that the Consumer/Producer bit
(Extended
quoted
quoted
Address Space Descriptor, General Flags Bit[0], see ACPI spec sec
6.4.3.5.4) would be used.  Resources such as ECAM areas would be
marked "Consumer", meaning the bridge consumed that space itself,
and
quoted
quoted
windows passed down to the PCI bus would be marked "Producer".

But BIOSes didn't use that bit consistently, so we couldn't rely on
it.  I verified experimentally that Windows didn't pay attention to
that bit either, at least for DWord descriptors:
https://bugzilla.kernel.org/show_bug.cgi?id=15701

It's conceivable that we could still use that bit in Extended
Address
quoted
quoted
Space descriptors, or maybe some hack like pay attention if the
bridge
quoted
quoted
has _CBA, or some such.  Or maybe a BIOS could add a PNP0C02 device
along with the PNP0A03 device that uses _CBA, with the PNP0C02 _CRS
describing the ECAM area referenced by _CBA.  Seeems hacky no
matter
quoted
quoted
how we slice it.
Well about this I don't know much but, having looked at the bugzilla
and considering the current mechanism used by
pci_mmcfg_check_reserved()
quoted
I have the feeling that this last one is easier to implement and it
seems
quoted
the one currently used (in mmconfig-shared.c )

Cheers

Gab
quoted
quoted
Have a look at drivers/pnp/system.c for PNP0c02
quoted
So probably I can use acpi_get_devices("PNP0C02",...) to
retrieve
quoted
quoted
it
quoted
quoted
from the quirk match function, I will look into this...
quoted
quoted
On the other side, since this is an exception only for the
config
quoted
quoted
quoted
quoted
space address of our host controller (as said before all
the
quoted
quoted
buses
quoted
quoted
quoted
quoted
below the root one support ECAM), I think that it is right
to
quoted
quoted
put
quoted
quoted
quoted
quoted
this address as a device specific data (in fact the rest of
the
quoted
quoted
quoted
quoted
quoted
quoted
config space addresses will be parsed from MCFG).
A kernel with no support for your host controller (and thus
no
quoted
quoted
quoted
quoted
quoted
knowledge of its _DSD) should still be able to operate the
rest
quoted
quoted
of the
quoted
quoted
quoted
system correctly.  That means we must have a generic way to
learn
quoted
quoted
what
quoted
quoted
quoted
address space is consumed by the host controller so we don't
try
quoted
quoted
to
quoted
quoted
quoted
assign it to other devices.
This is something I don't understand much...
Are you talking about a scenario where we have a Kernel image
compiled
quoted
quoted
without our host controller support and running on our
platform?
quoted
quoted
quoted
I *think* the point here is that your host controller config
space
quoted
quoted
should be
quoted
reserved through PNP0c02 so that the kernel will reserve it
through
quoted
quoted
the
quoted
generic PNP0c02 driver even if your host controller driver (and
related
quoted
_DSD) is not supported in the kernel.
Right.  Assume you have two top-level devices:

  PNP0A03  PCI host bridge
    _CRS     describes windows
    ????     describes ECAM space consumed
  PNPxxxx  another ACPI device, currently disabled
    _PRS     specifies possible resource settings, may specify no
restrictions
    _SRS     assign resources and enable device
    _CRS     empty until device enabled

When the OS enables PNPxxxx, it must first assign resources to it
using _PRS and _SRS.  We evaluate _PRS to find out what the
addresses
quoted
quoted
PNPxxxx can support.  This tells us things like how wide the
address
quoted
quoted
decoder is, the size of the region required, and any alignment
restrictions -- basically the same information we get by sizing a
PCI
quoted
quoted
BAR.

Now, how do we assign space for PNPxxxx?  In a few cases, _PRS has
only a few specific possibilities, e.g., an x86 legacy serial port
that can be at 0x3f8 or 0x2f8.  But in general, _PRS need not
impose
quoted
quoted
any restrictions.

So in general the OS can use any space that can be routed to
PNPxxxx.
quoted
quoted
If there's an upstream bridge, it may define windows that restrict
the
quoted
quoted
possibilities.  But in this case, there *is* no upstream bridge, so
the possible choices are the entire physical address space of the
platform, except for other things that are already allocated: RAM,
the
quoted
quoted
_CRS settings for other ACPI devices, things reserved by the E820
table (at least on x86), etc.

If PNP0A03 consumes address space for ECAM, that space must be
reported *somewhere* so the OS knows not to place PNPxxxx there.
This
quoted
quoted
reporting must be generic (not device-specific like _DSD).  The
ACPI
quoted
quoted
core (not drivers) is responsible for managing this address space
because:

  a) the OS is not guaranteed to have drivers for all devices, and

  b) even it *did* have drivers for all devices, the PNPxxxx space
may
quoted
quoted
  be assigned before drivers are initialized.
quoted
I do not understand how PNP0c02 works, currently, by the way.

If I read x86 code correctly, the unassigned PCI bus resources
are
quoted
quoted
quoted
assigned in arch/x86/pci/i386.c (?)
fs_initcall(pcibios_assign_resources),
quoted
with a comment:

/**
 * called in fs_initcall (one below subsys_initcall),
 * give a chance for motherboard reserve resources
 */

Problem is, motherboard resources are requested through (?):

drivers/pnp/system.c

which is also initialized at fs_initcall, so it might be called
after
quoted
quoted
quoted
core x86 code reassign resources, defeating the purpose PNP0c02
was
quoted
quoted
quoted
designed for, namely, request motherboard regions before
resources
quoted
quoted
quoted
are assigned, am I wrong ?
I think you're right.  This is a long-standing screwup in Linux.
IMHO, ACPI resources should be parsed and reserved by the ACPI
core,
quoted
quoted
before any PCI resource management (since PCI host bridges are
represented in ACPI).  But historically PCI devices have enumerated
before ACPI got involved.  And the ACPI core doesn't really pay
attention to _CRS for most devices (with the exception of PNP0C02).

IMO the PNP0C02 code in drivers/pnp/system.c should really be done
in
quoted
quoted
the ACPI core for all ACPI devices, similar to the way the PCI core
reserves BAR space for all PCI devices, even if we don't have
drivers
quoted
quoted
for them.  I've tried to fix this in the past, but it is really a
nightmare to unravel everything.

Because the ACPI core doesn't reserve resources for the _CRS of all
ACPI devices, we're already vulnerable to the problem of placing a
device on top of another ACPI device.  We don't see problems
because
quoted
quoted
on x86, at least, most ACPI devices are already configured by the
BIOS
quoted
quoted
to be enabled and non-overlapping.  But x86 has the advantage of
having extensive test coverage courtesy of Windows, and as long as
_CRS has the right stuff in it, we at least have the potential of
fixing problems in Linux.

If the platform doesn't report resource usage correctly on ARM, we
may
quoted
quoted
not find problems (because we don't have the Windows test suite)
and
quoted
quoted
if we have resource assignment problems because _CRS is lacking,
we'll
quoted
quoted
have no way to fix them.
quoted
As per last Tomasz's patchset, we claim and assign unassigned PCI
resources upon ACPI PCI host bridge probing (which happens at
subsys_initcall time, courtesy of ACPI current code); at that
time
quoted
quoted
the
quoted
kernel did not even register the PNP0c02 driver
(drivers/pnp/system.c)
quoted
(it does that at fs_initcall). On the other hand, we insert MCFG
regions into the resource tree upon MCFG parsing, so I do not
see why we need to rely on PNP0c02 to do that for us (granted,
the
quoted
quoted
quoted
mechanism is part of the PCI fw specs, which are x86 centric
anyway
quoted
quoted
quoted
ie we can't certainly rely on Int15 e820 to detect reserved
memory
quoted
quoted
quoted
on ARM :D)

There is lots of legacy x86 here and Bjorn definitely has more
visibility into that than I have, the ARM world must understand
how this works to make sure we have an agreement.
As you say, there is lots of unpleasant x86 legacy here.  Possibly
ARM
quoted
quoted
has a chance to clean this up and do it more sanely; I'm not sure
whether it's feasible to reverse the ACPI/PCI init order there or
not.
quoted
quoted
Rafael, any thoughts on this whole thing?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-
pci" in
quoted
quoted
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help