Thread (10 messages) 10 messages, 2 authors, 2017-03-01

Re: [PATCH 12/12] MIPS: PCI: Fix IP27 for the PCI_PROBE_ONLY case

From: Joshua Kinard <hidden>
Date: 2017-02-28 00:26:11
Also in: linux-pci

On 02/27/2017 11:36, Bjorn Helgaas wrote:
On Sat, Feb 25, 2017 at 04:34:12AM -0500, Joshua Kinard wrote:
quoted
On 02/24/2017 13:38, Bjorn Helgaas wrote:
quoted
On Fri, Feb 24, 2017 at 03:50:26AM -0500, Joshua Kinard wrote:
quoted
quoted
The host bridge code, i.e., the pcibios_scanbus() path, tells the
core how big the window is.  In this case, "hose->mem_resource"
contains the CPU physical address range (and size), and
"host->mem_offset" contains the offset between the CPU physical
address and the PCI bus address.  So if "hose->mem_resource" is
only 1GB, that's all the PCI core will use.
So basically, this bit of code from the proposed pci-bridge.c needs
additional work:

	bc->mem.name = "Bridge MEM";
	bc->mem.start = (NODE_SWIN_BASE(nasid, widget_id) + PCIBR_OFFSET_MEM);
	bc->mem.end = (NODE_SWIN_BASE(nasid, widget_id) + PCIBR_OFFSET_IO - 1);
	bc->mem.flags = IORESOURCE_MEM;

	bc->io.name = "Bridge IO";
	bc->io.start = (NODE_SWIN_BASE(nasid, widget_id) + PCIBR_OFFSET_IO);
	bc->io.end = (NODE_SWIN_BASE(nasid, widget_id) + PCIBR_OFFSET_END - 1);
	bc->io.flags = IORESOURCE_IO;

The values I've been using for PCIBR_OFFSET_MEM and PCIBR_OFFSET_IO
are obviously wrong.  For IP30, they ultimately direct "BRIDGE MEM"
to the first five device slots via small windows, and then "Bridge
IO" to the last two slots.  This is one of those "luck" moments that
happened to work on BaseIO, but is probably why devices in the
secondary PCI shoebox are hit-or-miss.

Per your comment further down that I can tell the PCI core about
//multiple// windows, probably what I want to do is hardwire access
to the small window spaces in the main BRIDGE driver, with
machine-specific offsets passed via platform_data.  Small windows,
for both IP30 and IP27, are always guaranteed to be available
without any additional magic.  This window range can be used to do
the initial slot probes to query devices for their desired BAR
values.

Then, in IP30-specific or IP27-specific code, we add in additional
memory windows as needed.  IP30 would simply pass along the
hardwired address ranges for BRIDGE MEM or BRIDGE IO via both medium
and big window addresses.

IP27 is trickier -- we'll need some logic that attempts to use a
small window (16MB) mapping first, but if that is too small, we'll
have to setup an entry in the HUB's IOTTE buffer that maps a
specific block of physical memory to Crosstalk address space.  
pcibios_scanbus() builds the &resources list of host bridge windows,
then calls pci_scan_root_bus().  If you're proposing to change the
list of host bridge windows *after* calling pci_scan_root_bus(), that
sounds a little problematic because we do not have a PCI core
interface to do that.
I believe I can determine the size of the windows on BRIDGE before hitting the
generic PCI code.  The bridge_probe() function calls register_pci_controller()
at the very bottom, even in the current mainline file (pci-ip27.c), after it's
twiddled a number of BRIDGE register bits.  I'll first have to try and
understand the 'pre_enable' code from the IP30 patch that changes the window
mappings in ip30-bridge.c, then find a way to move that code to execute earlier
before any of the generic PCI code, since it contains some of the logic to size
out the window mappings.

I'll see about getting it to work on IP30 (Octane) first, because all three
window spaces are available without any special magic.  That way, I'll know
what can be done in the generic BRIDGE driver and what needs to happen in
platform-specific files.  That'll make figuring out the IP27 logic a lot
easier....I hope.

quoted
This is what IP27 calls "big windows", of which there are seven
entries maximum, on 512MB boundaries, per HUB chip.  However, only
six entries are available because one IOTTE is used to handle a
known hardware bug on early HUB revisions, leaving only six left.

As such, the additional window information will go into the
machine-specific BRIDGE glue drivers (ip30-bridge.c or
ip27-bridge.c).

Sound sane?
Do you need to allocate the big windows based on what PCI devices you
find?  If so, that is going to be a hard problem.
I believe so.  The IOTTE stuff in the HUB ASIC on IP27 appears to behave like
an extremely simplified TLB, and once you work out how much space a given PCI
device needs, then you can calculate out the matching Crosstalk address, and
you write that to an unused IOTTE slot.  It doesn't have to be accurate -- each
IOTTE entry needs to be aligned on a 512MB boundary, so once you've set up a
mapping, if another PCI device (or BAR?) needs a Crosstalk address within an
already-mapped boundary, you just point it to the existing IOTTE entry that
matches.

That's my understanding anyways after looking at really old IA64 code in 2.5.70
that used virtually the same hardware when SGI was bringing up the Altix
platform.  Specifically, hub_piomap_alloc() on Line 117 in
Linux-2.5.70/arch/ia64/sn/io/io.c:
https://git.linux-mips.org/cgit/ralf/linux.git/tree/arch/ia64/sn/io/io.c?h=linux-2.5.70

So I just need to make sure all of this happens during the initial
bridge_probe() function, so that everything is ready before calling
register_pci_controller().  We'll find out!

quoted
quoted
PCIe functions have 4K of config space each, so a 0x1000 offset
makes sense.  Whether a platform can access all of it is a
separate question.
No need to worry about PCIe on this platform.  You're talking about
15+-year old hardware that was built to last.  I believe SGI only
implemented up to PCI 2.1 in the BRIDGE and XBRIDGE ASICs.  It's
possible the PIC ASIC might do 2.2 or 3.0.  But I don't have to
worry about PIC for a good while, as that's only found in the
IP35-class of hardware (Origin 300/350/3000, Fuel, Tezro, Onyx4,
etc).  We need fully-working IP27 and IP30 first, because much of
the working logic can then be re-purposed for bringing up IP35
hardware.  Especially since IA64 did a lot of the work already for
the Altix systems, which is just IP35 with an IA64 CPU, but I
digress...
OK.  If you only plug in PCI devices, there should be no problem.  If
you did plug in PCIe devices (via a PCI-to-PCIe bridge, for example),
the core should still enumerate them correctly and they should be
functional, though some PCIe-only features like AER won't work.
There might be physical limitations on using any kind of PCIe bridge device.
Getting a PCI device into an SGI Octane (IP30) or Origin 2000/Onyx2 (IP27) that
isn't either the BaseIO on IP30 or IO6 on IP27 requires an XIO Shoehorn or PCI
Shoebox.  There's a little bit of leeway on how wide of a PCI card you can
stuff into either one, but it's not much.  I'll have to measure a shoehorn if I
can find one of my spares in a bin someplace.  The Origin 200 tower machine
might let you get away with it...but that's a corner case in my book and not
really worth worrying about it.  It's hard to find one of those completely
intact these days, let alone with a working power supply that doesn't double as
a fireworks display.

quoted
quoted
The type 1 config accessors (pci_conf1_read_config(),
pci_conf1_write_config()) are for devices below a PCI-to-PCI
bridge.  It looks like there's a single 4K memory-mapped window,
and you point it at a specific bus & device with
bridge->b_pci_cfg.
Yeah, PCI-to-PCI bridges are a known-broken item right now.  Since
there appears to only be a single Type 1 config space, if I have a
PCI Shoebox with three PCI-X slots, and I stick three USB 2.0 boards
in them, I have to pick which one can handle PCI-to-PCI bridges,
like USB hubs, and program that into this Type 1 register, right?
I don't think USB hubs are relevant here because they are completely
in the USB domain.  The USB 2.0 board is a USB host controller with a
PCI interface on it.  The USB hub is on the USB side and is invisible
to the PCI core.

The way I read the BSD code, it looks like you should be able to use
the Type 1 config interface to generate accesses to any PCI bus, as
long as you serialize them, e.g., with a lock around the use of
b_type1_cfg.
Huh, well, I've never actually tried a USB Hub device on my Octane.  If I can
fix PCI up properly and get a 2.0 card to actually work, I'll try it out before
looking too deeply into the Type 1 stuff.

quoted
quoted
The Linux type 0 accessors look like they support 4K config space
per function, while the type 1 accessors put the function number
in bits 8-10, so it looks like they only support 256 bytes per
function.

The OpenBSD accessors (xbridge_conf_read() and
xbridge_conf_write()) look like they only support 256 bytes per
function regardless of whether it's type 0 or type 1.
I'll have to pass this question to the OpenBSD dev who maintains the
xbridge driver.  He understands this hardware far better than I do,
and there might be a reason why they do it that way.
If you only support PCI devices, 256 bytes of config space is enough.
The 4K config space is only supported for PCI-X Mode 2 and PCIe
devices.  Even those PCI-X Mode 2 and PCIe devices should be
functional with only 256 bytes.
It's unknown why SGI left that much space between the registers on the BRIDGE.
Could've been future-proofing, or maybe there really is a PCI or PCI-X device
out there that currently only works under IRIX that needed it all.  If I can
get things to work better than they are now, I've got a bit of a collection of
PCI devices to test things out with.  Might expose any wrong assumptions.

quoted
quoted
Supporting 4K of space for type 0 seems like a potential Linux
problem.  Also, pci_conf1_write_config() uses b_type0_cfg_dev in
one place where it looks like it should be using b_type1_cfg.
I didn't word this well.  I was trying to point out things that look
like bugs in pci_conf1_write_config() (the use of b_type0_cfg_dev) and
maybe pci_conf0_read_config() (the fact that it allows 4K config space
but the hardware probably only supports 256 bytes).
If you're referring to those uses in ops-bridge.c, that file probably needs
some additional TLC.  The patchset I've got fixed it up a little, but looking
at it made my head hurt (and not just because of the IOC3 voodoo in it), so I
kinda passed on spending too much time on it.  I'll probably have to double
back on it in the future after fixing the other areas, so I'll keep this in mind.

quoted
quoted
If the hardware only supports 256 bytes of config space on
non-root bus devices, that's not a disaster.  We should still be
able to enumerate them and use all the conventional PCI features
and even basic PCIe features.  But the extended config space
(offsets 0x100-0xfff) would be inaccessible, and we wouldn't see
any PCIe extended capabilities (AER, VC, SR-IOV, etc., see
PCI_EXT_CAP_ID_ERR and subsequent definitions) because they live
in that space.
I do not think that the BRIDGE supports AER, VC, or SR-IOV at all,
primarily due to age of the hardware.  Even if a PCI device that
does support those is plugged into a BRIDGE, I am not 100% certain
those features would even be usable if the BRIDGE doesn't know about
them.  Or is this a case of where the BRIDGE wouldn't care and once
it programs the BARs correctly, these features would be available to
the Linux drivers?
These are mostly PCIe features and I don't think you should worry
about them, at least for now.
Good to know then.  Hopefully never :)

quoted
I remember from the OpenBSD xbridge driver, it states that it
currently uses the BAR mappings setup by the ARCS firmware, but the
plan is to eventually move away from those mappings and calculate
its own.  I didn't quite get that, but after reading that site, now
I do.  ARCS is writing ~0 to the BARs to query the device to figure
out how much memory each BAR wants, then programs in some ranges for
us.

By tallying up the total requested size of all BAR mappings on a
given device, we can then make a determination on whether we can use
a small, medium, or large window and then program the BAR with the
correct address.  Right?
This is the problem I alluded to above: Linux does not currently
support anything like this.  We assume the Linux host bridge driver
knows the window sizes at the beginning of time (it may learn them
from firmware or it may have hard-coded knowledge of the address map).

Linux does enumerate the devices and tally up all the BAR sizes (see
pci_bus_assign_resources()), but it does not have a way to change the
host bridge windows based on how much BAR space we need.

The common thing is to use whatever host bridge windows and device BAR
values were set up by firmware.  If those are all sensible, Linux
won't change anything.
Well, as I stated earlier above, it looks like the bridge_probe() function in
the new driver will have to do some probing of its own.  One of the patches I
have in a different series does a minimal probe to read each device slot on
BRIDGE to get the vendor ID and device ID to see if the slot is actually
populated.  This is because not all slots are "populated"; e.g., Octane wires
one of the PCI interrupt pins on slot 6 on BaseIO to its power button, for example.

So the logic used to read the vendor/dev IDs can probably be extended to poke
the BARs via the ~0 trick OR read out the pre-defined mappings from ARCS
firmware, then use that info to size out the BRIDGE windows into Crosstalk
space and then set up our IORESOURCE_MEM and IORESOURCE_IO structs with the
right information to pass into register_pci_controller().

I dunno, I'll probably try reading the ARCS mappings idea first, as that seems
easier.  I won't have a lot of time to play with things in March, so getting
something to work, even if it is sub-optimal, is better than nothing working at
all.

quoted
quoted
The host bridge windows (the "root bus resource" lines in dmesg)
are only for PIO, i.e., a driver running on the CPU reading or
writing registers on the PCI device (it could be actual registers,
or a frame buffer, etc., but it resides on the device and its PCI
bus address is determined by a BAR).  The driver uses ioremap(),
pci_iomap(), pcim_iomap_regions(), etc. to map these into the
kernel virtual space.
So for the MIPS case, knowing that ioremap() and friends is not
guaranteed to return a workable virtual address, I need to be
careful of what addresses I program into each BAR.  E.g., given that
on IP30, if a physical address starts with 0x9000000f800xxxxx, it is
using medium windows to talk to a device on the BaseIO BRIDGE (Xtalk
widget 0xf).  As such, knowing MIPS' ioremap() returns, for the
R10000 CPU case, the requested address OR'ed with IO_BASE
(0x9000000000000000), I want to tell the PCI core to use
0x0000000f800xxxxx so that ioremap() will return back
0x9000000f800xxxxx?
If ioremap() doesn't return a virtual address, or at least something
that can be used like a virtual address, I think that's fundamentally
broken.  All drivers assume they should ioremap() a BAR resource,
i.e., the CPU physical address that maps to a PCI BAR value, and use
the result as a virtual address.
This is an issue that the core Linux/MIPS people will have to work out.  Ralf
probably knows the precise history on why ioremap*() and related functions
aren't guaranteed to be usable as virtual addresses.  I suspect it's tied to
the other issues with IP27, as that was one of the first MIPS platforms that
Linux ran on in the early 2000's.

quoted
And since I can apparently specify multiple window ranges for memory
space and I/O space, I probably want to, as stated earlier, specify
all three of IP30's known window ranges for each BRIDGE so that the
Linux PCI core can walk each resource struct and find a matching
window?
If all three windows are disjoint, you can specify them all.  If a
range in the small window is aliased and can also be reached via a
medium or large window, you should not specify both.
Each window, at least on Octane, lives at distinct addresses in Crosstalk
space.  A specific address within each window can point to the same device on a
subordinate PCI bus, but I think that will be transparent to the Linux PCI
core.  My thinking is, if I setup IORESOURCE_MEM structs for each window for
each BRIDGE widget, then the PCI code will check each struct to see which one
contains the address range that the PCI device's BAR wants.

E.g., Using widget 0xd as an example for readability, if I know my three
windows in Crosstalk space are:
    Small:  0x0000_1d00_0000 - 0x0000_1dff_ffff
    Medium: 0x000e_8000_0000 - 0x000e_ffff_ffff
    Large:  0x00d0_0000_0000 - 0x00df_ffff_ffff

And for that specific BRIDGE, if I pass those three ranges as IORESOURCE_MEM
structs, then shouldn't the PCI core, if it is told that a specific devices BAR
wants range 1d200000 to 1d203fff, select the small window mapping?  E,g., is it
going to try to find the smallest possible window to fit first?  Or will it
instead try to match using the large window?

I guess I'll find out when I can get some time to actually test the idea out...


[snip]
quoted
quoted
I think the biggest problem is that you need to set up the BRIDGE
*before* calling pcibios_scan_bus().  That way the windows
("hose->mem_resource", "hose->io_resource", etc.) will be correct
when the PCI core enumerates the devices.  Note that you can have
as many windows in the "&resources" list as you need -- the
current code there only has one memory and one I/O window, but you
can add more.
Agreed, however, I think this is actually being done to some extent
already.  We're configuring BRIDGE-specific properties and writing
some values to BRIDGE registers and the per-slot registers in
bridge_probe() in the generic pci-bridge.c driver, then, at the very
end, calling register_pci_controller(), which I believe is what
kicks off the PCI bus scan.

The IP30 BRIDGE glue code adds a new function hook called
"pre_enable" where it does, in ip30-bridge.c, some additional PCI
device tweaking, but this code will run after the PCI bus scan has
happened.  My guess is that I want to reverse this and have the IP30
glue code twiddle the PCI devices on a given BRIDGE before the PCI
bus scan, right?  
Theoretically you should not need to do any PCI device tweaking: PCI
devices are basically arch-independent and shouldn't need
arch-specific tweaks.  All the arch stuff should be encapsulated in
the host bridge driver, i.e., the code that sets up the window list
and calls pci_scan_root_bus().  In the MIPS case, this code is kind of
spread out and doesn't really look like a single "driver".
In an ideal world, this would be correct.  However, owning SGI equipment must
have an effect on the curvature of local spacetime, and things seem to operate
a bit differently.  The IOC3 device is a perfect example of this phenomenon.

The ideal thing would be if you can set up the bridge to always use
large windows.  Then the PCI core will enumerate the devices and set
up their resources automatically.
I'll keep this in mind when I can start testing.  I'll initially try feeding
three IORESOURCE_MEM structs to the PCI core to describe the three windows in
order from smallest to largest and see what it does.  Then try reversing the
order and see if it still finds the right window or not.  And whether that
actually leads to correct probing of the PCI devices or not.

I assume the small/medium windows exist because there's not enough
address space to always use the large windows.  I don't have any good
suggestions for that -- we don't have support for adjusting the window
sizes based on what devices we find.
IMHO, "Use a bigger hammer" seems most appropriate when it comes to this hardware.

quoted
That code uses pci_read_config_dword() and
pci_write_config_dword() -- are these functions safe to use if we
haven't already done a PCI bus scan?
You can't use pci_read_config_dword() before we scan the bus because
it requires a "struct pci_dev *", and those are created during the bus
scan.
Okay, that's good to know then.  There's other accessors that are MIPS-specific
that I think I can use.  I don't think I specifically need a 'struct pci_dev *'
at that point anyways.


-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help