Re: [PATCH v4 3/3] iommu: dart: Add DART iommu driver
From: Sven Peter <hidden>
Date: 2021-07-25 12:41:18
Also in:
linux-arm-kernel, linux-iommu, lkml
On Mon, Jul 19, 2021, at 20:15, Robin Murphy wrote:
On 2021-07-15 17:41, Sven Peter via iommu wrote: [...]quoted
quoted
quoted
+ u64 sw_bypass_cpu_start; + u64 sw_bypass_dma_start; + u64 sw_bypass_len; + + struct list_head streams;I'm staring to think this could just be a bitmap, in a u16 even.The problem is that these streams may come from two different DART instances. That is required for e.g. the dwc3 controller which has a weird quirk where DMA transactions go through two separate DARTs with no clear pattern (e.g. some xhci control structures use the first dart while other structures use the second one).Ah right, I do remember discussing that situation, but I think I misinterpreted dart_domain->dart representing "the DART instance" here to mean we weren't trying to accommodate that just yet.quoted
Both of them need to point to the same pagetable. In the device tree the node will have an entry like this: dwc3_0: usb@382280000{ ... iommus = <&dwc3_0_dart_0 0>, <&dwc3_0_dart_1 1>; }; There's no need for a linked list though once I do this properly with groups. I can just use an array allocated when the first device is attached, which just contains apple_dart* and streamid values.quoted
quoted
+ + spinlock_t lock; + + struct iommu_domain domain; +}; + +/* + * This structure is attached to devices with dev_iommu_priv_set() on of_xlate + * and contains a list of streams bound to this device as defined in the + * device tree. Multiple DART instances can be attached to a single device + * and each stream is identified by its stream id. + * It's usually reference by a pointer called *cfg. + * + * A dynamic array instead of a linked list is used here since in almost + * all cases a device will just be attached to a single stream and streams + * are never removed after they have been added. + * + * @num_streams: number of streams attached + * @streams: array of structs to identify attached streams and the device link + * to the iommu + */ +struct apple_dart_master_cfg { + int num_streams; + struct { + struct apple_dart *dart; + u32 sid;Can't you use the fwspec for this?I'd be happy to use the fwspec code if that's somehow possible. I'm not sure how though since I need to store both the reference to the DART _and_ to the stream id. As far as I can tell the fwspec code would only allow to store the stream ids. (see also the previous comment regarding the dwc3 node which requires stream ids from two separate DART instances)Hmm, yes, as above I was overlooking that, although there are still various ideas that come to mind; the question becomes whether they're actually worthwhile or just too-clever-for-their-own-good hacks. The exact format of fwspec->ids is not fixed (other than the ACPI IORT code having a common understanding with the Arm SMMU drivers) so in principle you could munge some sort of DART instance index or indeed anything, but if it remains cleaner to manage your own data internally then by all means keep doing that.
Yeah, I can think of some hacks as well (like storing a global id->apple_dart* map or stuffing the 64bit pointer into two ints) and I've tried a few of them in the past days but didn't like either of them. I do like the idea to just put two (struct apple_dart *dart, u16 sidmap) in there though which will be plenty for all current configurations.
quoted
quoted
quoted
+ struct device_link *link;Is it necessary to use stateless links, or could you use DL_FLAG_AUTOREMOVE_SUPPLIER and not have to keep track of them manually?I'll just use DL_FLAG_AUTOREMOVE_SUPPLIER. No idea why I went for stateless links.quoted
[...]quoted
quoted
+ /* restore stream identity map */ + writel(0x03020100, dart->regs + DART_STREAM_REMAP); + writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4); + writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8); + writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12);Any hint of what the magic numbers mean?Yes, it's just 0,1,2,3...,0xe,0xf but I can't do 8bit writes to the bus and 32 bit writes then require these slightly awkward "swapped" numbers. I'll add a comment since it's not obvious at first glance.Sure, I guessed that much from "identity map" - it was more a question of why that means 0x03020100... rather than, say, 0x0c0d0e0f... or 0x76543210..., and perhaps the reason for "restoring" it in the first place.
So what this feature does is to allow the DART to take an incoming DMA stream tagged with id i and pretend that it's actually been tagged with readb(dart->regs + 0x80 + i) instead. That's as much as I can figure out by poking the hardware. More details are probably only available to Apple. Now the reason I thought I needed this was that I assumed we are handed these DARTs in an unclean state because Apple makes use of this internally: In their device tree they have a sid-remap property which I believe is a hack to make their driver simpler. The dwc3 controller requires stream 0 of dartA and stream 1 of dartB to be configured the same way. They configure dartB to remap stream 1 to stream 0 and then just mirror all MMIO writes from dartA to dartB and pretend that dwc3 only needs a single DART. As it actually turns out though, iBoot doesn't use the USB DARTs and we already get them in the sane state. I can just drop this code. (And if we actually need it for other DARTs I can also just restore those in our bootloader or add it in a follow up).
[...]quoted
quoted
quoted
+ /* + * we can't mix and match DARTs that support bypass mode with those who don't + * because the iova space in fake bypass mode generally has an offset + */Erm, something doesn't sound right there... IOMMU_DOMAIN_IDENTITY should be exactly what it says, regardless of how it's implemented. If you can't provide a true identity mapping then you're probably better off not pretending to support them in the first place.Some background: the PCIe DART only supports a 32bit VA space but RAM on these machines starts at 0x8_0000_0000. I have something like dma-ranges = <0x42000000 0 0 0x8 0 0 0xffff0000>; in the pcie nodes to add that offset to dma addresses. What I want to do here then is to setup an identity mapping with respect to the DMA layer understanding of addresses encoded in bus_dma_region. Now this will always just be a constant offset of 0x8_0000_0000 for all M1s but I didn't want to hardcode that. The code here is just there to guard against a situation where someone somehow manages to attach two devices with different offsets to the same domain.Urgh, *now* I think I get it - the addressing limitation WRT the physical memory map layout had also slipped my mind. So you describe the RC *as if* it had a physical bus offset, rely on iommu-dma ignoring it when active (which is more by luck than design - we don't expect to ever see a device with a real hard-wired offset upstream of an IOMMU, although I did initially try to support it back in the very early days), and otherwise statically program a translation such that anyone else who *does* respect bus_dma_regions finds things work as expected.
Yes, exactly. It's not very nice but it works...
That actually seems like an even stronger argument for having the fake-bypass table belong to the DART rather than the domain, and at that point you shouldn't even need the mismatch restriction, since as long as you haven't described the fake offset for any devices who *can* achieve real bypass, then "attach to an identity domain" simply comes down to doing the appropriate thing for each individual stream, regardless of whether it's the same nominal identity domain that another device is using or a distinct one (it's highly unlikely that two groups would ever get attached to one identity domain rather than simply having their own anyway, but it is technically possible).
Agreed. That sounds a lot nicer actually.
quoted
If that's not how the abstraction is supposed to work and/or too big of a hack I'll just remove the software bypass mode altogether. PCIe won't work on 4k kernels then but the only people using this so far build their own kernels with patches either way and won't complain. And by the time Linux will actually be useful for "normal" setups the dma-iommu layer can hopefully just handle a larger page granularity.It's certainly... "creative", and TBH I don't hate it (in a "play the hand you've been given" kind of way), but the one significant downside is that if the DART driver isn't loaded for any reason, PCI DMA will look like it should be usable but then just silently (or not so silently) fail.
Good point!
FWIW if you do want to keep the option open, I'd be inclined to have the DT just give an "honest" description of just the 32-bit limitation, then have the DART driver's .probe_device sneakily modify the bus_dma_region to match the relevant fake-bypass table as appropriate. It's possible other folks might hate that even more though :D
I've given that one a try and I kinda like it so far :D I'll keep it for v5 and just drop it in case someone complains.
quoted
quoted
quoted
+ if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY && + (domain->dart->supports_bypass != dart->supports_bypass))) + return -EINVAL; + + list_for_each_entry(stream, &domain->streams, stream_head) { + if (stream->dart == dart && stream->sid == sid) { + stream->num_devices++; + return 0; + } + } + + spin_lock_irqsave(&dart->lock, flags); + + if (WARN_ON(dart->used_sids & BIT(sid))) { + ret = -EINVAL; + goto error; + } + + stream = kzalloc(sizeof(*stream), GFP_ATOMIC); + if (!stream) { + ret = -ENOMEM; + goto error; + }Couldn't you do this outside the lock? (If, calling back to other comments, it can't get refactored out of existence anyway)Probably, but I'll first see if I can just refactor it away.Actually I missed that we're already under dart_domain->lock at this point anyway, so it's not going to make much difference, but it does mean that the spin_lock_irqsave() above could just be spin_lock(), unless it's possible to relax the domain locking a bit such that we don't have to do the whole domain init with IRQs masked.
I can relax the locking quite a bit. Right now, I only need a spinlock around the TLB flush MMIO writes and a mutex to protect domain initialization.
[...]quoted
quoted
quoted
+static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) +{ + struct apple_dart_domain *dart_domain; + + if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED && + type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED) + return NULL;I want to say there's not much point in that, but then I realise I've spent the last couple of days writing patches to add a new domain type :)Hah! Just because I'm curious: What is that new domain type going to be? :)Splitting IOMMU_DOMAIN_DMA into two to replace iommu_dma_strict being an orthogonal thing. [...]quoted
quoted
quoted
+static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args) +{ + struct platform_device *iommu_pdev = of_find_device_by_node(args->np); + struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev); + unsigned int num_streams = cfg ? cfg->num_streams : 0; + struct apple_dart_master_cfg *cfg_new; + struct apple_dart *dart = platform_get_drvdata(iommu_pdev); + + if (args->args_count != 1) + return -EINVAL; + + cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1), + GFP_KERNEL); + if (!cfg_new) + return -ENOMEM; + + cfg = cfg_new; + dev_iommu_priv_set(dev, cfg); + + cfg->num_streams = num_streams; + cfg->streams[cfg->num_streams].dart = dart; + cfg->streams[cfg->num_streams].sid = args->args[0]; + cfg->num_streams++;Yeah, this is way too reminiscent of the fwspec code for comfort. Even if you can't use autoremove links for some reason, an array of 16 device_link pointers hung off apple_dart still wins over these little pointer-heavy structures if you need more than a few of them.I can get rid of the links, but I'll still need some way to store both the apple_dart and the sid here. Like mentioned above, I'll be happy to reuse the fwspec code but I don't see how yet.As before, if you can fit in some kind of DART instance identifier which isn't impractical to unpack than it makes sense to use the fwspec since it's already there. However if you still need to allocate something per-device rather than just stashing an existing pointer in iommu_priv, then you may as well keep everything together there. If the worst known case could still fit in just two DART pointers and two 64-bit bitmaps, I'd be inclined to just have that as a fixed structure and save all the extra bother - you're not cross-architecture like the fwspec code, and arm64's minimum kmalloc granularity has just gone back up to 128 bytes (but even at 64 bytes you'd have had plenty of room).
That's a very good point, I somehow tried to make this part as general as possible and didn't realize that this only has to work on essentially one SoC for now. I also don't expect Apple to require more than two DARTs for a single master in the future. I've tried the fixed structure now and I really like it so far.
[...]quoted
quoted
quoted
+static int apple_dart_remove(struct platform_device *pdev) +{ + struct apple_dart *dart = platform_get_drvdata(pdev); + + devm_free_irq(dart->dev, dart->irq, dart); + + iommu_device_unregister(&dart->iommu); + iommu_device_sysfs_remove(&dart->iommu); + + clk_bulk_disable(dart->num_clks, dart->clks); + clk_bulk_unprepare(dart->num_clks, dart->clks);Ditto. And again the bus ops are still installed - that'll get really fun if this is a module unload...Ugh, yeah. I'll fix that as well. I'll have to see how to make this work correctly with multiple DART instances. I guess I should only remove the bus ops once the last one is removed. Now that I think about it, this could also get tricky in the cleanup paths of apple_dart_probe. Maybe just add a module_init that sets up the bus ops when it finds at least one DART node and module_exit to tear them down again?Actually by this point it was late and I wasn't thinking as clearly as I could have been, apologies ;) I believe a module unload is in fact the *only* time you should expect to see .remove called - you want to set .suppress_bind_attrs in your driver data because there's basically no way to prevent manual unbinding from blowing up - so it should be fine to unconditionally clear the ops at this point (being removed means you must have successfully probed, so any ops must be yours).
Makes sense, thanks! I'll let my current version simmer for a bit and wait until it's been tested by a few people and will send it in a week or so then! Best, Sven