Re: [PATCH 2/3] iommu: Add Allwinner H6 IOMMU driver
From: Maxime Ripard <hidden>
Date: 2020-02-11 12:39:06
Also in:
linux-arm-kernel, linux-iommu
Hi Robin, On Mon, Jan 27, 2020 at 07:01:02PM +0000, Robin Murphy wrote:
quoted
quoted
quoted
+static void sun50i_iommu_domain_free(struct iommu_domain *domain) +{ + struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain); + struct sun50i_iommu *iommu = sun50i_domain->iommu; + unsigned long flags; + int i; + + spin_lock_irqsave(&sun50i_domain->dt_lock, flags); + + for (i = 0; i < NUM_DT_ENTRIES; i++) { + phys_addr_t pt_phys; + u32 *page_table; + u32 *dte_addr; + u32 dte; + + dte_addr = &sun50i_domain->dt[i]; + dte = *dte_addr; + if (!sun50i_dte_is_pt_valid(dte)) + continue; + + memset(dte_addr, 0, sizeof(*dte_addr)); + sun50i_table_flush(sun50i_domain, virt_to_phys(dte_addr), 1);This shouldn't be necessary - freeing a domain while it's still live is an incredibly very wrong thing to do, so the hardware should have already been programmed to no longer walk this table by this point.We never "garbage collect" and remove the dte for the page table we don't use anymore elsewhere though, so couldn't we end up in a situation where we don't have a page table (because it has been freed) at the other end of our dte, but the IOMMU doesn't know about it since we never flushed?Let me reiterate: at the point of freeing, the assumption is that this domain should be long dissociated from the hardware. Any actual invalidation should have already happened at the point the TTB was disabled or pointed to some other domain, therefore fiddling with pagetable pages just before you free them back to the kernel is just pointless busywork. If the TTB *was* still live here, then as soon as you call free_pages() below the DT memory could get reallocated by someone else and filled with data that happens to look like valid pagetables, so even if you invalidate the TLBs the hardware could just immediately go walk that data and refill them with nonsense, thus any pretence at invalidation is in vain anyway.
Thanks, that makes a lot of sense.
The fly in the soup, however, is that default domains appear to be lacking proper detach notifications (I hadn't consciously noticed that before), so if you've been looking at the iommu_group_release() path it might have given the wrong impression. So what might be justifiable here is to check if the domain being freed is the one currently active in hardware, and if so perform a detach (i.e. disable the TTB and invalidate everything) first, then free everything as normal. Or just handwave that we essentially never free a default domain anyway so it's OK to just assume that we're not freeing anything live.
I'm still a bit unsure as of what to do exactly here. I haven't found a hook that would be called when a given domain doesn't have any devices attached to it. Or did you mean that I should just create a separate function, not part of the IOMMU ops?
quoted
quoted
quoted
+ + if (iommu->domain == domain) + return 0; + + if (iommu->domain) + sun50i_iommu_detach_device(iommu->domain, dev); + + iommu->domain = domain; + sun50i_domain->iommu = iommu; + + return pm_runtime_get_sync(iommu->dev);Deferring all the actual hardware pogramming to the suspend/resume hooks is a fiendishly clever idea that took me more than a moment to make sense of, but how well does it work when RPM is compiled out or runtime-inhibited?We have a bunch of other controllers that require runtime_pm already, so it's going to be enabled. But that should be expressed in Kconfig.But it can still be inhibited from sysfs, right? We don't want driver behaviour to be *unnecessarily* fragile to user actions, however silly they may be.
That's a good point :)
quoted
quoted
Furthermore, basing RPM on having a domain attached means that you'll effectively never turn the IOMMU off, even when all the clients are idle. It would make more sene to use device links like most other drivers do to properly model the producer/consumer relationship.I'm not familiar with device links for runtime_pm, I thought this was only useful for system-wide resume and suspend?See DL_FLAG_PM_RUNTIME - we already have several IOMMU drivers taking full advantage of this.
I'll look into it, thanks!
quoted
quoted
quoted
+static int __maybe_unused sun50i_iommu_resume(struct device *dev) +{ + struct sun50i_iommu_domain *sun50i_domain; + struct sun50i_iommu *iommu; + unsigned long flags; + dma_addr_t dt_dma; + int ret; + + iommu = dev_get_drvdata(dev); + if (!iommu->domain) + return 0; + + sun50i_domain = to_sun50i_domain(iommu->domain); + dt_dma = dma_map_single(dev, sun50i_domain->dt, DT_SIZE, DMA_TO_DEVICE);As above. The power state of the IOMMU should be enitrely irrelevant to the contents of RAM.Sorry, I should have put a comment here. I'm not quite sure what the difference between a group and domain in the IOMMU framework is, but since this IOMMU can only deal with a single address space, my understanding was that we'd need to allocate a single domain and group, and that the domain was the abstraction tied to an address space (since it's what is passed as an argument to map).That's correct, a domain is effectvely an address space, while groups represents sets of devices that the IOMMU can isolate from each other. IOMMUs like this one (and the MediaTek M4U in mtk_iommu.c) are a little hard-done-by in that they do actually have some finer-grained isolation on a basic allow/deny level, but the API really assumes that isolation happens at the address space level, so it's easier to ignore it and just use the single-group model anyway. The really neat advantage of having a guaranteed single group, though, is that you then no longer need to care about address spaces: since the group can only ever be attached to one domain at a time, you can have as many domains as you like, and handle it by having the first attach_dev call on a given domain context-switch that pagetable into the hardware. That's more or less what you've done already, which is good, it would just benefit from that context-switching being done in a more robust and obvious manner :)
Got it, thanks :)
quoted
So, given this, what made since was to allocate the directory table buffer at domain_alloc time and map it. But then, domain_alloc seems to not have any pointer back to the iommu we registered for some reason (I guess that a domain could be shared across multiple IOMMUs?), and so we don't have access to our IOMMU's struct device.I'll spare you the unrpoductive "Robin complains bitterly about the iommu_domain_alloc() interface being terrible, episode #27"... You'll see two main ways that existing drivers work around that - if you're happy to assume that you'll only ever have one IOMMU instance, or that all instances will always be functionally equal, then you can simply keep track of any old IOMMU device handle for DMA purposes (e.g. exynos_iommu); otherwise if you might need to cope with multiple IOMMU instances having different DMA capabilities then deferring instance-specific setup to the first device attach is the de-facto standard (e.g. arm-smmu).
I don't have any idea on how it's going to evolve, and the latter seems cleaner, I'll work on that.
quoted
Also, a more general question. One of the cleanups I wanted to do was to remove the kmem_cache in favour of a dma_pool, which triggered that test. It looks like with a dma_pool, the physical address and dma address are not the same, even though the IOMMU is directly connected to the RAM so there should be no intermediate mapping. Do you know why?DMA pools are backed by dma_alloc_coherent, so (at least on arm64) the virtual address you get will be a non-cacheable remap (assuming a non-coherent device), and thus calling virt_to_phys() on it is bogus and will give you nonsense.
Going further off-topic, why do we need a remap instead of a regular physical address? Thanks! Maxime
Attachments
- signature.asc [application/pgp-signature] 228 bytes