Thread (14 messages) 14 messages, 4 authors, 2020-02-11

Re: [PATCH 2/3] iommu: Add Allwinner H6 IOMMU driver

From: Maxime Ripard <hidden>
Date: 2020-02-11 12:39:06
Also in: linux-arm-kernel, linux-iommu

Hi Robin,

On Mon, Jan 27, 2020 at 07:01:02PM +0000, Robin Murphy wrote:
quoted
quoted
quoted
+static void sun50i_iommu_domain_free(struct iommu_domain *domain)
+{
+	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
+	struct sun50i_iommu *iommu = sun50i_domain->iommu;
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&sun50i_domain->dt_lock, flags);
+
+	for (i = 0; i < NUM_DT_ENTRIES; i++) {
+		phys_addr_t pt_phys;
+		u32 *page_table;
+		u32 *dte_addr;
+		u32 dte;
+
+		dte_addr = &sun50i_domain->dt[i];
+		dte = *dte_addr;
+		if (!sun50i_dte_is_pt_valid(dte))
+			continue;
+
+		memset(dte_addr, 0, sizeof(*dte_addr));
+		sun50i_table_flush(sun50i_domain, virt_to_phys(dte_addr), 1);
This shouldn't be necessary - freeing a domain while it's still live is an
incredibly very wrong thing to do, so the hardware should have already been
programmed to no longer walk this table by this point.
We never "garbage collect" and remove the dte for the page table we
don't use anymore elsewhere though, so couldn't we end up in a
situation where we don't have a page table (because it has been freed)
at the other end of our dte, but the IOMMU doesn't know about it since
we never flushed?
Let me reiterate: at the point of freeing, the assumption is that this
domain should be long dissociated from the hardware. Any actual invalidation
should have already happened at the point the TTB was disabled or pointed to
some other domain, therefore fiddling with pagetable pages just before you
free them back to the kernel is just pointless busywork.

If the TTB *was* still live here, then as soon as you call free_pages()
below the DT memory could get reallocated by someone else and filled with
data that happens to look like valid pagetables, so even if you invalidate
the TLBs the hardware could just immediately go walk that data and refill
them with nonsense, thus any pretence at invalidation is in vain anyway.
Thanks, that makes a lot of sense.
The fly in the soup, however, is that default domains appear to be lacking
proper detach notifications (I hadn't consciously noticed that before), so
if you've been looking at the iommu_group_release() path it might have given
the wrong impression. So what might be justifiable here is to check if the
domain being freed is the one currently active in hardware, and if so
perform a detach (i.e. disable the TTB and invalidate everything) first,
then free everything as normal. Or just handwave that we essentially never
free a default domain anyway so it's OK to just assume that we're not
freeing anything live.
I'm still a bit unsure as of what to do exactly here. I haven't found
a hook that would be called when a given domain doesn't have any
devices attached to it. Or did you mean that I should just create a
separate function, not part of the IOMMU ops?
quoted
quoted
quoted
+
+	if (iommu->domain == domain)
+		return 0;
+
+	if (iommu->domain)
+		sun50i_iommu_detach_device(iommu->domain, dev);
+
+	iommu->domain = domain;
+	sun50i_domain->iommu = iommu;
+
+	return pm_runtime_get_sync(iommu->dev);
Deferring all the actual hardware pogramming to the suspend/resume hooks is
a fiendishly clever idea that took me more than a moment to make sense of,
but how well does it work when RPM is compiled out or runtime-inhibited?
We have a bunch of other controllers that require runtime_pm already,
so it's going to be enabled. But that should be expressed in Kconfig.
But it can still be inhibited from sysfs, right? We don't want driver
behaviour to be *unnecessarily* fragile to user actions, however silly they
may be.
That's a good point :)
quoted
quoted
Furthermore, basing RPM on having a domain attached means that
you'll effectively never turn the IOMMU off, even when all the
clients are idle. It would make more sene to use device links like
most other drivers do to properly model the producer/consumer
relationship.
I'm not familiar with device links for runtime_pm, I thought this was
only useful for system-wide resume and suspend?
See DL_FLAG_PM_RUNTIME - we already have several IOMMU drivers taking full
advantage of this.
I'll look into it, thanks!
quoted
quoted
quoted
+static int __maybe_unused sun50i_iommu_resume(struct device *dev)
+{
+	struct sun50i_iommu_domain *sun50i_domain;
+	struct sun50i_iommu *iommu;
+	unsigned long flags;
+	dma_addr_t dt_dma;
+	int ret;
+
+	iommu = dev_get_drvdata(dev);
+	if (!iommu->domain)
+		return 0;
+
+	sun50i_domain = to_sun50i_domain(iommu->domain);
+	dt_dma = dma_map_single(dev, sun50i_domain->dt, DT_SIZE, DMA_TO_DEVICE);
As above. The power state of the IOMMU should be enitrely irrelevant to the
contents of RAM.
Sorry, I should have put a comment here.

I'm not quite sure what the difference between a group and domain in
the IOMMU framework is, but since this IOMMU can only deal with a
single address space, my understanding was that we'd need to allocate
a single domain and group, and that the domain was the abstraction
tied to an address space (since it's what is passed as an argument to
map).
That's correct, a domain is effectvely an address space, while groups
represents sets of devices that the IOMMU can isolate from each other.
IOMMUs like this one (and the MediaTek M4U in mtk_iommu.c) are a little
hard-done-by in that they do actually have some finer-grained isolation on a
basic allow/deny level, but the API really assumes that isolation happens at
the address space level, so it's easier to ignore it and just use the
single-group model anyway.

The really neat advantage of having a guaranteed single group, though, is
that you then no longer need to care about address spaces: since the group
can only ever be attached to one domain at a time, you can have as many
domains as you like, and handle it by having the first attach_dev call on a
given domain context-switch that pagetable into the hardware. That's more or
less what you've done already, which is good, it would just benefit from
that context-switching being done in a more robust and obvious manner :)
Got it, thanks :)
quoted
So, given this, what made since was to allocate the directory table
buffer at domain_alloc time and map it. But then, domain_alloc seems
to not have any pointer back to the iommu we registered for some
reason (I guess that a domain could be shared across multiple
IOMMUs?), and so we don't have access to our IOMMU's struct device.
I'll spare you the unrpoductive "Robin complains bitterly about the
iommu_domain_alloc() interface being terrible, episode #27"...

You'll see two main ways that existing drivers work around that - if you're
happy to assume that you'll only ever have one IOMMU instance, or that all
instances will always be functionally equal, then you can simply keep track
of any old IOMMU device handle for DMA purposes (e.g. exynos_iommu);
otherwise if you might need to cope with multiple IOMMU instances having
different DMA capabilities then deferring instance-specific setup to the
first device attach is the de-facto standard (e.g. arm-smmu).
I don't have any idea on how it's going to evolve, and the latter
seems cleaner, I'll work on that.
quoted
Also, a more general question. One of the cleanups I wanted to do was
to remove the kmem_cache in favour of a dma_pool, which triggered that
test. It looks like with a dma_pool, the physical address and dma
address are not the same, even though the IOMMU is directly connected
to the RAM so there should be no intermediate mapping. Do you know
why?
DMA pools are backed by dma_alloc_coherent, so (at least on arm64) the
virtual address you get will be a non-cacheable remap (assuming a
non-coherent device), and thus calling virt_to_phys() on it is bogus and
will give you nonsense.
Going further off-topic, why do we need a remap instead of a regular
physical address?

Thanks!
Maxime

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help