Thread: 53 messages, 8 authors, 2017-11-28

[PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

From: Sricharan R <hidden>
Date: 2017-07-17 11:47:09
Also in: linux-arm-msm, linux-clk, linux-devicetree, linux-iommu, lkml

Hi,

On 7/15/2017 1:09 AM, Rob Clark wrote:
On Fri, Jul 14, 2017 at 3:36 PM, Will Deacon [off-list ref] wrote:
On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon [off-list ref] wrote:
On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon [off-list ref] wrote:
On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon [off-list ref] wrote:
On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R [off-list ref] wrote:
Hi,

On 7/13/2017 5:20 PM, Rob Clark wrote:
On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R [off-list ref] wrote:
Hi Vivek,

On 7/13/2017 10:43 AM, Vivek Gautam wrote:
Hi Stephen,


On 07/13/2017 04:24 AM, Stephen Boyd wrote:
On 07/06, Vivek Gautam wrote:
@@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
                   size_t size)
  {
-    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+    size_t ret;
        if (!ops)
          return 0;
  -    return ops->unmap(ops, iova, size);
+    pm_runtime_get_sync(smmu_domain->smmu->dev);
Can these map/unmap ops be called from an atomic context? I seem
to recall that being a problem before.
That's something which was dropped in the following patch merged in master:
523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

Looks like we don't need locks here anymore?
 Apart from the locking, wonder why an explicit pm_runtime is needed
 from unmap. Somehow looks like some path in the master using that
 should have enabled the pm ?
Yes, there are a bunch of scenarios where unmap can happen with
disabled master (but not in atomic context).  On the gpu side we
opportunistically keep a buffer mapping until the buffer is freed
(which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
an exported dmabuf while some other driver holds a reference to it
(which can be dropped when the v4l2 device is suspended).

Since unmap triggers tlb flush which touches iommu regs, the iommu
driver *definitely* needs a pm_runtime_get_sync().
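
To make the point above concrete, here is a minimal userspace sketch of why the flush has to sit between a runtime-PM get and put. It is not driver code: every name in it (struct model_smmu, model_pm_get_sync(), model_unmap() and so on) is hypothetical, and the usage counter only approximates what the runtime PM core does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_smmu {
	int usage_count;   /* models the runtime-PM usage counter */
	bool clocked;      /* true while register access is safe */
	int tlb_flushes;   /* TLB invalidations that reached the hardware */
	int bad_accesses;  /* register accesses attempted while unclocked */
};

/* First user resumes the device (clocks on). */
static void model_pm_get_sync(struct model_smmu *s)
{
	if (s->usage_count++ == 0)
		s->clocked = true;
}

/* Last user allows the device to suspend (clocks off). */
static void model_pm_put(struct model_smmu *s)
{
	assert(s->usage_count > 0);
	if (--s->usage_count == 0)
		s->clocked = false;
}

/* A TLB flush touches registers, so it is only valid while clocked. */
static void model_tlb_flush(struct model_smmu *s)
{
	if (!s->clocked) {
		s->bad_accesses++;
		return;
	}
	s->tlb_flushes++;
}

/* Unmap wraps the flush in get/put so it works with a suspended master. */
static size_t model_unmap(struct model_smmu *s, size_t size)
{
	model_pm_get_sync(s);
	model_tlb_flush(s);
	model_pm_put(s);
	return size;
}
```

Calling model_unmap() on a suspended model performs exactly one flush and leaves the usage count at zero, while a bare model_tlb_flush() on the same suspended model is counted as a bad access.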
 Ok, with that being the case, there are two things here,

 1) If the device links are still intact at these places where unmap is called,
    then pm_runtime from the master would set up all the clocks. That would
    avoid reintroducing the locking indirectly here.

 2) If not, then doing it here is the only way. But for both cases, since
    the unmap can be called from atomic context, the resume handler here should
    avoid doing clk_prepare_enable, and instead move the clk_prepare to the init.
I do kinda like the approach Marek suggested.. of deferring the tlb
flush until resume.  I'm wondering if we could combine that with
putting the mmu in a stalled state when we suspend (and not resume the
mmu until after the pending tlb flush)?
I'm not sure that a stalled state is what we're after here, because we need
to take care to prevent any table walks if we've freed the underlying pages.
What we could try to do is disable the SMMU (put into global bypass) and
invalidate the TLB when performing a suspend operation, then we just ignore
invalidation whilst the clocks are stopped and, on resume, enable the SMMU
again.
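
The suspend scheme described above can be sketched as a small userspace model. This is only an illustration under assumptions: all names (struct clean_smmu, clean_suspend(), clean_invalidate() and so on) are hypothetical, the TLB is reduced to an entry counter, and the model uses "terminate" rather than global bypass while suspended, which is one of the two options raised in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names do not correspond to driver code. */
enum clean_mode { CLEAN_TRANSLATE, CLEAN_TERMINATE };

struct clean_smmu {
	bool clocked;         /* true while register access is safe */
	enum clean_mode mode; /* what happens to incoming transactions */
	int tlb_entries;      /* cached translations */
	int skipped_invals;   /* invalidations ignored while unclocked */
};

/* Suspend: stop translating, drop every cached entry, then stop clocks. */
static void clean_suspend(struct clean_smmu *s)
{
	s->mode = CLEAN_TERMINATE;
	s->tlb_entries = 0;
	s->clocked = false;
}

/* Invalidation is a no-op while suspended: the TLB is already empty. */
static void clean_invalidate(struct clean_smmu *s)
{
	if (!s->clocked) {
		s->skipped_invals++;
		return;
	}
	s->tlb_entries = 0;
}

/* Resume: clocks back on and translation re-enabled with an empty TLB. */
static void clean_resume(struct clean_smmu *s)
{
	s->clocked = true;
	s->mode = CLEAN_TRANSLATE;
}

/* A transaction is translated (and cached) only while the SMMU is live. */
static bool clean_translate(struct clean_smmu *s)
{
	if (!s->clocked || s->mode != CLEAN_TRANSLATE)
		return false;
	s->tlb_entries++;
	return true;
}
```

In this model an unmap that arrives while the clocks are stopped only bumps skipped_invals, and no stale entry survives across resume because suspend already emptied the TLB.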
wouldn't stalled just block any memory transactions by device(s) using
the context bank?  Putting it in bypass isn't really a good thing if
there is any chance the device can sneak in a memory access before
we've taken it back out of bypass (ie. makes gpu a giant userspace
controlled root hole).
If it doesn't deadlock, then yes, it will stall transactions. However, that
doesn't mean it necessarily prevents page table walks.
btw, I guess the concern about pagetable walk is that the unmap could
have removed some sub-level of the pt that the tlb walk would hit?
Would deferring freeing those pages help?
Could do, but it sounds like a lot of complication that I think we can fix
by making the suspend operation put the SMMU into a "clean" state.
Instead of bypass, we
could configure all the streams to terminate, but this race still worries me
somewhat. I thought that the SMMU would only be suspended if all of its
masters were suspended, so if the GPU wants to come out of suspend then the
SMMU should be resumed first.
I believe this should be true.. on the gpu side, I'm mostly trying to
avoid having to power the gpu back on to free buffers.  (On the v4l2
side, somewhere in the core videobuf code would also need to be made
to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
Right, and we shouldn't have to resume it if we suspend it in a clean state,
with the TLBs invalidated.
I guess if the device_link() stuff ensured the attached device
(gpu/etc) was suspended before suspending the iommu, then I guess I
can't see how temporarily putting the iommu in bypass would be a
problem.  I haven't looked at the device_link() stuff too closely, but
iommu being resumed first and suspended last seems like the only thing
that would make sense.  I'm mostly just nervous about iommu in bypass
vs gpu since userspace has so much control over what address gpu
writes to / reads from, so getting it wrong w/ the iommu would be a
rather bad thing ;-)
Right, but we can also configure it to terminate if you don't want bypass.
 But one thing here is, with device links in the picture, iommu suspend/resume
 is called along with the master. That means we can end up cleaning even
 active entries on the suspend path, if suspend is going to
 put the smmu into a clean state every time. So if the masters follow
 the pm_runtime sequence before a dma_map/unmap operation, that seems better.

Regards,
 Sricharan


-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
