Thread (3 messages) 3 messages, 3 authors, 2018-01-30

Re: [PATCH 3/4] drm: add ARM flush implementations

From: Daniel Vetter <hidden>
Date: 2018-01-18 07:42:55
Also in: dri-devel

Possibly related (same subject, not in this thread)

On Wed, Jan 17, 2018 at 11:46 PM, Gurchetan Singh
[off-list ref] wrote:
quoted
dma api just isn't quite sufficient for implementing fast gpu drivers.

Can you elaborate?  IIRC the DMA API has strong synchronization guarantees
and that can be problematic for GPU drivers.  However, using the DMA API for
flushing doesn't necessarily mean the driver has to use the rest of the DMA
API.
quoted
but then it'd need to directly flush cpu caches and bypass the dma api.

On ARM, cache flushing seems to vary from chipset to chipset.  For example,
on ARM32 a typical call-stack for dma_sync_single_for_device looks like:

arm_dma_sync_single_for_device
__dma_page_cpu_to_dev
outer_clean_range
outer_cache.clean_range

There are multiple clean_range implementations out there (i.e,
aurora_clean_range, l2c210_clean_range, feroceon_l2_clean_range), so that's
why the DMA API was used in this case.  On ARM64, things are a little
simpler, but the DMA API seems to go directly to assembly (__dma_map_area)
after a little indirection.  Why do you think that's inefficient?
I never said it's inefficient. My only gripe is with adding the
pointless struct device * argument to flushing functions which really
don't care (nor should care) about the device. Because what we really
want to call is outer_clean_range here, and that doesn't need a struct
device *. Imo that function (or something similar) needs to be
exported and then used by drm_flush_* functions.

Also note that arm has both flush and invalidate functions, but on x86
those are the same implementation (and we don't have a separate
drm_invalidate_* set of functions). That doesn't look like a too good
idea.

Of course that doesn't solve the problem of who's supposed to call it
and when in the dma-buf sharing situation.
-Daniel
On Wed, Jan 17, 2018 at 12:31 AM, Daniel Vetter [off-list ref] wrote:
quoted
On Tue, Jan 16, 2018 at 04:35:58PM -0800, Gurchetan Singh wrote:
quoted
The DMA API can be used to flush scatter gather tables and physical
pages on ARM devices.

Signed-off-by: Gurchetan Singh <redacted>
---
 drivers/gpu/drm/drm_cache.c                 | 17 +++++++++++++++++
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c |  7 ++-----
 drivers/gpu/drm/tegra/gem.c                 |  6 +-----
 3 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 3d2bb9d71a60..98d6ebb40e96 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -105,6 +105,18 @@ drm_flush_pages(struct device *dev, struct page
*pages[],
                                 (unsigned long)page_virtual +
PAGE_SIZE);
              kunmap_atomic(page_virtual);
      }
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+     unsigned long i;
+     dma_addr_t dma_handle;
+
+     if (!dev)
+             return;
+
+     for (i = 0; i < num_pages; i++) {
+             dma_handle = phys_to_dma(drm->dev,
page_to_phys(pages[i]));
+             dma_sync_single_for_device(dev, dma_handle, PAGE_SIZE,
+                                        DMA_TO_DEVICE);
Erm no. These functions here are super-low-level functions used by drivers
which know exactly what they're doing. Which is reimplementing half of the
dma api behind the dma api's back because the dma api just isn't quite
sufficient for implementing fast gpu drivers.

If all you end up doing is calling the dma api again, then pls just call
it directly.

And just to make it clear: I'd be perfectly fine with adding arm support
here, but then it'd need to directly flush cpu caches and bypass the dma
api. Otherwise this is pointless.
-Daniel
quoted
+     }
 #else
      pr_err("Architecture has no drm_cache.c support\n");
      WARN_ON_ONCE(1);
@@ -136,6 +148,11 @@ drm_flush_sg(struct device *dev, struct sg_table
*st)

      if (wbinvd_on_all_cpus())
              pr_err("Timed out waiting for cache flush\n");
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+     if (!dev)
+             return;
+
+     dma_sync_sg_for_device(dev, st->sgl, st->nents, DMA_TO_DEVICE);
 #else
      pr_err("Architecture has no drm_cache.c support\n");
      WARN_ON_ONCE(1);
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 8ac7eb25e46d..0157f90b5d10 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -14,6 +14,7 @@

 #include <drm/drm.h>
 #include <drm/drmP.h>
+#include <drm/drm_cache.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_vma_manager.h>
 #include <linux/iommu.h>
@@ -99,15 +100,11 @@ static int rockchip_gem_get_pages(struct
rockchip_gem_object *rk_obj)
      /*
       * Fake up the SG table so that dma_sync_sg_for_device() can be
used
       * to flush the pages associated with it.
-      *
-      * TODO: Replace this by drm_flush_sg() once it can be implemented
-      * without relying on symbols that are not exported.
       */
      for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->nents, i)
              sg_dma_address(s) = sg_phys(s);

-     dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl,
rk_obj->sgt->nents,
-                            DMA_TO_DEVICE);
+     drm_flush_sg(drm->dev, rk_obj->sgt);

      return 0;
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index ab1e53d434e8..9945fd2f6bd6 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -230,15 +230,11 @@ static int tegra_bo_get_pages(struct drm_device
*drm, struct tegra_bo *bo)
      /*
       * Fake up the SG table so that dma_sync_sg_for_device() can be
used
       * to flush the pages associated with it.
-      *
-      * TODO: Replace this by drm_clflash_sg() once it can be
implemented
-      * without relying on symbols that are not exported.
       */
      for_each_sg(bo->sgt->sgl, s, bo->sgt->nents, i)
              sg_dma_address(s) = sg_phys(s);

-     dma_sync_sg_for_device(drm->dev, bo->sgt->sgl, bo->sgt->nents,
-                            DMA_TO_DEVICE);
+     drm_flush_sg(drm->dev, bo->sgt);

      return 0;

--
2.13.5

_______________________________________________
dri-devel mailing list
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help