Thread: 20 messages, 6 authors, 2023-03-27

Re: [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

From: Jonas Ådahl <hidden>
Date: 2023-03-16 09:27:05
Also in: dri-devel, intel-gfx, linux-media, lkml

On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl [off-list ref] wrote:
On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl [off-list ref] wrote:
On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
From: Rob Clark <redacted>

Add a way to hint to the fence signaler of an upcoming deadline, such as
vblank, which the fence waiter would prefer not to miss.  This is to aid
the fence signaler in making power management decisions, like boosting
frequency as the deadline approaches and awareness of missing deadlines
so that can be factored in to the frequency scaling.

v2: Drop dma_fence::deadline and related logic to filter duplicate
    deadlines, to avoid increasing dma_fence size.  The fence-context
    implementation will need similar logic to track deadlines of all
    the fences on the same timeline.  [ckoenig]
v3: Clarify locking wrt. set_deadline callback
v4: Clarify in docs comment that this is a hint
v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v6: More docs
v7: Fix typo, clarify past deadlines

Signed-off-by: Rob Clark <redacted>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Pekka Paalanen <redacted>
Reviewed-by: Bagas Sanjaya <redacted>
---
Hi Rob!
 Documentation/driver-api/dma-buf.rst |  6 +++
 drivers/dma-buf/dma-fence.c          | 59 ++++++++++++++++++++++++++++
 include/linux/dma-fence.h            | 22 +++++++++++
 3 files changed, 87 insertions(+)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 622b8156d212..183e480d8cea 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
 .. kernel-doc:: drivers/dma-buf/dma-fence.c
    :doc: fence signalling annotation

+DMA Fence Deadline Hints
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+   :doc: deadline hints
+
 DMA Fences Functions Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0de0482cd36e..f177c56269bb 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);

+/**
+ * DOC: deadline hints
+ *
+ * In an ideal world, it would be possible to pipeline a workload sufficiently
+ * that a utilization based device frequency governor could arrive at a minimum
+ * frequency that meets the requirements of the use-case, in order to minimize
+ * power consumption.  But in the real world there are many workloads which
+ * defy this ideal.  For example, but not limited to:
+ *
+ * * Workloads that ping-pong between device and CPU, with alternating periods
+ *   of CPU waiting for device, and device waiting on CPU.  This can result in
+ *   devfreq and cpufreq seeing idle time in their respective domains and in
+ *   result reduce frequency.
+ *
+ * * Workloads that interact with a periodic time based deadline, such as double
+ *   buffered GPU rendering vs vblank sync'd page flipping.  In this scenario,
+ *   missing a vblank deadline results in an *increase* in idle time on the GPU
+ *   (since it has to wait an additional vblank period), sending a signal to
+ *   the GPU's devfreq to reduce frequency, when in fact the opposite is what is
+ *   needed.
This is the use case I'd like to get some better understanding about how
this series intends to work, as the problematic scheduling behavior
triggered by missed deadlines has plagued compositing display servers
for a long time.

I apologize, I'm not a GPU driver developer, nor an OpenGL driver
developer, so I will need some hand holding when it comes to
understanding exactly what piece of software is responsible for
communicating what piece of information.
+ *
+ * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
+ * The deadline hint provides a way for the waiting driver, or userspace, to
+ * convey an appropriate sense of urgency to the signaling driver.
+ *
+ * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
+ * facing APIs).  The time could either be some point in the future (such as
+ * the vblank based deadline for page-flipping, or the start of a compositor's
+ * composition cycle), or the current time to indicate an immediate deadline
+ * hint (Ie. forward progress cannot be made until this fence is signaled).
Is it guaranteed that a GPU driver will use the actual start of the
vblank as the effective deadline? I have some memories of seeing
something about vblank evasion while browsing driver code, which I might have
misunderstood, but I have yet to find whether this is something
userspace can actually expect to be something it can rely on.
I guess you mean s/GPU driver/display driver/ ?  It makes things more
clear if we talk about them separately even if they happen to be the
same device.
Sure, sorry about being unclear about that.
Assuming that is what you mean, nothing strongly defines what the
deadline is.  In practice there is probably some buffering in the
display controller.  For ex, block based (including bandwidth
compressed) formats, you need to buffer up a row of blocks to
efficiently linearize for scanout.  So you probably need to latch some
time before you start sending pixel data to the display.  But details
like this are heavily implementation dependent.  I think the most
reasonable thing to target is start of vblank.
The driver exposing those details would be quite useful for userspace
though, so that it can delay committing updates until late, but not too
late. Setting a deadline to be the vblank seems easy enough, but it
isn't enough for scheduling the actual commit.
I'm not entirely sure how that would even work.. but OTOH I think you
are talking about something on the order of 100us?  But that is a bit
of another topic.
Yes, something like that. But yea, it's not really related. Scheduling
commits closer to the deadline has more complex behavior than that too,
e.g. the need for real time scheduling, and knowing how long it usually
takes to create and commit and for the kernel to process.
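
To make the "late, but not too late" arithmetic concrete, here is a minimal
sketch in C. Everything in it is an assumption for illustration: the helper
name latest_commit_start_ns is hypothetical, the latch margin is not exposed
by any API discussed in this thread (the only figure mentioned is "on the
order of 100us"), and the commit cost would have to be measured by the
compositor itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libdrm API): given the target
 * vblank time, an assumed latch margin before vblank, and an estimate of
 * how long creating and committing an update takes, return the latest
 * time at which the compositor could start its commit.
 * All values are nanoseconds on the same clock (e.g. CLOCK_MONOTONIC).
 * Returns 0 when the budget is larger than the time available.
 */
static uint64_t latest_commit_start_ns(uint64_t vblank_ns,
                                       uint64_t latch_margin_ns,
                                       uint64_t commit_cost_ns)
{
	uint64_t budget_ns = latch_margin_ns + commit_cost_ns;

	/* Guard against unsigned overflow of the summed budget. */
	if (budget_ns < latch_margin_ns)
		return 0;

	return vblank_ns > budget_ns ? vblank_ns - budget_ns : 0;
}
```

With a vblank at 16666667 ns, an assumed 100us latch margin and a 500us
commit cost, the latest start would be 16066667 ns. Waking up reliably at
that point is the part that needs real time scheduling.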
8-< *snip* 8-<
You need a fence to set the deadline, and for that work needs to be
flushed.  But you can't associate a deadline with work that the kernel
is unaware of anyways.
That makes sense, but it might also be a bit inadequate to have it as the
only way to tell the kernel it should speed things up. Even with the
trick i915 does, with GNOME Shell, we still end up with the feedback
loop this series aims to mitigate. Doing triple buffering, i.e. delaying
or dropping the first frame, is so far the best workaround that works,
apart from doing other tricks that make the kernel ramp up its clock.
Ideally, one should not have to choose between latency and frame drops.
Before you have a fence, the thing you want to be speeding up is the
CPU, not the GPU.  There are existing mechanisms for that.
Is there no benefit in letting the GPU know earlier that it should speed up,
so that when the job queue arrives, it's already up to speed?
TBF I'm of the belief that there is still a need for input based cpu
boost (and early wake-up trigger for GPU).. we have something like
this in CrOS kernel.  That is a bit of a different topic, but my point
is that fence deadlines are just one of several things we need to
optimize power/perf and responsiveness, rather than the single thing
that solves every problem under the sun ;-)
Perhaps; but I believe it's a bit of a back channel of intent; the piece
of the puzzle that has the information to know whether there is a need
to actually speed up is the compositor, not the kernel.

For example, pressing 'p' while a terminal is focused does not need high
frequency clocks, it just needs the terminal emulator to draw a 'p' and
the compositor to composite that update. Pressing <Super> may however
trigger a non-trivial animation moving a lot of stuff around on screen,
maybe triggering Wayland clients to draw and what not, and should
arguably have the ability to "warn" the kernel about the upcoming flood
of work before it is already knocking on its doorstep.
8-< *snip* 8-<
Is it expected that WSIs will set their own deadlines, or should that
be the job of the compositor? For example, by compositors using the
DMA_BUF_IOCTL_EXPORT_SYNC_FILE that you mentioned to set a deadline
matching the vsync the buffer will ideally be committed to?
I'm kind of assuming compositors, but if the WSI somehow has more
information about ideal presentation time, then I suppose it could be
in the WSI?  I'll defer to folks who spend more time on WSI and
compositors to hash out the details ;-)
With my compositor developer hat on, it might be best to leave it up to
the compositor; it's the one that knows if a client's content will
actually end up anywhere visible.
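
As for the deadline value itself, the patch documentation quoted above says
it is an absolute time, CLOCK_MONOTONIC for userspace facing APIs. A rough
sketch in C of how a compositor could derive such a value for the next
vblank follows. The helper names are hypothetical, and the sketch assumes
the compositor already knows the last vblank timestamp and the refresh
period; how the value is then handed to the kernel is outside its scope.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t monotonic_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: extrapolate the first vblank strictly after now_ns
 * from the last observed vblank timestamp and the refresh period.
 * All values are CLOCK_MONOTONIC nanoseconds. If the last vblank
 * timestamp is not in the past, or the period is unknown (0), the
 * timestamp is returned unchanged.
 */
static uint64_t next_vblank_ns(uint64_t last_vblank_ns, uint64_t period_ns,
                               uint64_t now_ns)
{
	uint64_t periods;

	if (period_ns == 0 || now_ns < last_vblank_ns)
		return last_vblank_ns;

	/* Whole periods elapsed since the last vblank, plus one. */
	periods = (now_ns - last_vblank_ns) / period_ns + 1;
	return last_vblank_ns + periods * period_ns;
}
```

A compositor would call next_vblank_ns(last, period, monotonic_now_ns())
and use the result as the absolute deadline for the fences of the content
it actually intends to show.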


Jonas
BR,
-R