Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms

From: Imre Deak <hidden>
Date: 2017-02-21 14:18:58
Also in: intel-gfx

On Tue, Feb 21, 2017 at 01:19:37PM +0000, Chris Wilson wrote:

On Tue, Feb 21, 2017 at 02:43:30PM +0200, Imre Deak wrote:

quoted

On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:

quoted

On 21/02/2017 09:37, Chris Wilson wrote:

quoted

On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:

quoted

On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:

quoted

So that our preempt-off period doesn't grow completely unchecked, or do
we need that 34ms loop?

Yes, that's at least how I understand it. Scheduling away is what lets
PCODE start servicing some other request than ours or go idle. That's
in a way what we see when the preempt-enabled poll times out.

I was thinking along the lines of if it was just busy/unavailable for the
first 33ms that particular time, it just needed to sleep until ready.
Once available, the next request ran in the expected 1ms.

quoted

Do you not see any value in trying a sleeping loop? Perhaps compromise
and have the preempt-disable timeout increase each iteration.

This fallback method would work too, but imo the worst case is what
matters and that would be anyway the same in both cases. Because of this
and since it's a WA I'd rather keep it simple.
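
To illustrate the worst-case point: a minimal sketch, assuming the
preempt-disabled window doubles each iteration (the thread only says
"increase", so the doubling, the function name and the millisecond units
are all hypothetical). It lists the window lengths that together cover a
given budget. Whatever the split, the windows still sum to the full
budget, so the total time spent with preemption disabled is unchanged in
the worst case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the i915 driver: split a total
 * polling budget into preempt-disabled windows that double each round
 * (1, 2, 4, ... ms), clamping the last one so the sum equals the budget.
 * Returns the number of windows written to out[].
 */
static size_t backoff_windows(unsigned int budget_ms,
                              unsigned int *out, size_t max)
{
    size_t n = 0;
    unsigned int window = 1;
    unsigned int used = 0;

    assert(out != NULL);

    while (used < budget_ms && n < max) {
        unsigned int left = budget_ms - used;
        unsigned int w = window < left ? window : left;

        out[n++] = w;   /* length of this atomic polling window */
        used += w;
        window *= 2;    /* assumed growth policy: doubling */
    }
    return n;
}
```

For a 34ms budget this yields windows of 1, 2, 4, 8, 16 and 3 ms, with a
sleep between each in the real scheme.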

quoted

Parachuting in so apologies if I misunderstood something.

Is the issue here that we can get starved out of CPU time for more than 33ms
while waiting for an event?

We need to actively resend the same request for this duration.
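
A rough userspace sketch of that resend pattern, using simulated time
rather than the real mailbox. The fake_pcode type and both function names
are made up for illustration and are not the driver's API; in the driver
the loop would run with preemption disabled, which this sketch does not
model.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the firmware mailbox: it ignores requests
 * until ready_at_ms of simulated time has passed, then acks the next one.
 */
struct fake_pcode {
    unsigned int ready_at_ms;
};

static bool fake_pcode_request(const struct fake_pcode *pc,
                               unsigned int now_ms)
{
    return now_ms >= pc->ready_at_ms;
}

/*
 * Resend the same request every interval_ms until it is acked or
 * timeout_ms of simulated time has elapsed.
 * Returns the simulated time of the ack, or -1 on timeout.
 */
static int resend_until_acked(const struct fake_pcode *pc,
                              unsigned int interval_ms,
                              unsigned int timeout_ms)
{
    assert(interval_ms > 0);

    for (unsigned int now = 0; now <= timeout_ms; now += interval_ms) {
        if (fake_pcode_request(pc, now))
            return (int)now;
    }
    return -1;
}
```

With a mailbox that only becomes ready after 33ms, a 10ms timeout fails
while a 50ms timeout succeeds at the 33ms mark.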

quoted

Could we play games with sched_setscheduler and maybe temporarily go
SCHED_DEADLINE or something? Would have to look into how to correctly
restore to the old state from that and from which contexts we can actually
end up in this wait.

What would be the benefit wrt. disabling preemption? Note that since
it's a workaround it would be good to keep it simple and close to how it
worked on previous platforms (SKL/APL).

Yeah, I'm not happy with busy-spinning for 34ms without any scheduler
interaction at all. Or that we don't handle the failure gracefully. Or
that the hw appears pretty flimsy and the communication method is hit
and miss.

Yes, me neither. It's clearly not by design, since based on the
specification two requests 3ms apart would need to be enough.

I'd accept a compromise that bumped the timer to 50ms i.e. didn't have
to up the BUILD_BUG_ON. Only a 50% safety factor, but we are already
an order of magnitude beyond the expected response time.

50 I would ack. :|

Ok, I can resend with that if Tvrtko agrees.

--Imre
