Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
From: Tvrtko Ursulin <hidden>
Date: 2017-02-21 14:16:56
Also in:
intel-gfx
On 21/02/2017 14:13, Imre Deak wrote:
On Tue, Feb 21, 2017 at 01:11:27PM +0000, Tvrtko Ursulin wrote:quoted
On 21/02/2017 12:43, Imre Deak wrote:quoted
On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:quoted
On 21/02/2017 09:37, Chris Wilson wrote:quoted
On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:quoted
On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:quoted
So that our preempt-off period doesn't grow completely unchecked, or do we need that 34ms loop?Yes, that's at least how I understand it. Scheduling away is what let's PCODE start servicing some other request than ours or go idle. That's in a way what we see when the preempt-enabled poll times out.I was thinking along the lines of if it was just busy/unavailable for the first 33ms that particular time, it just needed to sleep until ready. Once available, the next request ran in the expected 1ms.quoted
Do you not see any value in trying a sleeping loop? Perhaps compromise and have the preempt-disable timeout increase each iteration.This fallback method would work too, but imo the worst case is what matters and that would be anyway the same in both cases. Because of this and since it's a WA I'd rather keep it simple.quoted
Parachuting in so apologies if I misunderstood something. Is the issue here that we can get starved out of CPU time for more than 33ms while waiting for an event?We need to actively resend the same request for this duration.quoted
Could we play games with sched_setscheduler and maybe temporarily go SCHED_DEADLINE or something? Would have to look into how to correctly restore to the old state from that and from which contexts we can actually end up in this wait.What would be the benefit wrt. disabling preemption? Note that since it's a workaround it would be good to keep it simple and close to how it worked on previous platforms (SKL/APL).It would be nicer not to relax that BUILD_BUG_ON in atomic wait for and, if the main problem is the scheduler/CPU starvation, to see if it can be solved differently. Even though the atomic wait here would trigger very rarely it might be worth coming up with something nicer and generalized. If I understood it correctly, the difference between this wait_for call site and the rest is that here it wants a certain number of COND checks to be guaranteed?Yes.quoted
The other call sites care more about checking on enter and exit. So in this case we want the period parameter to actually be guaranteed (or close). This sounded like a good candidate for SCHED_DEADLINE to me. Like wait_for_periodic(COND, TIMEOUT, INTERVAL).Could be. But this would give less guarantee than disabling preemption, as SCHED_DEADLINE still works on a best effort basis. How about increasing the timeout now (to 50ms) and trying what you suggest as a follow-up? That way we have also something for -stable.
If 50ms works it is fine by me. Regards, Tvrtko