Re: [PATCH v15 04/13] task_isolation: add initial support
From: Chris Metcalf <hidden>
Date: 2016-09-30 19:32:59
Also in: linux-api, lkml
On 8/30/2016 2:43 PM, Andy Lutomirski wrote:
> On Aug 30, 2016 10:02 AM, "Chris Metcalf" [off-list ref] wrote:
>> We really want to run task isolation last, so we can guarantee that all the isolation prerequisites are met (dynticks stopped, per-cpu lru cache empty, etc). But achieving that state can require enabling interrupts - most obviously if we have to schedule, e.g. for vmstat clearing or whatnot (see the cond_resched in refresh_cpu_vm_stats), or just while waiting for that last dyntick interrupt to occur. I'm also not sure that even something as simple as draining the per-cpu lru cache can be done holding interrupts disabled throughout - certainly there's a !SMP code path there that just re-enables interrupts unconditionally, which gives me pause.
>>
>> At any rate at that point you need to retest for signals, resched, etc, all as usual, and then you need to recheck the task isolation prerequisites once more. I may be missing something here, but it's really not obvious to me that there's a way to do this without having task isolation integrated into the usual return-to-userspace loop.
>
> What if we did it the other way around: set a percpu flag saying "going quiescent; disallow new deferred work", then finish all existing work and return to userspace. Then, on the next entry, clear that flag. With the flag set, vmstat would just flush anything that it accumulates immediately, nothing would be added to the LRU list, etc.
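For illustration only, the "going quiescent" flag idea quoted above could be modeled roughly as follows. This is a userspace sketch, not kernel code and not taken from the patch series: the array standing in for per-cpu data and the names cpu_state, counter_add, prepare_return_to_user and kernel_entry are all hypothetical, and a single batched counter stands in for vmstat.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4  /* assumption: small fixed CPU count for the model */

/* Hypothetical stand-in for per-cpu data (the kernel would use DEFINE_PER_CPU). */
struct cpu_state {
    bool going_quiescent;  /* "disallow new deferred work" flag */
    long pending_delta;    /* batched updates not yet folded into the global */
};

static struct cpu_state cpus[NR_CPUS];
static long global_counter;

/* Fold this CPU's batched updates into the global counter. */
static void flush_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    global_counter += cpus[cpu].pending_delta;
    cpus[cpu].pending_delta = 0;
}

/* Update the counter: batch normally, apply immediately while quiescent. */
static void counter_add(int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (cpus[cpu].going_quiescent)
        global_counter += delta;          /* nothing is deferred */
    else
        cpus[cpu].pending_delta += delta; /* deferred until a later flush */
}

/* Exit path: set the flag first, then finish all existing deferred work. */
static void prepare_return_to_user(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpus[cpu].going_quiescent = true;
    flush_cpu(cpu);
}

/* Next kernel entry: deferred work is allowed again. */
static void kernel_entry(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpus[cpu].going_quiescent = false;
}
```

Setting the flag before flushing matters in this model: any update that arrives during the flush is applied immediately, so the core has nothing pending once the flush completes.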
Thinking about this some more, I was struck by an even simpler way to approach this. What if we just said that on task isolation cores, no kernel subsystem should do anything that would require a future interruption? So vmstat would just always sync immediately on task isolation cores, the mm subsystem wouldn't use the per-cpu LRU caches on task isolation cores, etc. That way we don't have to worry about the status of those things as we are returning to userspace for a task isolation process, since it's just always kept "pristine".

The task-isolation setting per-core is not user-customizable, and the task-stealing scheduler doesn't even run there, so it's not like any processes will land there and be in a position to complain about the performance overhead of having no deferred work being created...

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
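A rough userspace model of the "always pristine" approach described above, offered only as a sketch: the fixed ISOLATED_MASK, the stat_* names and the single counter standing in for vmstat are all hypothetical and do not come from the patch series. The point it shows is that on an isolated core the update path never creates deferred state, so there is nothing to check or drain on return to userspace.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8               /* assumption: small fixed CPU count */
#define ISOLATED_MASK 0x0cUL    /* assumption: cores 2 and 3 are isolation cores */

static long stat_global;                 /* the globally visible counter */
static long stat_percpu_diff[NR_CPUS];   /* per-core batched differences */

/* The isolation setting is a fixed per-core property in this model. */
static bool cpu_is_isolated(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return (ISOLATED_MASK >> cpu) & 1UL;
}

/* Update path: isolated cores sync immediately, others batch as usual. */
static void stat_mod(int cpu, long delta)
{
    if (cpu_is_isolated(cpu)) {
        stat_global += delta;   /* no deferred work is ever created */
        return;
    }
    stat_percpu_diff[cpu] += delta;
}

/* Deferred fold, only ever needed on non-isolated cores. */
static void stat_fold(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    stat_global += stat_percpu_diff[cpu];
    stat_percpu_diff[cpu] = 0;
}

/* An isolated core is pristine by construction; others only after a fold. */
static bool cpu_is_pristine(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return stat_percpu_diff[cpu] == 0;
}
```

The cost in this model is that every update on an isolated core touches the shared counter, which is the performance overhead the message argues is acceptable because ordinary processes never run on those cores.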