Re: [PATCH 1/1] psi: do not require setsched permission from the trigger creator
From: Suren Baghdasaryan <surenb@google.com>
Date: 2019-08-01 18:28:48
Also in:
linux-mm, lkml
Hi Peter, Thanks for sharing your thoughts. I understand your point and I tend to agree with it. I originally designed this using watchdog as the example of a critical system health signal and in the context of mobile device memory pressure is critical but I agree that there are more important things in life. I checked and your proposal to change it to FIFO-1 should still work for our purposes. Will test to make sure and reply to your patch. Couple clarifications in-line. On Thu, Aug 1, 2019 at 2:51 AM Peter Zijlstra [off-list ref] wrote:
On Tue, Jul 30, 2019 at 10:44:51AM -0700, Suren Baghdasaryan wrote:quoted
On Tue, Jul 30, 2019 at 1:11 AM Peter Zijlstra [off-list ref] wrote:quoted
On Mon, Jul 29, 2019 at 06:33:10PM -0700, Suren Baghdasaryan wrote:quoted
When a process creates a new trigger by writing into /proc/pressure/* files, permissions to write such a file should be used to determine whether the process is allowed to do so or not. Current implementation would also require such a process to have setsched capability. Setting of psi trigger thread's scheduling policy is an implementation detail and should not be exposed to the user level. Remove the permission check by using _nocheck version of the function. Suggested-by: Nick Kralevich <redacted> Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- kernel/sched/psi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index 7acc632c3b82..ed9a1d573cb1 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c@@ -1061,7 +1061,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, mutex_unlock(&group->trigger_lock); return ERR_CAST(kworker); } - sched_setscheduler(kworker->task, SCHED_FIFO, ¶m); + sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m);ARGGH, wtf is there a FIFO-99!! thread here at all !?We need psi poll_kworker to be an rt-priority thread so that psiThere is a giant difference between 'needs to be higher than OTHER' and FIFO-99.quoted
notifications are delivered to the userspace without delay even when the CPUs are very congested. Otherwise it's easy to delay psi notifications by running a simple CPU hogger executing "chrt -f 50 dd if=/dev/zero of=/dev/null". Because these notifications areSo what; at that point that's exactly what you're asking for. Using RT is for those who know what they're doing.quoted
time-critical for reacting to memory shortages we can't allow for such delays.Furthermore, actual RT programs will have pre-allocated and locked any memory they rely on. They don't give a crap about your pressure nonsense.
This signal is used not to protect other RT tasks but to monitor overall system memory health for the sake of system responsiveness.
quoted
Notice that this kworker is created only if userspace creates a psi trigger. So unless you are using psi triggers you will never see this kthread created.By marking it FIFO-99 you're in effect saying that your stupid statistics gathering is more important than your life. It will preempt the task that's in control of the band-saw emergency break, it will preempt the task that's adjusting the electromagnetic field containing this plasma flow. That's insane.
IMHO an opt-in feature stops being "stupid" as soon as the user opted in to use it, therefore explicitly indicating interest in it. However I assume you are using "stupid" here to indicate that it's "less important" rather than it's "useless".
I'm going to queue a patch to reduce this to FIFO-1, that will preempt regular OTHER tasks but will not perturb (much) actual RT bits.
Thanks for posting the fix.
-- To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.