Re: possible deadlock in start_this_handle (2)
From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-12 15:44:36
Also in: linux-mm, lkml
On Fri 12-02-21 21:58:15, Tetsuo Handa wrote:
> On 2021/02/12 21:30, Michal Hocko wrote:
> > On Fri 12-02-21 12:22:07, Matthew Wilcox wrote:
> > > On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote:
> > > > On 2021/02/12 1:41, Michal Hocko wrote:
> > > > > But I suspect we have drifted away from the original issue. I thought that a simple check would help us narrow down this particular case, and somebody messing up from the IRQ context didn't sound completely off.
> > > >
> > > > From my experience at https://lkml.kernel.org/r/201409192053.IHJ35462.JLOMOSOFFVtQFH@I-love.SAKURA.ne.jp , I think we can replace direct PF_* manipulation with macros which do not receive a "struct task_struct *" argument. Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating PFA_* flags on a remote thread, we can define similar ones for manipulating PF_* flags on the current thread. Then, auditing dangerous users becomes easier.
> > >
> > > No, nobody is manipulating another task's GFP flags.
> >
> > Agreed. And nobody should be manipulating PF flags on remote tasks either.
>
> No. You are misunderstanding. The bug report above is an example of manipulating PF flags on remote tasks.
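A minimal user-space sketch of what the proposed current-only macros might look like, modeled on the shape of TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR(). Everything here is an assumption for illustration and not existing kernel API: the CURRENT_PF_* macro names, the run_in_nofs_scope() caller, the stub struct task_struct, and the thread-local stand-in for current. The flag values are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's task_struct;
 * only the non-atomic PF_* word is modeled. */
struct task_struct {
	unsigned int flags;
};

/* Illustrative flag values (assumed; see include/linux/sched.h). */
#define PF_MEMALLOC		0x00000800
#define PF_MEMALLOC_NOFS	0x00040000

/* Stand-in for the kernel's `current`: each thread sees its own task. */
static _Thread_local struct task_struct this_task;
#define current (&this_task)

/* Hypothetical current-only counterparts of TASK_PFA_TEST/SET/CLEAR.
 * None of the generated helpers takes a task pointer, so they can
 * only ever touch the calling task's flags. */
#define CURRENT_PF_TEST(name, func)				\
	static inline bool current_##func(void)			\
	{ return (current->flags & PF_##name) != 0; }

#define CURRENT_PF_SET(name, func)				\
	static inline void current_set_##func(void)		\
	{ current->flags |= PF_##name; }

#define CURRENT_PF_CLEAR(name, func)				\
	static inline void current_clear_##func(void)		\
	{ current->flags &= ~PF_##name; }

CURRENT_PF_TEST(MEMALLOC, memalloc)
CURRENT_PF_SET(MEMALLOC, memalloc)
CURRENT_PF_CLEAR(MEMALLOC, memalloc)

CURRENT_PF_TEST(MEMALLOC_NOFS, memalloc_nofs)
CURRENT_PF_SET(MEMALLOC_NOFS, memalloc_nofs)
CURRENT_PF_CLEAR(MEMALLOC_NOFS, memalloc_nofs)

/* Hypothetical caller: run fn() with PF_MEMALLOC_NOFS set on the
 * current task, then restore whatever state the flag had before. */
static bool run_in_nofs_scope(bool (*fn)(void))
{
	bool was_set = current_memalloc_nofs();
	bool ret;

	current_set_memalloc_nofs();
	ret = fn();
	if (!was_set)
		current_clear_memalloc_nofs();
	return ret;
}
```

The design point is that none of the generated helpers can name another task, so any remaining direct p->flags write on a non-current task would stand out in a grep. The updates are plain read-modify-write operations, unlike the atomic set_bit()/clear_bit() behind the PFA_* helpers, which is only safe because the owning task is the sole writer of its own flags.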
The bug report you are referring to is ancient, and the cpuset code hasn't touched task->flags for a long time. I haven't checked exactly, but unless I misremember it has been years since the regular and atomic flags were separated.
> You say "nobody should", but the reality is "there indeed was". There might be unnoticed others. The point of this proposal is to make it possible to "find such unnoticed users who are manipulating PF flags on remote tasks".
I am really confused about what you are proposing here, TBH, and referring to an ancient bug doesn't really help. task->flags are _explicitly_ documented to be used only for _current_. Is it possible that somebody writes buggy code? Sure. Should we build a whole infrastructure around that to catch such broken code? I am not really sure. A single bug from 6 years ago doesn't sound like a good reason for that.

-- 
Michal Hocko
SUSE Labs