Re: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL
From: Michal Hocko <mhocko@suse.com>
Date: 2021-10-07 10:07:50
Also in:
linux-doc, linux-ext4, linux-fsdevel, linux-nfs, linux-xfs, lkml
On Thu 07-10-21 10:14:52, Dave Chinner wrote:
On Tue, Oct 05, 2021 at 02:27:45PM +0200, Vlastimil Babka wrote:quoted
On 10/5/21 13:09, Michal Hocko wrote:quoted
On Tue 05-10-21 11:20:51, Vlastimil Babka wrote: [...]quoted
quoted
--- a/include/linux/gfp.h +++ b/include/linux/gfp.h@@ -209,7 +209,11 @@ struct vm_area_struct; * used only when there is no reasonable failure policy) but it is * definitely preferable to use the flag rather than opencode endless * loop around allocator. - * Using this flag for costly allocations is _highly_ discouraged. + * Use of this flag may lead to deadlocks if locks are held which would + * be needed for memory reclaim, write-back, or the timely exit of a + * process killed by the OOM-killer. Dropping any locks not absolutely + * needed is advisable before requesting a %__GFP_NOFAIL allocate. + * Using this flag for costly allocations (order>1) is _highly_ discouraged.We define costly as 3, not 1. But sure it's best to avoid even order>0 for __GFP_NOFAIL. Advising order>1 seems arbitrary though?This is not completely arbitrary. We have a warning for any higher order allocation. rmqueue: WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));Oh, I missed that.quoted
I do agree that "Using this flag for higher order allocations is _highly_ discouraged.Well, with the warning in place this is effectively forbidden, not just discouraged.Yup, especially as it doesn't obey __GFP_NOWARN. See commit de2860f46362 ("mm: Add kvrealloc()") as a direct result of unwittingly tripping over this warning when adding __GFP_NOFAIL annotations to replace open coded high-order kmalloc loops that have been in place for a couple of decades without issues. Personally I think that the way __GFP_NOFAIL is first of all recommended over open coded loops and then only later found to be effectively forbidden and needing to be replaced with open coded loops to be a complete mess.
Well, there are two things. Opencoding something that _can_ be replaced by __GFP_NOFAIL and those that cannot because the respective allocator doesn't really support that semantic. kvmalloc is explicit about that IIRC. If you have a better way to consolidate the documentation then I am all for it.
Not to mention on the impossibility of using __GFP_NOFAIL with kvmalloc() calls. Just what do we expect kmalloc_node(__GFP_NORETRY | __GFP_NOFAIL) to do, exactly?
This combination doesn't make any sense. Like others. Do you want us to list all combinations that make sense?
So, effectively, we have to open-code around kvmalloc() in situations where failure is not an option. Even if we pass __GFP_NOFAIL to __vmalloc(), it isn't guaranteed to succeed because of the "we won't honor gfp flags passed to __vmalloc" semantics it has.
yes vmalloc doesn't support nofail semantic and it is not really trivial to craft it there.
Even the API constaints of kvmalloc() w.r.t. only doing the vmalloc fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS kvmalloc via memalloc_nofs_save/restore(), so this behavioural restriction w.r.t. gfp flags just makes no sense at all.
GFP_NOFS (without using the scope API) has the same problem as NOFAIL in the vmalloc. Hence it is not supported. If you use the scope API then you can GFP_KERNEL for kvmalloc. This is clumsy but I am not sure how to define these conditions in a more sensible way. Special case NOFS if the scope api is in use? Why do you want an explicit NOFS then?
That leads to us having to go back to writing extremely custom open coded loops to avoid awful high-order kmalloc direct reclaim behaviour and still fall back to vmalloc and to still handle NOFAIL semantics we need: https://lore.kernel.org/linux-xfs/20210902095927.911100-8-david@fromorbit.com/ (local)
It would be more productive to get to MM people rather than rant on a xfs specific patchse. Anyway, I can see a kvmalloc mode where the kmalloc allocation would be really a very optimistic one - like your effectively GFP_NOWAIT. Nobody has requested such a mode until now and I am not sure how we would sensibly describe that by a gfp mask. Btw. your GFP_NOWAIT | __GFP_NORETRY combination doesn't make any sense in the allocator context as the later is a reclaim mofifier which doesn't get applied when the reclaim is disabled (in your case by flags &= ~__GFP_DIRECT_RECLAIM). GFP flags are not that easy to build a coherent and usable apis. Something we carry as a baggage for a long time.
So, really, the problems are much deeper here than just badly documented, catch-22 rules for __GFP_NOFAIL - we can't even use __GFP_NOFAIL consistently across the allocation APIs because it changes allocation behaviours in unusable, self-defeating ways....
GFP_NOFAIL sucks. Not all allocator can follow it for practical reasons. You are welcome to help document those awkward corner cases or fix them up if you have a good idea how. Thanks! -- Michal Hocko SUSE Labs