Thread (208 messages) 208 messages, 21 authors, 2024-02-29

Re: [PATCH v3 31/35] lib: add memory allocations report in show_mem()

From: Vlastimil Babka <hidden>
Date: 2024-02-20 18:27:29
Also in: cgroups, linux-arch, linux-doc, linux-fsdevel, linux-iommu, linux-mm, lkml

On 2/19/24 18:17, Suren Baghdasaryan wrote:
On Thu, Feb 15, 2024 at 3:56 PM Kent Overstreet
[off-list ref] wrote:
quoted
On Thu, Feb 15, 2024 at 06:27:29PM -0500, Steven Rostedt wrote:
quoted
All this, and we are still worried about 4k for useful debugging :-/
I was planning to refactor this function to print one record at a time
with a smaller buffer but after discussing with Kent, he has plans to
reuse this function and having the report in one buffer is needed for
that.
We are printing to console, AFAICS all the code involved uses plain printk()
I think it would be way easier to have a function using printk() for this
use case than the seq_buf which is more suitable for /proc and friends. Then
all concerns about buffers would be gone. It wouldn't be that much of a code
duplication?
quoted
Every additional 4k still needs justification. And whether we burn a
reserve on this will have no observable effect on user output in
remotely normal situations; if this allocation ever fails, we've already
been in an OOM situation for awhile and we've already printed out this
report many times, with less memory pressure where the allocation would
have succeeded.
I'm not sure this claim will always be true, specifically in the case
of low-end devices with relatively low amounts of reserves and in the
That's right, GFP_ATOMIC failures can easily happen without prior OOMs.
Consider a system where userspace allocations fill the memory as they
usually do, up to high watermark. Then a burst of packets is received and
handled by GFP_ATOMIC allocations that deplete the reserves and can't cause
OOMs (OOM is when we fail to reclaim anything, but we are allocating from a
context that can't reclaim), so the very first report would be an GFP_ATOMIC
failure and now it can't allocate that buffer for printing.

I'm sure more such scenarios exist, Cc: Tetsuo who I recall was an expert on
this topic.
presence of a possible quick memory usage spike. We should also
consider a case when panic_on_oom is set. All we get is one OOM
report, so we get only one chance to capture this report. In any case,
I don't yet have data to prove or disprove this claim but it will be
interesting to test it with data from the field once the feature is
deployed.

For now I think with Vlastimil's __GFP_NOWARN suggestion the code
becomes safe and the only risk is to lose this report. If we get cases
with reports missing this data, we can easily change to reserved
memory.
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help