Thread: 161 messages, 27 authors, 2021-10-25

Re: [GIT PULL] Memory folios for v5.15

From: Matthew Wilcox <willy@infradead.org>
Date: 2021-08-27 18:46:15
Also in: linux-fsdevel, lkml

On Fri, Aug 27, 2021 at 10:07:16AM -0400, Johannes Weiner wrote:
> We have the same thoughts in MM and growing memory sizes. The DAX
> stuff said from the start it won't be built on linear struct page
> mappings anymore because we expect the memory modules to be too big to
> manage them with such fine-grained granularity.
Well, I did.  Then I left Intel, and Dan took over.  Now we have a struct
page for each 4kB of PMEM.  I'm not particularly happy about this change
of direction.
> But in practice, this
> is more and more becoming true for DRAM as well. We don't want to
> allocate gigabytes of struct page when on our servers only a very
> small share of overall memory needs to be managed at this granularity.
This is a much less compelling argument than you think.  I had some
ideas along these lines and I took them to a performance analysis group.
They told me that for their workloads, doubling the amount of DRAM in a
system increased performance by ~10%.  So increasing the amount of DRAM
by 1/63 is going to increase performance by 1/630, or about 0.16%.  There are
more important performance wins to go after.

Even in the cloud space where increasing memory by 1/63 might increase the
number of VMs you can host by 1/63, how many physical machines host as many
as 63 VMs?  That is, does it really buy you anything?  It sounds like a nice
big number
("My 1TB machine has 16GB occupied by memmap!"), but the real benefit
doesn't really seem to be there.  And of course, that assumes that you
have enough other resources to scale to 64/63 of your current workload;
you might hit CPU, IO or some other limit first.
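The arithmetic behind the 16GB and 1/630 figures can be sketched as follows. This is only a back-of-the-envelope check: the 64-byte size of struct page is an assumption (it is what makes 1TB of 4kB pages cost 16GB of memmap), and the function names are hypothetical. The speedup estimate uses the same linear extrapolation as the email, scaling the ~10% gain per doubling of DRAM.

```python
PAGE_SIZE = 4 * 1024        # 4kB base page, as in the email
STRUCT_PAGE_SIZE = 64       # ASSUMPTION: 64 bytes per struct page
TIB = 1 << 40
GIB = 1 << 30


def memmap_bytes(total_bytes):
    """Memory consumed by one struct page per base page."""
    return total_bytes // PAGE_SIZE * STRUCT_PAGE_SIZE


def reclaimed_fraction(total_bytes):
    """Extra usable memory, relative to what is usable today,
    if the memmap overhead disappeared entirely."""
    overhead = memmap_bytes(total_bytes)
    return overhead / (total_bytes - overhead)


def estimated_speedup(extra_fraction, gain_per_doubling=0.10):
    """Linear extrapolation: +100% DRAM gives ~10%, so +x gives ~0.10 * x."""
    return extra_fraction * gain_per_doubling


if __name__ == "__main__":
    extra = reclaimed_fraction(TIB)
    print(f"memmap for 1TB: {memmap_bytes(TIB) // GIB} GB")
    print(f"extra usable memory: 1/{round(1 / extra)}")
    print(f"estimated speedup: {estimated_speedup(extra):.4%}")
```

The linear model is the weakest part of this estimate; it is used here only because the email uses it.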
> Folio perpetuates the problem of the base page being the floor for
> cache granularity, and so from an MM POV it doesn't allow us to scale
> up to current memory sizes without horribly regressing certain
> filesystem workloads that still need us to be able to scale down.
The mistake you're making is coupling "minimum mapping granularity" with
"minimum allocation granularity".  We can happily build a system which
only allocates memory on 2MB boundaries and yet lets you map that memory
to userspace in 4kB granules.
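To make the distinction concrete, here is a toy model (not kernel code) in which memory is only ever handed out in 2MB-aligned units, while mapping state is tracked per 4kB granule inside each unit. The `Allocation` class and its methods are hypothetical names chosen for illustration.

```python
ALLOC_SIZE = 2 * 1024 * 1024                  # allocation granularity: 2MB
MAP_SIZE = 4 * 1024                           # mapping granularity: 4kB
GRANULES_PER_ALLOC = ALLOC_SIZE // MAP_SIZE   # 512 granules per allocation


class Allocation:
    """One 2MB-aligned allocation with per-4kB mapping state (toy model)."""

    def __init__(self, base):
        if base % ALLOC_SIZE:
            raise ValueError("allocations must be 2MB aligned")
        self.base = base
        self.mapped = set()   # indices of the 4kB granules currently mapped

    def map_granule(self, addr):
        """Mark the 4kB granule containing addr as mapped; return its index."""
        offset = addr - self.base
        if not 0 <= offset < ALLOC_SIZE:
            raise ValueError("address outside this allocation")
        index = offset // MAP_SIZE
        self.mapped.add(index)
        return index
```

The allocator never sees anything smaller than 2MB, yet a single 4kB granule can be mapped without mapping the other 511.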
> I really don't think it makes sense to discuss folios as the means for
> enabling huge pages in the page cache, without also taking a long hard
> look at the allocation model that is supposed to back them. Because
> you can't make it happen without that. And this part isn't looking so
> hot to me, tbh.
Please, don't creep the scope of this project to "first, redesign
the memory allocator".  This project is _if we can_, use larg(er)
pages to cache files.  What Darrick is talking about is an entirely
different project that I haven't signed up for and won't.
> Willy says he has future ideas to make compound pages scale. But we
> have years of history saying this is incredibly hard to achieve - and
> it certainly wasn't for a lack of constant trying.
I genuinely don't understand.  We have five primary users of memory
in Linux (once we're in a steady state after boot):

 - Anonymous memory
 - File-backed memory
 - Slab
 - Network buffers
 - Page tables

The relative importance of each one very much depends on your workload.
Slab already uses medium order pages and can be made to use larger.
Folios should give us large allocations of file-backed memory and
eventually anonymous memory.  Network buffers seem to be headed towards
larger allocations too.  Page tables will need some more thought, but
once we're no longer interleaving file cache pages, anon pages and
page tables, they become less of a problem to deal with.

Once everybody's allocating order-4 pages, order-4 pages become easy
to allocate.  When everybody's allocating order-0 pages, order-4 pages
require the right 16 pages to come available, and that's really freaking
hard.
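A small simulation illustrates the point about order-4 pages. It is a sketch under simplifying assumptions: memory is a flat list of 256 page frames, an order-n block is a naturally aligned run of 2**n free frames, and `free_blocks` is a hypothetical helper, not a kernel function. Both layouts below have exactly 16 frames in use; only the allocation size differs.

```python
def free_blocks(free, order):
    """Count naturally aligned runs of 2**order free page frames."""
    size = 1 << order
    return sum(
        all(free[i:i + size])
        for i in range(0, len(free) - size + 1, size)
    )


NPAGES = 256

# Sixteen order-0 allocations, one landing in every 16-frame block.
scattered = [True] * NPAGES
for i in range(0, NPAGES, 16):
    scattered[i] = False

# The same sixteen frames taken as a single order-4 allocation.
clustered = [True] * NPAGES
clustered[0:16] = [False] * 16

if __name__ == "__main__":
    print("order-4 blocks free, scattered:", free_blocks(scattered, 4))
    print("order-4 blocks free, clustered:", free_blocks(clustered, 4))
```

With the scattered layout, 240 of 256 frames are free and yet no order-4 block can be satisfied; with the clustered layout, every remaining block is still available.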