Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

From: Dan Williams <hidden>
Date: 2019-03-21 03:12:49
Also in: linux-mm, lkml, nvdimm

On Wed, Mar 20, 2019 at 8:09 PM Oliver [off-list ref] wrote:

On Thu, Mar 21, 2019 at 7:57 AM Dan Williams [off-list ref] wrote:

quoted

On Wed, Mar 20, 2019 at 8:34 AM Dan Williams [off-list ref] wrote:

quoted

On Wed, Mar 20, 2019 at 1:09 AM Aneesh Kumar K.V
[off-list ref] wrote:

quoted

Aneesh Kumar K.V [off-list ref] writes:

quoted

Dan Williams [off-list ref] writes:

quoted

Now what will be the page size used for mapping vmemmap?

That's up to the architecture's vmemmap_populate() implementation.

quoted

Architectures
will possibly use a PMD_SIZE mapping for vmemmap if supported. Now, will a
device-dax with struct page in the device have its pfn reserve area aligned
to PAGE_SIZE with the above example? We can't map that using the
PMD_SIZE page size?

IIUC, that's a different alignment. Currently that's handled by
padding the reservation area up to a section (128MB on x86) boundary,
but I'm working on patches to allow sub-section sized ranges to be
mapped.

I am missing something w.r.t. the code. The code below aligns that using nd_pfn->align:

      if (nd_pfn->mode == PFN_MODE_PMEM) {
              unsigned long memmap_size;

              /*
               * vmemmap_populate_hugepages() allocates the memmap array in
               * HPAGE_SIZE chunks.
               */
              memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
              offset = ALIGN(start + SZ_8K + memmap_size + dax_label_reserve,
                              nd_pfn->align) - start;
      }

IIUC that is finding the offset at which to put the vmemmap start. And that has
to be aligned to the page size with which we may end up mapping the vmemmap
area, right?

Right, that's the physical offset of where the vmemmap ends, and the
memory to be mapped begins.
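
To make the alignment question concrete, here is a rough user-space sketch of
the same arithmetic as the quoted PFN_MODE_PMEM branch. It is not kernel code:
align_up() and pfn_data_offset() are hypothetical names, the literal 64 is
assumed to stand for sizeof(struct page), and the label reserve is passed in
as a parameter rather than derived from the device type.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K UINT64_C(0x1000)
#define SZ_8K UINT64_C(0x2000)
#define SZ_2M UINT64_C(0x200000)

/* Round x up to a multiple of a; a must be a non-zero power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical mirror of the quoted calculation: returns the offset from
 * `start` at which the memmap reservation ends and the mapped data begins.
 * `align` plays the role of nd_pfn->align.
 */
static uint64_t pfn_data_offset(uint64_t start, uint64_t npfns,
				uint64_t hpage_size, uint64_t label_reserve,
				uint64_t align)
{
	/* 64 bytes per pfn, rounded up to whole huge-page chunks */
	uint64_t memmap_size = align_up(64 * npfns, hpage_size);

	return align_up(start + SZ_8K + memmap_size + label_reserve, align) - start;
}
```

For a 16 GiB range starting at 4 GiB (npfns = 4194304 in 4K units), a 2M
HPAGE_SIZE, and no label reserve, this gives an offset of 258 MiB when align
is 2M, but 256 MiB + 8 KiB when align is 4K. Only the first leaves the start
of the data 2M-aligned.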

quoted

Yes, we find the npfns by aligning up using PAGES_PER_SECTION. But that
is to compute how many pfns we should map for this pfn dev, right?

Also, I guess those 4K assumptions there are wrong?

Yes, I think to support non-4K-PAGE_SIZE systems the 'pfn' metadata
needs to be revved and the PAGE_SIZE needs to be recorded in the
info-block.

How often does a system change page size? Is it fixed, or does the
environment change it from one boot to the next? I'm thinking through
what to do when the recorded PAGE_SIZE in the info-block
does not match the current system page size. The simplest option is to
just fail the device and require it to be reconfigured. Is that
acceptable?
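
The "simplest option" could look roughly like the sketch below. Everything in
it is an assumption for illustration: struct pfn_info_sketch, its page_size
field, and pfn_check_page_size() are hypothetical names, since recording
PAGE_SIZE in the info-block is only a proposal at this point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a revised info-block; the field is an assumption. */
struct pfn_info_sketch {
	uint32_t page_size;	/* PAGE_SIZE of the kernel that wrote the block */
};

/*
 * Returns 0 when the recorded page size matches the running kernel's,
 * and -1 when the device should be failed and reconfigured.
 */
static int pfn_check_page_size(const struct pfn_info_sketch *info,
			       uint32_t cur_page_size)
{
	assert(info != NULL);

	if (info->page_size != cur_page_size)
		return -1;
	return 0;
}
```

An info-block written by a 64K kernel would pass this check on another 64K
kernel and be rejected on a 4K one, with no attempt to rescale the layout.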

The kernel page size is set at build time and as far as I know every
distro configures their ppc64(le) kernel for 64K. I've used 4K kernels
a few times in the past to debug PAGE_SIZE dependent problems, but I'd
be surprised if anyone is using 4K in production.

Ah, ok.

Anyway, my view is that using 4K here isn't really a problem since
it's just the accounting unit of the pfn superblock format. The kernel
reading from it should understand that and scale it to whatever
accounting unit it wants to use internally. Currently we don't, so that
should probably be fixed, but it doesn't seem to cause any real
issues. As far as I can tell, the only user of npfns is
__nvdimm_setup_pfn(), which prints the "number of pfns truncated"
message.

Am I missing something?

No, I don't think so. The only time it would break is if a system with
64K page size laid down an info-block with not enough reserved
capacity when the page-size is 4K (npfns too small). However, that
sounds like an exceptional case which is why no problems have been
reported to date.
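
A small worked example of that exceptional case, as a user-space sketch. The
helper names are hypothetical, and the 64-byte struct page size is assumed
from the literal 64 in the code quoted earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizeof(struct page), matching the literal 64 in the quoted code. */
#define STRUCT_PAGE_SIZE UINT64_C(64)

/* Bytes of memmap needed to describe `capacity` bytes at `page_size`. */
static uint64_t memmap_bytes(uint64_t capacity, uint64_t page_size)
{
	assert(page_size != 0 && capacity % page_size == 0);
	return (capacity / page_size) * STRUCT_PAGE_SIZE;
}

/*
 * Rescale a pfn count from one accounting unit to another, for example
 * from the 4K unit used on media to the running kernel's PAGE_SIZE.
 * The division truncates; overflow is not handled in this sketch.
 */
static uint64_t scale_npfns(uint64_t npfns, uint64_t from_page_size,
			    uint64_t to_page_size)
{
	assert(to_page_size != 0);
	return (npfns * from_page_size) / to_page_size;
}
```

For a 16 GiB range, a reservation sized for 64K pages covers 262144 pfns and
needs 16 MiB of memmap, while the same range at 4K covers 4194304 pfns and
needs 256 MiB. A reservation laid down on the 64K assumption would be 16
times too small for a 4K kernel.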
