Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default
From: Dan Williams <hidden>
Date: 2019-03-21 03:12:49
Also in:
linux-mm, lkml, nvdimm
On Wed, Mar 20, 2019 at 8:09 PM Oliver [off-list ref] wrote:
On Thu, Mar 21, 2019 at 7:57 AM Dan Williams [off-list ref] wrote:
On Wed, Mar 20, 2019 at 8:34 AM Dan Williams [off-list ref] wrote:
On Wed, Mar 20, 2019 at 1:09 AM Aneesh Kumar K.V [off-list ref] wrote:
Aneesh Kumar K.V [off-list ref] writes:
Dan Williams [off-list ref] writes:
Now what will be page size used for mapping vmemmap?

That's up to the architecture's vmemmap_populate() implementation.
Architectures possibly will use PMD_SIZE mapping if supported for vmemmap. Now a device-dax with struct page in the device will have pfn reserve area aligned to PAGE_SIZE with the above example? We can't map that using PMD_SIZE page size?

IIUC, that's a different alignment. Currently that's handled by padding the reservation area up to a section (128MB on x86) boundary, but I'm working on patches to allow sub-section sized ranges to be mapped.

I am missing something w.r.t. the code. The code below aligns that using nd_pfn->align:

    if (nd_pfn->mode == PFN_MODE_PMEM) {
            unsigned long memmap_size;

            /*
             * vmemmap_populate_hugepages() allocates the memmap array in
             * HPAGE_SIZE chunks.
             */
            memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
            offset = ALIGN(start + SZ_8K + memmap_size + dax_label_reserve,
                            nd_pfn->align) - start;
    }

IIUC that is finding the offset where to put vmemmap start. And that has to be aligned to the page size with which we may end up mapping vmemmap area, right?

Right, that's the physical offset of where the vmemmap ends, and the memory to be mapped begins.
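For readers following the arithmetic, the offset computation quoted above can be reproduced in userspace. This is only a sketch under stated assumptions: pfn_data_offset() and ALIGN_UP() are hypothetical names, SZ_2M stands in for HPAGE_SIZE as on x86-64, and the label reserve and alignment are plain parameters instead of values read from nd_pfn.

```c
#include <stdint.h>

#define SZ_8K 0x2000ULL
#define SZ_2M 0x200000ULL	/* assumption: stand-in for HPAGE_SIZE */

/* Round x up to the next multiple of a (a need not be a power of two). */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/*
 * Hypothetical helper mirroring the quoted PFN_MODE_PMEM branch:
 * reserve 8K plus a memmap of 64 bytes per pfn (rounded up to 2M),
 * then align the start of the data area to 'align'.
 */
static uint64_t pfn_data_offset(uint64_t start, uint64_t npfns,
				uint64_t label_reserve, uint64_t align)
{
	uint64_t memmap_size = ALIGN_UP(64 * npfns, SZ_2M);

	return ALIGN_UP(start + SZ_8K + memmap_size + label_reserve,
			align) - start;
}
```

With start = 0, npfns = 262144 (1GB of 4K pages) and no label reserve, an alignment of 2M yields an offset of 18874368 bytes (18MB), while an alignment of 4096 yields 16785408 bytes, which is not a multiple of 2M.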
Yes we find the npfns by aligning up using PAGES_PER_SECTION. But that is to compute how many pfns we should map for this pfn dev, right?

Also, I guess those 4K assumptions there are wrong?

Yes, I think to support non-4K-PAGE_SIZE systems the 'pfn' metadata needs to be revved and the PAGE_SIZE needs to be recorded in the info-block.

How often does a system change page size? Is it fixed, or do environments change it from one boot to the next? I'm thinking through the behavior of what to do when the recorded PAGE_SIZE in the info-block does not match the current system page size. The simplest option is to just fail the device and require it to be reconfigured. Is that acceptable?

The kernel page size is set at build time and as far as I know every distro configures their ppc64(le) kernel for 64K. I've used 4K kernels a few times in the past to debug PAGE_SIZE dependent problems, but I'd be surprised if anyone is using 4K in production.
Ah, ok.
Anyway, my view is that using 4K here isn't really a problem since it's just the accounting unit of the pfn superblock format. The kernel reading from it should understand that and scale it to whatever accounting unit it wants to use internally. Currently we don't, so that should probably be fixed, but that doesn't seem to cause any real issues. As far as I can tell the only user of npfns is in __nvdimm_setup_pfn(), which prints the "number of pfns truncated" message. Am I missing something?
No, I don't think so. The only time it would break is if a system with 64K page size laid down an info-block with not enough reserved capacity when the page-size is 4K (npfns too small). However, that sounds like an exceptional case which is why no problems have been reported to date.
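
To put rough numbers on that exceptional case, here is a sketch of the arithmetic. It assumes 64 bytes per struct page (the constant in the 64 * npfns expression quoted earlier), a 4K accounting unit in the info-block, and page sizes that are multiples of 4K. All three helper names are hypothetical and are not kernel functions.

```c
#include <stdint.h>

#define INFO_BLOCK_UNIT   4096ULL	/* assumption: on-media accounting unit */
#define STRUCT_PAGE_BYTES 64ULL		/* assumption: sizeof(struct page) */

/* Hypothetical: scale a 4K-unit pfn count to the running page size. */
static uint64_t npfns_to_native(uint64_t npfns_4k, uint64_t page_size)
{
	return npfns_4k / (page_size / INFO_BLOCK_UNIT);
}

/* Hypothetical: memmap bytes needed to describe 'capacity' bytes. */
static uint64_t memmap_bytes(uint64_t capacity, uint64_t page_size)
{
	return STRUCT_PAGE_BYTES * (capacity / page_size);
}

/*
 * Hypothetical: nonzero when a reservation sized by a kernel using
 * writer_page_size is still large enough for reader_page_size.
 */
static int reservation_fits(uint64_t capacity, uint64_t writer_page_size,
			    uint64_t reader_page_size)
{
	return memmap_bytes(capacity, writer_page_size) >=
	       memmap_bytes(capacity, reader_page_size);
}
```

For a 1GB namespace, a 64K-page kernel would need 1MB of memmap and a 4K-page kernel 16MB, so under these assumptions a reservation sized by the 64K kernel falls short on 4K, while the reverse direction fits.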