Re: [RFC PATCH 1/2] mm, mincore2(): retrieve dax and tlb-size attributes of an address range
From: Dan Williams <hidden>
Date: 2016-09-12 17:25:29
Also in:
linux-mm, lkml, nvdimm
On Sun, Sep 11, 2016 at 11:29 PM, Oliver O'Halloran [off-list ref] wrote:
On Mon, Sep 12, 2016 at 3:31 AM, Dan Williams [off-list ref] wrote:
As evidenced by this bug report [1], userspace libraries are interested in whether a mapping is DAX mapped, i.e. no intervening page cache. Rather than using the ambiguous VM_MIXEDMAP flag in smaps, provide an explicit "is dax" indication as a new flag in the page vector populated by mincore.

There are also cases, particularly for testing and validating a configuration, where it is useful to know the hardware mapping geometry of the pages in a given process address range. Consider filesystem-dax, where a configuration needs to take care to align partitions and block allocations before huge page mappings might be used, or anonymous-transparent-huge-pages, where a process is opportunistically assigned large pages. mincore2() allows these configurations to be surveyed and validated.

The implementation takes advantage of the unused bits in the per-page byte returned for each PAGE_SIZE extent of a given address range. The new format of each vector byte is:

    (TLB_SHIFT - PAGE_SHIFT) << 2 | vma_is_dax() << 1 | page_present

What is userspace expected to do with the information in vec? Whether PMD or THP mappings can be used is going to depend more on the block allocations done by the filesystem than on anything an application can directly influence. Returning a vector for each page makes some sense in the mincore() case since the application can touch each page to fault them in, but I don't see what they can do here.
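For reference, the vector byte layout quoted above could be unpacked in userspace roughly as follows. This is only a sketch: the M2_* macro names, struct m2_info, and the m2_encode()/m2_decode() helpers are hypothetical and not part of the RFC, which gives only the bit layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the RFC only specifies the bit positions. */
#define M2_PRESENT    0x01u  /* bit 0: page_present */
#define M2_DAX        0x02u  /* bit 1: vma_is_dax() */
#define M2_TLB_SHIFT  2      /* bits 2 and up: TLB_SHIFT - PAGE_SHIFT */

struct m2_info {
    bool present;            /* page is resident */
    bool dax;                /* mapping belongs to a DAX vma */
    unsigned int tlb_shift;  /* log2 of the hardware mapping size in bytes */
};

/* Pack the three fields the way the proposed format describes. */
uint8_t m2_encode(unsigned int tlb_shift, unsigned int page_shift,
                  bool dax, bool present)
{
    return (uint8_t)(((tlb_shift - page_shift) << M2_TLB_SHIFT) |
                     (dax ? M2_DAX : 0u) |
                     (present ? M2_PRESENT : 0u));
}

/* Unpack one vector byte; page_shift is the base page shift (12 for 4K pages). */
struct m2_info m2_decode(uint8_t v, unsigned int page_shift)
{
    struct m2_info info;

    info.present = (v & M2_PRESENT) != 0;
    info.dax = (v & M2_DAX) != 0;
    info.tlb_shift = (unsigned int)(v >> M2_TLB_SHIFT) + page_shift;
    return info;
}
```

With 4K base pages (PAGE_SHIFT of 12), a present DAX page mapped by a 2M entry (TLB_SHIFT of 21) would encode as (9 << 2) | 2 | 1, i.e. 0x27.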
It's not a "can huge pages be used?" question; it's interrogating the mapping that got established after the fact. If an application/environment expects huge mappings, but pte mappings are getting established
Why not just get rid of vec entirely and make mincore2() a yes/no check over the range for whatever is supplied in flags? That would work for NVML's use case and it should be easier to extend if needed.
I think having a way to ask the kernel if an address range satisfies a certain set of input attributes is a useful interface. Perhaps a "MINCORE_CHECK" flag can indicate that the input vector contains a single character that it wants the kernel to validate during the page table walk, and return zero or the offset of the first mismatch.
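To make the "validate and report the first mismatch" semantics concrete, here is a userspace model of the comparison such a flag might perform over an already-populated vector. It is a sketch under assumptions: vec_first_mismatch() is a hypothetical helper, not a kernel interface, and the 1-based return value is my own choice to keep "no mismatch" distinguishable from "mismatch at the first page"; the proposal above does not settle that detail.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare every per-page byte in vec against the single expected byte.
 * Returns 0 if all n entries match, otherwise the 1-based index of the
 * first entry that differs (assumption: 1-based so that 0 means "all ok").
 */
size_t vec_first_mismatch(const uint8_t *vec, size_t n, uint8_t want)
{
    for (size_t i = 0; i < n; i++) {
        if (vec[i] != want)
            return i + 1;
    }
    return 0;
}
```

A caller expecting a present, DAX, 2M-mapped range on 4K base pages would pass 0x27 as the expected byte and treat any nonzero return as the page where the mapping geometry first deviates.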