Thread: 12 messages, 6 authors, 2016-09-13

Re: [RFC PATCH 1/2] mm, mincore2(): retrieve dax and tlb-size attributes of an address range

From: Dan Williams <hidden>
Date: 2016-09-12 17:25:29
Also in: linux-mm, lkml, nvdimm

On Sun, Sep 11, 2016 at 11:29 PM, Oliver O'Halloran [off-list ref] wrote:
On Mon, Sep 12, 2016 at 3:31 AM, Dan Williams [off-list ref] wrote:
As evidenced by this bug report [1], userspace libraries are interested
in whether a mapping is DAX mapped, i.e. no intervening page cache.
Rather than using the ambiguous VM_MIXEDMAP flag in smaps, provide an
explicit "is dax" indication as a new flag in the page vector populated
by mincore.

There are also cases, particularly when testing and validating a
configuration, where it is useful to know the hardware mapping geometry
of the pages in a given process address range.  Consider filesystem-dax,
where a configuration needs to take care to align partitions and block
allocations before huge page mappings might be used, or
anonymous-transparent-huge-pages, where a process is opportunistically
assigned large pages.  mincore2() allows these configurations to be
surveyed and validated.

The implementation takes advantage of the unused bits in the per-page
byte returned for each PAGE_SIZE extent of a given address range.  The
new format of each vector byte is:

(TLB_SHIFT - PAGE_SHIFT) << 2 | vma_is_dax() << 1 | page_present
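
For illustration, the byte layout above could be packed and unpacked in
userspace roughly as follows.  This is a minimal sketch, not code from
the patch: the names vec_info, vec_encode and vec_decode are
hypothetical, and it assumes the shift delta fits in the six bits that
remain above the dax bit.

```c
#include <assert.h>

/* Assumed layout, read directly off the formula above:
 *   bit 0      page_present
 *   bit 1      vma_is_dax()
 *   bits 2..7  TLB_SHIFT - PAGE_SHIFT
 * All names below are hypothetical and exist only for this sketch.
 */
struct vec_info {
    int present;               /* page is resident */
    int dax;                   /* mapping is DAX */
    unsigned int shift_delta;  /* TLB_SHIFT - PAGE_SHIFT */
};

/* Pack the three fields into one vector byte. */
static unsigned char vec_encode(unsigned int tlb_shift,
                                unsigned int page_shift,
                                int is_dax, int present)
{
    unsigned int delta;

    assert(tlb_shift >= page_shift);
    delta = tlb_shift - page_shift;
    assert(delta < 64);        /* only six bits are available */

    return (unsigned char)((delta << 2) |
                           ((is_dax ? 1u : 0u) << 1) |
                           (present ? 1u : 0u));
}

/* Unpack one vector byte into its three fields. */
static struct vec_info vec_decode(unsigned char b)
{
    struct vec_info info;

    info.present = (b & 0x01u) != 0;
    info.dax = (b & 0x02u) != 0;
    info.shift_delta = (unsigned int)b >> 2;
    return info;
}
```

With 4 KiB pages (a PAGE_SHIFT of 12) and a 2 MiB mapping (a shift of
21), a present DAX page would encode as 9 << 2 | 1 << 1 | 1 = 39.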

What is userspace expected to do with the information in vec? Whether
PMD or THP mappings can be used is going to depend more on the block
allocations done by the filesystem rather than anything an
application can directly influence. Returning a vector for each page
makes some sense in the mincore() case since the application can touch
each page to fault them in, but I don't see what they can do here.

It's not a "can huge pages be used?" question; it's interrogating the
mapping that got established after the fact.  If an
application/environment expects huge mappings, but pte mappings are
getting established

Why not just get rid of vec entirely and make mincore2() a yes/no
check over the range for whatever is supplied in flags? That would
work for NVML's use case and it should be easier to extend if needed.

I think having a way to ask the kernel if an address range satisfies a
certain set of input attributes is a useful interface.  Perhaps a
"MINCORE_CHECK" flag can indicate that the input vector contains a
single character that it wants the kernel to validate during the page
table walk, and return zero or the offset of the first mismatch.
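
A userspace approximation of that check, run over a vector that has
already been filled in, might look like the sketch below.  MINCORE_CHECK
is only an idea floated in this thread, so nothing here reflects an
existing kernel interface: vec_first_mismatch is a hypothetical helper,
and returning a 1-based page index is an assumption made here so that
zero can unambiguously mean "no mismatch".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a kernel or libc interface.
 *
 * Compare every per-page byte in vec[0..npages) against the single
 * expected byte `want`.  Returns 0 if all pages match, otherwise the
 * 1-based index of the first page whose byte differs.
 */
static size_t vec_first_mismatch(const unsigned char *vec, size_t npages,
                                 unsigned char want)
{
    size_t i;

    assert(vec != NULL || npages == 0);

    for (i = 0; i < npages; i++) {
        if (vec[i] != want)
            return i + 1;   /* 1-based, so 0 stays free for "all match" */
    }
    return 0;
}
```

A caller expecting present, DAX, 2 MiB mappings on a 4 KiB-page system
would pass 39 as `want`, per the vector byte format quoted earlier, and
treat any nonzero return as the first page that fell back to a
different mapping.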