Thread: 46 messages, 7 authors, 2017-08-21

Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap

From: Jan Kara <jack@suse.cz>
Date: 2017-08-14 12:48:03
Also in: linux-api, linux-fsdevel, lkml, nvdimm

On Sun 13-08-17 13:31:45, Dan Williams wrote:
On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig [off-list ref] wrote:
That being said I think we absolutely should support RDMA memory
registrations for DAX mappings.  I'm just not sure how S_IOMAP_IMMUTABLE
helps with that.  We'll want a MAP_SYNC | MAP_POPULATE to make sure
all the blocks are populated and all ptes are set up.  Second we need
to make sure get_user_pages works, which for now means we'll need a
struct page mapping for the region (which will be really annoying
for PCIe mappings, like the upcoming NVMe persistent memory region),
and we need to guarantee that the extent mapping won't change while
the get_user_pages holds the pages inside it.  I think that is true
due to side effects even with the current DAX code, but we'll need to
make it explicit.  And maybe that's where we need to converge -
"sealing" the extent map makes sense as such a temporary measure
that is not persisted on disk, which automatically gets released
when the holding process exits, because we sort of already do this
implicitly.  It might also make sense to have explicitly breakable
seals similar to what I do for the pNFS blocks kernel server, as
any userspace RDMA file server would also need those semantics.
Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:

    1/ only succeed if the fault can be satisfied without page cache

    2/ only install a pte for the fault if it can do so without
triggering block map updates

So, I think it would still end up setting an inode flag to make
xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping
active. However, it would not record that state in the on-disk
metadata and it would automatically clear at munmap time. That should
be enough to support the host-persistent-memory, and
NVMe-persistent-memory use cases (provided we have struct page for
NVMe). Although, we need more safety infrastructure in the NVMe case
where we would need to software manage I/O coherence.
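
The semantics sketched above can be modelled in a few lines of user-space C. This is only a toy illustration, not kernel code: struct toy_inode and every toy_* function are hypothetical names invented here, toy_bmap_write() merely stands in for a block-map update such as xfs_bmapi_write(), and the -ETXTBSY return value is an assumption chosen for the example. It shows the in-memory seal (a mapping count that is never persisted), the fault rule that refuses to trigger block-map updates, and the automatic release at unmap time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an in-memory inode; not a kernel structure. */
struct toy_inode {
    int  direct_mappings;   /* live MAP_DIRECT-style mappings, never persisted */
    long allocated_blocks;  /* blocks 0..allocated_blocks-1 are in the block map */
};

/* Establish a direct mapping: the block map is sealed while the count is > 0. */
static void toy_mmap_direct(struct toy_inode *ino)
{
    ino->direct_mappings++;
}

/* Tear a mapping down; the seal clears once the last mapping is gone. */
static void toy_munmap_direct(struct toy_inode *ino)
{
    assert(ino->direct_mappings > 0);
    ino->direct_mappings--;
}

/* Stand-in for a block-map update: refused while any direct mapping exists. */
static int toy_bmap_write(struct toy_inode *ino, long nblocks)
{
    if (ino->direct_mappings > 0)
        return -ETXTBSY;    /* error code is an assumption for illustration */
    ino->allocated_blocks += nblocks;
    return 0;
}

/*
 * Fault on a block: succeeds if the block is already mapped. While a direct
 * mapping is live, the fault fails instead of allocating; otherwise it
 * allocates up to and including the faulting block.
 */
static bool toy_fault_ok(struct toy_inode *ino, long block)
{
    if (block < ino->allocated_blocks)
        return true;
    return toy_bmap_write(ino, block + 1 - ino->allocated_blocks) == 0;
}
```

With four blocks allocated up front, a fault on block 3 succeeds under a direct mapping while a fault on block 4 fails, and toy_bmap_write() starts working again as soon as toy_munmap_direct() drops the last mapping.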
Hum, this proposal (and the problems you are trying to deal with) seem very
similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to
the DAX area (and so additionally complicated by the fact that filesystems
now have to care). The patch set was not merged, due to lack of interest I
think, but it looked sensible and the proposed API would make sense for more
stuff than just DAX, so maybe it would be better than a MAP_DIRECT flag?

[1] https://lwn.net/Articles/600502/

								Honza

-- 
Jan Kara [off-list ref]
SUSE Labs, CR