Thread: 14 messages, 4 authors, 2018-10-04

Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

From: Jan Kara <jack@suse.cz>
Date: 2018-10-03 15:06:58
Also in: linux-api, linux-fsdevel, linux-mm, linux-xfs, nvdimm

On Wed 03-10-18 07:38:50, Dan Williams wrote:
On Wed, Oct 3, 2018 at 5:51 AM Jan Kara [off-list ref] wrote:
quoted
On Tue 02-10-18 13:18:54, Dan Williams wrote:
quoted
On Tue, Oct 2, 2018 at 8:32 AM Jan Kara [off-list ref] wrote:
quoted
On Tue 02-10-18 07:52:06, Christoph Hellwig wrote:
quoted
On Tue, Oct 02, 2018 at 04:44:13PM +0200, Johannes Thumshirn wrote:
quoted
On Tue, Oct 02, 2018 at 07:37:13AM -0700, Christoph Hellwig wrote:
quoted
No, it should not.  DAX is an implementation detail that may change
or go away at any time.
Well we had an issue with an application checking for dax, this is how
we landed here in the first place.
So what exactly is that "DAX" they are querying about (and no, I'm not
joking, nor being philosophical).
I believe the application we are speaking about is mostly concerned about
the memory overhead of the page cache. Think of a machine that has ~ 1TB of
DRAM, the database running on it is about that size as well and they want
database state stored somewhere persistently - which they may want to do by
modifying mmaped database files if they do small updates... So they really
want to be able to use close to all DRAM for the DB and not leave slack
space for the kernel page cache to cache 1TB of database files.
VM_MIXEDMAP was never a reliable indication of DAX because it could be
set for random other device-drivers that use vm_insert_mixed(). The
MAP_SYNC flag positively indicates that page cache is disabled for a
given mapping, although whether that property is due to "dax" or some
other kernel mechanics is purely an internal detail.
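
For illustration, a check of the kind described above boils down to scanning the "VmFlags:" line of /proc/<pid>/smaps for the two-letter code "mm" (VM_MIXEDMAP). The following is a minimal sketch of such a scan, not the application's actual code; the helper name vmflags_has is hypothetical. It also shows why the check is fragile: it only reports that the flag is listed, not why the kernel set it.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a "VmFlags:" line from /proc/<pid>/smaps lists the given
 * two-letter flag (e.g. "mm" for VM_MIXEDMAP), 0 otherwise.
 * vmflags_has is a hypothetical helper name used for illustration.
 */
static int vmflags_has(const char *line, const char *flag)
{
    static const char prefix[] = "VmFlags:";
    char buf[256];
    char *tok, *save;

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return 0;

    /* Work on a copy: strtok_r modifies its input. */
    snprintf(buf, sizeof(buf), "%s", line + sizeof(prefix) - 1);

    for (tok = strtok_r(buf, " \t\n", &save); tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save)) {
        if (strcmp(tok, flag) == 0)
            return 1;
    }
    return 0;
}
```

A caller would read /proc/self/smaps line by line with fgets() and pass each line to vmflags_has(line, "mm"). A hit says nothing about DAX if the mapping belongs to some other driver that uses vm_insert_mixed().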

I'm not opposed to faking out VM_MIXEDMAP if this broken check has
made it into production, but again, it's unreliable.
So luckily this particular application wasn't widely deployed yet so we
will likely get away with the vendor asking customers to update to a
version not looking into smaps and parsing /proc/mounts instead.
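
As a rough sketch of what the /proc/mounts approach could look like, the glibc mntent interface (setmntent, getmntent, hasmntopt) can look for the "dax" option on a given mount point. The function name mount_has_dax is hypothetical, and the caller is assumed to already know which mount point a file lives on. This only reports the mount option; as discussed in this thread, that is not a guarantee about how any particular mapping is served.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the mount table at mounts_path lists mount_dir with the
 * "dax" option, 0 if it is listed without it, -1 if it is not listed
 * or the table cannot be opened.
 * mount_has_dax is a hypothetical helper name used for illustration.
 */
static int mount_has_dax(const char *mounts_path, const char *mount_dir)
{
    FILE *fp = setmntent(mounts_path, "r");
    struct mntent *ent;
    int result = -1;

    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        /* Keep the last match: later mounts shadow earlier ones. */
        if (strcmp(ent->mnt_dir, mount_dir) == 0)
            result = hasmntopt(ent, "dax") != NULL ? 1 : 0;
    }

    endmntent(fp);
    return result;
}
```

Typical use would be mount_has_dax("/proc/mounts", "/mnt/pmem"), where "/mnt/pmem" is a placeholder mount point.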

But I don't find parsing /proc/mounts that beautiful either and I'd prefer
if we had a better interface for applications to query whether they can
avoid page cache for mmaps or not.
Yeah, the mount flag is not a good indicator either. I think we need
to follow through on the per-inode property of DAX. Darrick and I
discussed just allowing the property to be inherited from the parent
directory at file creation time. That avoids the dynamic set-up /
teardown races that seem intractable at this point.

What's wrong with MAP_SYNC as a page-cache detector in the meantime?
So IMHO checking for MAP_SYNC is about as reliable as checking for the 'dax'
mount option. It works now but nobody promises it will reliably detect DAX in
the future - e.g. there's nothing that prevents MAP_SYNC from working for
mappings using pagecache if we find a sensible usecase for that.
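
To make the MAP_SYNC probe concrete, here is a minimal sketch, assuming a Linux 4.15 or later kernel: mmap(2) accepts MAP_SYNC only together with MAP_SHARED_VALIDATE and fails with EOPNOTSUPP when the file does not support it. The function name probe_map_sync is hypothetical, and the fallback flag values are the asm-generic ones (some architectures define their own). Per the caveat above, a successful probe shows that MAP_SYNC works today, not that the page cache will stay out of the picture in future kernels.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (asm-generic values; an assumption
 * that does not hold on every architecture). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Return 1 if a MAP_SYNC mapping of path succeeds, 0 if the kernel or
 * filesystem rejects the flag, -1 on any other error (errno is set).
 * probe_map_sync is a hypothetical helper name used for illustration.
 */
static int probe_map_sync(const char *path)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR);
    int saved;
    void *p;

    if (fd < 0)
        return -1;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    saved = errno;
    close(fd);

    if (p != MAP_FAILED) {
        munmap(p, len);
        return 1;
    }

    errno = saved;
    /* EOPNOTSUPP: the file does not support MAP_SYNC.
     * EINVAL: most likely a kernel without MAP_SHARED_VALIDATE. */
    return (saved == EOPNOTSUPP || saved == EINVAL) ? 0 : -1;
}
```

The mapping is never touched, so the probe is safe even on an empty file; an application would call it once on a database file at startup and fall back to its non-DAX configuration on 0 or -1.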

WRT the per-inode DAX property, AFAIU that inode flag is just going to be
an advisory thing - i.e., use DAX if possible. If you mount a filesystem with
these inode flags set in a configuration which does not allow DAX to be
used, you will still be able to access such inodes but the access will use
page cache instead. And querying these flags had better show the real
on-disk status and not just whether DAX is used, as that would result in an
even bigger mess. So this feature seems to be somewhat orthogonal to the
API I'm looking for.

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR