Thread: 55 messages, 6 authors, 2020-01-18

Re: [RFC PATCH V2 01/12] fs/stat: Define DAX statx attribute

From: Dan Williams <hidden>
Date: 2020-01-15 20:11:07
Also in: linux-fsdevel, linux-xfs, lkml

On Wed, Jan 15, 2020 at 11:45 AM Ira Weiny [off-list ref] wrote:
On Wed, Jan 15, 2020 at 09:38:34AM -0800, Darrick J. Wong wrote:
quoted
On Wed, Jan 15, 2020 at 12:37:15PM +0100, Jan Kara wrote:
quoted
On Fri 10-01-20 11:29:31, ira.weiny@intel.com wrote:
quoted
From: Ira Weiny <ira.weiny@intel.com>

In order for users to determine if a file is currently operating in DAX
mode (effective DAX), define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

  DAX (cpu direct access) is a file mode that attempts to minimize
"..is a file I/O mode"?
or  "... is a file state ..."?
quoted
quoted
quoted
  software cache effects for both I/O and memory mappings of this
  file.  It requires a capable device, a compatible filesystem
  block size, and filesystem opt-in.
"...a capable storage device..."
Done
quoted
What does "compatible fs block size" mean?  How does the user figure out
if their fs blocksize is compatible?  Do we tell users to refer their
filesystem's documentation here?
Perhaps it is wrong for this to be in the man page at all?  Would it be better
to assume the file system and block device are already configured properly by
the admin?

For which the blocksize restrictions are already well documented.  ie:

https://www.kernel.org/doc/Documentation/filesystems/dax.txt

?

How about changing the text to:

        It requires a block device and file system which have been configured
        to support DAX.

?
The goal was to document the gauntlet of checks that
__generic_fsdax_supported() performs so someone could debug "why am I
not able to get dax operation?"
quoted
quoted
quoted
It generally assumes all
  accesses are via cpu load / store instructions which can
  minimize overhead for small accesses, but adversely affect cpu
  utilization for large transfers.
Will this always be true for persistent memory?
For direct-mapped pmem there is no opportunity to do dma offload so it
will always be true that application dax access consumes cpu to do I/O
where something like NVMe does not. There have been experiments, so far
unfruitful, with the driver using an offload engine for kernel
internal I/O, but if your use case is kernel internal I/O bound then
you don't need dax.
I'm not clear.  Did you mean: "this" == adverse utilization for large transfers?
quoted
I wasn't even aware that large transfers adversely affected CPU
utilization. ;)
Sure vs using a DMA engine for example.
Right, this is purely a statement about cpu memcpy vs device-dma.
quoted
quoted
quoted
 File I/O is done directly
  to/from user-space buffers. While the DAX property tends to
  result in data being transferred synchronously it does not give
"...transferred synchronously, it does not..."
done.
quoted
quoted
quoted
  the guarantees of synchronous I/O that data and necessary
"...it does not guarantee that I/O or file metadata have been flushed to
the storage device."
The lack of guarantee here is mainly regarding metadata.

How about:

        While the DAX property tends to result in data being transferred
        synchronously, it does not give the same guarantees of
        synchronous I/O where data and the necessary metadata are
        transferred together.
quoted
quoted
quoted
  metadata are transferred. Memory mapped I/O may be performed
  with direct mappings that bypass system memory buffering.
"...with direct memory mappings that bypass kernel page cache."
Done.
quoted
quoted
quoted
Again
  while memory-mapped I/O tends to result in data being
I would move the sentence about "Memory mapped I/O..." to directly after
the sentence about file I/O being done directly to and from userspace so
that you don't need to repeat this statement.
Done.
quoted
quoted
quoted
  transferred synchronously it does not guarantee synchronous
  metadata updates. A dax file may optionally support being mapped
  with the MAP_SYNC flag which does allow cpu store operations to
  be considered synchronous modulo cpu cache effects.
How does one detect or work around or deal with "cpu cache effects"?  I
assume some sort of CPU cache flush instruction is what is meant here,
but I think we could mention the basics of what has to be done here:

"A DAX file may support being mapped with the MAP_SYNC flag, which
enables a program to use CPU cache flush operations to persist CPU store
operations without an explicit fsync(2).  See mmap(2) for more
information."?
That sounds better.  I like the reference to mmap as well.

Ok I changed a couple of things as well.  How does this sound?


STATX_ATTR_DAX

        DAX (cpu direct access) is a file mode that attempts to minimize
s/mode/state/?
        software cache effects for both I/O and memory mappings of this
        file.  It requires a block device and file system which have
        been configured to support DAX.
It may not require a block device in the future.
        DAX generally assumes all accesses are via cpu load / store
        instructions which can minimize overhead for small accesses, but
        may adversely affect cpu utilization for large transfers.

        File I/O is done directly to/from user-space buffers and memory
        mapped I/O may be performed with direct memory mappings that
        bypass kernel page cache.

        While the DAX property tends to result in data being transferred
        synchronously, it does not give the same guarantees of
        synchronous I/O where data and the necessary metadata are
Maybe use "O_SYNC I/O" explicitly to further differentiate the 2
meanings of "synchronous" in this sentence?
        transferred together.

        A DAX file may support being mapped with the MAP_SYNC flag,
        which enables a program to use CPU cache flush operations to
s/operations/instructions/
        persist CPU store operations without an explicit fsync(2).  See
        mmap(2) for more information.
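
As an illustration of the MAP_SYNC paragraph above, here is a minimal user-space sketch, not a definitive implementation. The helper names map_file() and persist() are hypothetical. The fallback values for MAP_SHARED_VALIDATE and MAP_SYNC are the asm-generic ones and are only used when the libc headers do not define them. The 64-byte cache line and the use of clflush on x86-64 are assumptions; other architectures fall back to msync(2) here.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_clflush(), _mm_sfence() */
#endif

/* Assumption: asm-generic values, used only if the headers lack them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Try a MAP_SYNC mapping first; if the file or kernel does not support
 * it, fall back to a plain shared mapping.  *is_sync reports which one
 * was obtained.  Returns NULL if both attempts fail.
 */
void *map_file(int fd, size_t len, int *is_sync)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *is_sync = 1;
        return p;
    }
    *is_sync = 0;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Make stores to the whole mapping durable.  With MAP_SYNC on x86-64,
 * flushing the CPU cache lines is enough; otherwise use msync(2).
 * base must be the page-aligned address returned by mmap().
 */
int persist(void *base, size_t len, int is_sync)
{
#if defined(__x86_64__)
    if (is_sync) {
        /* Assumption: 64-byte cache lines. */
        for (char *p = base; p < (char *)base + len; p += 64)
            _mm_clflush(p);
        _mm_sfence();
        return 0;
    }
#endif
    (void)is_sync;
    return msync(base, len, MS_SYNC);
}
```

A caller would store into the mapping returned by map_file() and then call persist() before treating the data as durable. On a file that is not in the DAX state the first mmap() is expected to fail, and the sketch degrades to the msync(2) path.
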
I think this also wants a reference to the Linux interpretation of
platform "persistence domains" that we were discussing here [1], but
maybe it should be part of a "pmem" manpage that can be referenced
from this man page.

[1]: http://lore.kernel.org/r/20200108064905.170394-1-aneesh.kumar@linux.ibm.com
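
For context, a rough sketch of how user space might query the attribute discussed in this thread through statx(2). This is an illustration under stated assumptions, not the patch's code: the helper name file_dax_state() is hypothetical, and the fallback bit value for STATX_ATTR_DAX is the one found in later mainline <linux/stat.h>, which this RFC had not settled. It also assumes a libc that provides the statx() wrapper (glibc 2.28 or later).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>     /* AT_FDCWD */
#include <stddef.h>
#include <sys/stat.h>  /* statx(), struct statx */

/* Assumption: value from later mainline headers, used only if undefined. */
#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000
#endif

/*
 * Returns 1 if the file is currently in the DAX state, 0 if it is not,
 * and -1 if statx() failed or the kernel does not report the attribute.
 */
int file_dax_state(const char *path)
{
    struct statx stx;

    assert(path != NULL);
    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
        return -1;
    /* Only trust the bit if the kernel says it supports reporting it. */
    if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
        return -1;
    return (stx.stx_attributes & STATX_ATTR_DAX) ? 1 : 0;
}
```

Checking stx_attributes_mask first distinguishes "not DAX" from "this kernel or filesystem cannot tell", which matters for the debugging use case mentioned earlier in the thread.
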