Thread: 7 messages, 6 authors, 2016-03-03

Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag

From: Boaz Harrosh <hidden>
Date: 2016-02-28 10:17:21
Also in: linux-nfs

On 02/26/2016 12:04 PM, Thanumalayan Sankaranarayana Pillai wrote:
On Thu, Feb 25, 2016 at 10:02 PM, Dan Williams [off-list ref] wrote:
[ adding Thanu ]
Very few applications actually care about atomic sector writes.
Databases are probably the only class of application that really do
care about both single sector and multi-sector atomic write
behaviour, and many of them can be configured to assume single
sector writes can be torn.

Torn user data writes have always been possible, and so pmem does
not introduce any new semantics that applications have to handle.

I know about BTT and DAX only at a conceptual level and hence do not understand
this mailing thread fully. But I can provide examples of important applications
expecting atomicity at a 512B or a smaller granularity. Here is a list:

(1) LMDB [1] that Dan mentioned, which expects "linear writes" (i.e., don't
need atomicity, but need the first byte to be written before the second byte)

(2) PostgreSQL expects atomicity [2]

(3) SQLite depends on linear writes [3] (we were unable to find these
dependencies during our testing, however). Also, PSOW in SQLite is not relevant
to this discussion as I understand it; PSOW deals with corruption of data
*around* the actual written bytes.

(4) We found that ZooKeeper depends on atomicity during our testing, but we did
not contact the ZooKeeper developers about this. Some details in our paper [4].

It is tempting to assume that applications do not use the concept of disk
sectors and deal with only file-system blocks (which are not atomic in
practice), and take measures to deal with the non-atomic file-system blocks.
But, in reality, applications seem to assume that 512B (more or less) sectors
are atomic or linear, and build their consistency mechanisms around that.
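
To make the atomicity assumption concrete, here is a minimal sketch (not taken from any of the applications listed above) of a record format that does not rely on atomic sector writes: each record carries its length and a CRC32, so a reader can detect a torn or incomplete write instead of assuming it cannot happen. The names `pack_record` and `unpack_record` and the header layout are assumptions made for illustration only.

```python
import struct
import zlib

# Hypothetical on-disk header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def pack_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unpack_record(buf: bytes):
    """Return the payload, or None if the record is incomplete or torn."""
    if len(buf) < HEADER.size:
        return None
    length, crc = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    # A short payload means the tail never reached the medium;
    # a CRC mismatch means some part of it holds stale or garbage bytes.
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return payload
```

A record cut off at a 512B boundary fails the length check, and a corrupted byte fails the CRC check, so recovery code can discard the record rather than trust it.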

This whole discussion is a shock to me. Where were these guys hiding, under a rock?

In the NFS world you can get not just torn sectors but torn words. Writes may
be reordered, you may get data holes, the whole deal. Until you get back
a successful sync nothing is guaranteed. It is not only a client
crash but also a network outage, and so on. So you never know what can happen.

So are you saying all these applications do not run on NFS?

Thanks
Boaz
[1] http://www.openldap.org/lists/openldap-devel/201410/msg00004.html
[2] http://www.postgresql.org/docs/9.5/static/wal-internals.html , "To deal
with the case where pg_control is corrupt" ...
[3] https://www.sqlite.org/atomiccommit.html , "SQLite does always assume that
a sector write is linear" ...
[4] http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf

Regards,
Thanu