Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
From: Boaz Harrosh <hidden>
Date: 2016-02-28 10:17:21
Also in:
linux-nfs
On 02/26/2016 12:04 PM, Thanumalayan Sankaranarayana Pillai wrote:
> On Thu, Feb 25, 2016 at 10:02 PM, Dan Williams [off-list ref] wrote:
>> [ adding Thanu ]
>>
>> Very few applications actually care about atomic sector writes.
>> Databases are probably the only class of application that really do
>> care about both single sector and multi-sector atomic write behaviour,
>> and many of them can be configured to assume single sector writes can
>> be torn. Torn user data writes have always been possible, and so pmem
>> does not introduce any new semantics that applications have to handle.
>
> I know about BTT and DAX only at a conceptual level and hence do not
> understand this mail thread fully. But I can provide examples of
> important applications expecting atomicity at a 512B or smaller
> granularity. Here is a list:
>
> (1) LMDB [1], which Dan mentioned, expects "linear writes" (i.e., it
>     does not need atomicity, but needs the first byte to be written
>     before the second byte).
> (2) PostgreSQL expects atomicity [2].
> (3) SQLite depends on linear writes [3] (we were unable to find these
>     dependencies during our testing, however). Also, PSOW (powersafe
>     overwrite) in SQLite is not relevant to this discussion as I
>     understand it; PSOW deals with corruption of data *around* the
>     actual written bytes.
> (4) During our testing we found that ZooKeeper depends on atomicity,
>     but we did not contact the ZooKeeper developers about this.
>
> Some details are in our paper [4].
>
> It is tempting to assume that applications do not use the concept of
> disk sectors, deal only with file-system blocks (which are not atomic
> in practice), and take measures to cope with non-atomic file-system
> blocks. But in reality, applications seem to assume that 512B (more or
> less) sectors are atomic or linear, and build their consistency
> mechanisms around that.
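To make the difference between "atomic" and "linear" writes concrete, here is a toy model (an illustrative sketch, not taken from the paper or from any of the applications listed) that enumerates which on-media states a crash can leave behind when a 4-byte region is overwritten. It assumes each byte is the unit of persistence; on real hardware that unit might be a sector, a cache line or a word. The function names and the AAAA/BBBB contents are hypothetical.

```python
from itertools import combinations

OLD = b"AAAA"  # hypothetical on-media contents before the overwrite
NEW = b"BBBB"  # hypothetical contents the application is writing


def atomic_states(old: bytes, new: bytes) -> set:
    # All-or-nothing: a crash leaves either the old or the new contents.
    return {old, new}


def linear_states(old: bytes, new: bytes) -> set:
    # Linear ("prefix") semantics: bytes persist in order, so a crash
    # leaves some prefix of the new data followed by the old suffix.
    return {new[:k] + old[k:] for k in range(len(new) + 1)}


def unordered_states(old: bytes, new: bytes) -> set:
    # No ordering guarantee: any subset of the bytes may have persisted.
    n = len(new)
    states = set()
    for r in range(n + 1):
        for idx in combinations(range(n), r):
            states.add(bytes(new[i] if i in idx else old[i] for i in range(n)))
    return states


if __name__ == "__main__":
    for name, fn in [("atomic", atomic_states),
                     ("linear", linear_states),
                     ("unordered", unordered_states)]:
        print(f"{name:9s} {len(fn(OLD, NEW)):2d} possible post-crash states")
```

For a 4-byte overwrite this reports 2, 5 and 16 states respectively. Each model's states are a subset of the next one's, which is why an application built around "atomic or linear" sectors can misbehave when the storage only guarantees the last model.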

This whole discussion is a shock to me. Where were these guys hiding,
under a rock?

In the NFS world you can get not just torn sectors but torn words.
Writes may be reordered, there may be holes in the data, the whole deal.
Until you get back a successful sync, nothing is guaranteed. It is not
only a client crash but also a network failure, and so on, so you never
know what can happen.

So are you saying all these applications do not run on NFS?

Thanks
Boaz

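The point that nothing is guaranteed until a sync succeeds suggests a common defensive pattern: frame each record with a checksum, call fsync() and check its result before treating the record as durable, and on recovery discard everything from the first record whose checksum does not match. Below is a minimal Python sketch under those assumptions. The record layout (a little-endian length plus CRC32 header) and the append_record/recover helpers are hypothetical, not taken from any application in this thread. The checksum only detects torn or reordered data; it cannot repair it.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical record header: (payload length, CRC32 of payload).
HEADER = struct.Struct("<II")


def append_record(fd: int, payload: bytes) -> None:
    """Append one length+CRC framed record and wait for it to be durable."""
    rec = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    view = memoryview(rec)
    while view:  # os.write may write fewer bytes than requested
        n = os.write(fd, view)
        view = view[n:]
    # Nothing is promised until fsync returns; a failure raises OSError,
    # and the record (and possibly earlier unsynced ones) may be lost.
    os.fsync(fd)


def recover(path: str) -> list:
    """Return records up to (not including) the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    out, off = [], 0
    while off + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, off)
        start = off + HEADER.size
        end = start + length
        if end > len(data):
            break  # truncated record: the write did not fully persist
        payload = data[start:end]
        if zlib.crc32(payload) != crc:
            break  # torn or reordered write detected
        out.append(payload)
        off = end
    return out


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    append_record(fd, b"first")
    append_record(fd, b"second")
    os.close(fd)
    # Simulate a torn write by corrupting the last byte of the file.
    with open(path, "r+b") as f:
        f.seek(-1, os.SEEK_END)
        f.write(b"X")
    print(recover(path))  # [b'first']
```

Stopping at the first bad record, rather than skipping over it, is deliberate: once the length field of a damaged record cannot be trusted, the boundaries of every later record are unreliable too.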
> [1] http://www.openldap.org/lists/openldap-devel/201410/msg00004.html
> [2] http://www.postgresql.org/docs/9.5/static/wal-internals.html,
>     "To deal with the case where pg_control is corrupt" ...
> [3] https://www.sqlite.org/atomiccommit.html,
>     "SQLite does always assume that a sector write is linear" ...
> [4] http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf
>
> Regards,
> Thanu
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .