Re: [PATCH v2 0/2] DAX bdev fixes - move flushing calls to FS
From: Jan Kara <jack@suse.cz>
Date: 2016-02-11 12:43:04
Also in:
linux-fsdevel, linux-mm, linux-xfs, lkml, nvdimm
On Wed 10-02-16 13:48:54, Ross Zwisler wrote:
During testing of raw block devices + DAX I noticed that the struct block_device that we were using for DAX operations was incorrect. For the fault handlers, etc. we can just get the correct bdev via get_block(), which is passed in as a function pointer, but for the *sync code and for sector zeroing we don't have access to get_block(). This is also an issue for XFS real-time devices, whenever we get those working. Patch one of this series fixes the DAX sector zeroing code by explicitly passing in a valid struct block_device. Patch two of this series fixes DAX *sync support by moving calls to dax_writeback_mapping_range() out of filemap_write_and_wait_range() and into the filesystem/block device ->writepages function so that it can supply us with a valid block device. This also fixes DAX code to properly flush caches in response to sync(2). Thanks to Jan Kara for his initial draft of patch 2: https://lkml.org/lkml/2016/2/9/485 Here are the changes that I've made to that patch: 1) For DAX mappings, only return after calling dax_writeback_mapping_range() if we encountered an error. In the non-error case we still need to write back normal pages, else we lose metadata updates. 2) In dax_writeback_mapping_range(), move the new check for if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL) above the i_blkbits check. In my testing I found cases where dax_writeback_mapping_range() was called for inodes with i_blkbits != PAGE_SHIFT - I'm assuming these are internal metadata inodes? They have no exceptional DAX entries to flush, so we have no work to do, but if we return error from the i_blkbits check we will fail the overall writeback operation. Please let me know if it seems wrong for us to be seeing inodes set to use DAX but with i_blkbits != PAGE_SHIFT and I'll get more info.
So I'm wondering - how come S_DAX flag got set for inode where i_blkbis != PAGE_SHIFT? That would seem to be a bug? I specifically ordered the checks like this to catch such issues.
3) In filemap_write_and_wait() and filemap_write_and_wait_range(), continue the writeback in the case that DAX is enabled but we only have a nonzero mapping->nrpages. As with 1) and 2), I believe this is necessary to properly writeback metadata changes. If this sounds wrong, please let me know and I'll get more info.
And I'm surprised here as well. If there are dax_mapping() inodes that have pagecache pages, then we have issues with radix tree handling as well. So how come dax_mapping() inodes have pages attached? If it is about block device inodes, then I find it buggy, that S_DAX gets set for such inodes when filesystem is mounted on them because in such cases we are IMO asking for data corruption sooner rather than later... Honza -- Jan Kara [off-list ref] SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>