Re: [PATCH v2 03/11] pmem: enable REQ_FUA/REQ_FLUSH handling
From: Jan Kara <jack@suse.cz>
Date: 2015-11-18 10:40:55
Also in:
linux-fsdevel, linux-mm, linux-xfs, lkml, nvdimm
On Mon 16-11-15 13:09:50, Ross Zwisler wrote:
> On Fri, Nov 13, 2015 at 06:32:40PM -0800, Dan Williams wrote:
> > On Fri, Nov 13, 2015 at 4:43 PM, Andreas Dilger [off-list ref] wrote:
> > > On Nov 13, 2015, at 5:20 PM, Dan Williams [off-list ref] wrote:
> > > > On Fri, Nov 13, 2015 at 4:06 PM, Ross Zwisler [off-list ref] wrote:
> > > > > Currently the PMEM driver doesn't accept REQ_FLUSH or REQ_FUA bios.
> > > > > These are sent down via blkdev_issue_flush() in response to a fsync()
> > > > > or msync() and are used by filesystems to order their metadata, among
> > > > > other things.
> > > > >
> > > > > When we get an msync() or fsync() it is the responsibility of the DAX
> > > > > code to flush all dirty pages to media. The PMEM driver then just has
> > > > > to issue a wmb_pmem() in response to the REQ_FLUSH to ensure that
> > > > > before we return all the flushed data has been durably stored on the
> > > > > media.
> > > > >
> > > > > Signed-off-by: Ross Zwisler <redacted>
> > > >
> > > > Hmm, I'm not seeing why we need this patch. If the actual flushing of
> > > > the cache is done by the core why does the driver need to support
> > > > REQ_FLUSH? Especially since it's just a couple instructions. REQ_FUA
> > > > only makes sense if individual writes can bypass the "drive" cache,
> > > > but no I/O submitted to the driver proper is ever cached; we always
> > > > flush it through to media.
> > >
> > > If the upper level filesystem gets an error when submitting a flush
> > > request, then it assumes the underlying hardware is broken and cannot
> > > be as aggressive in IO submission, but instead has to wait for
> > > in-flight IO to complete.
> >
> > Upper level filesystems won't get errors when the driver does not
> > support flush. Those requests are ended cleanly in
> > generic_make_request_checks(). Yes, the fs still needs to wait for
> > outstanding I/O to complete but in the case of pmem all I/O is
> > synchronous. There's never anything to await when flushing at the
> > pmem driver level.
> >
> > > Since FUA/FLUSH is basically a no-op for pmem devices, it doesn't make
> > > sense _not_ to support this functionality.
> >
> > Seems to be a nop either way. Given that DAX may lead to dirty data
> > pending to the device in the cpu cache that a REQ_FLUSH request will
> > not touch, it's better to leave it all to the mm core to handle. I.e.
> > it doesn't make sense to call the driver just for two instructions
> > (sfence + pcommit) when the mm core is taking on the cache flushing.
> > Either handle it all in the mm or the driver, not a mixture.
>
> Does anyone know if ext4 and/or XFS alter their algorithms based on
> whether the driver supports REQ_FLUSH/REQ_FUA? Will the filesystem
> behave more efficiently with respect to their internal I/O ordering,
> etc., if PMEM advertises REQ_FLUSH/REQ_FUA support, even though we could
> do the same thing at the DAX layer?

So the information whether the driver supports FLUSH / FUA is generally
ignored by filesystems. We issue REQ_FLUSH / REQ_FUA requests to achieve
the required ordering for fs consistency and expect that the block layer
does the right thing - i.e., if the device has a volatile write cache, it
will be flushed; if it doesn't have one, the request will be ignored. So
the difference between supporting and not supporting REQ_FLUSH / REQ_FUA
is only in how the block layer handles such requests.

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR
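
To illustrate the point about the block layer, here is a minimal userspace
sketch of the behaviour described above. It is a toy model, not kernel code:
every name in it (sk_queue, sk_bio, sk_submit, SK_REQ_FLUSH, SK_REQ_FUA) is
hypothetical and the flag values are made up. It only assumes what the thread
states: the filesystem always sends the request, and the layer below either
flushes a volatile write cache or ends the request cleanly without an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
enum {
	SK_REQ_FLUSH = 1u << 0,	/* flush the device write cache first */
	SK_REQ_FUA   = 1u << 1,	/* this write must reach media before completion */
};

/* Toy stand-in for a request queue and what the device advertises. */
struct sk_queue {
	bool has_volatile_cache;	/* device has a volatile write cache */
	int flushes_issued;		/* cache flushes actually sent to the device */
	int writes_issued;		/* data writes actually sent to the device */
};

/* Toy stand-in for a bio: flags plus payload size (0 means "empty flush"). */
struct sk_bio {
	unsigned int flags;
	unsigned int nr_bytes;
};

/*
 * Submit a bio. The caller (the "filesystem") never checks what the
 * device supports; it always gets 0 back, whether or not a flush was
 * really needed.
 */
int sk_submit(struct sk_queue *q, const struct sk_bio *bio)
{
	unsigned int flags;

	assert(q != NULL && bio != NULL);
	flags = bio->flags;

	if (!q->has_volatile_cache) {
		/* Nothing to flush: drop the flags and carry on. */
		flags &= ~(SK_REQ_FLUSH | SK_REQ_FUA);
		if (bio->nr_bytes == 0)
			return 0;	/* empty flush ends cleanly, no error */
	}

	if (flags & SK_REQ_FLUSH)
		q->flushes_issued++;

	/* A FUA write is passed through as an ordinary write in this model. */
	if (bio->nr_bytes != 0)
		q->writes_issued++;

	return 0;
}
```

Submitting an empty SK_REQ_FLUSH bio to a queue with has_volatile_cache set
to false returns 0 and leaves both counters untouched, while the same bio on
a queue with a volatile cache bumps flushes_issued. The caller's code path is
identical in both cases, which is the sense in which the filesystem ignores
whether the driver supports FLUSH / FUA.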