Re: [PATCH 2/5] xfs: separate CIL commit record IO
From: Christoph Hellwig <hch@infradead.org>
Date: 2021-02-01 13:00:18
On Thu, Jan 28, 2021 at 03:41:51PM +1100, Dave Chinner wrote:
> From: Dave Chinner <redacted>
>
> To allow for iclog IO device cache flush behaviour to be optimised, we first need to separate out the commit record iclog IO from the rest of the checkpoint so we can wait for the checkpoint IO to complete before we issue the commit record. This separate is only necessary if the commit record is being
s/separate/separation/g
> written into a different iclog to the start of the checkpoint. If the entire checkpoint and commit is in the one iclog, then they are both covered by the one set of cache flush primitives on the iclog and hence there is no need to separate them. Otherwise, we need to wait for all the previous iclogs to complete so they are ordered correctly and made stable by the REQ_PREFLUSH that the commit record iclog IO issues. This guarantees that if a reader sees the commit record in the journal, they will also see the entire checkpoint that commit record closes off. This also provides the guarantee that when the commit record IO completes, we can safely unpin all the log items in the checkpoint so they can be written back because the entire checkpoint is stable in the journal.
I'm a little worried about the direction for devices without a volatile write cache, like all high-end enterprise SSDs, arrays and hard drives, where we now introduce another synchronization point without any gains from the reduction in FUA/flush traffic, which is a no-op there.
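
To make the trade-off concrete, here is a toy model of the rule in the commit message and of the case I'm worried about. This is only a sketch: struct toy_checkpoint and both helpers are hypothetical names made up for illustration, not anything in the XFS log code, and the "volatile cache" flag stands in for whatever the block layer reports about the device.

```c
#include <stdbool.h>

/* Hypothetical toy model, not the XFS implementation. */
struct toy_checkpoint {
	int	start_iclog;	/* index of the iclog holding the checkpoint start */
	int	commit_iclog;	/* index of the iclog holding the commit record */
};

/*
 * Rule from the commit message: the commit record only needs to be
 * separated (i.e. we wait for all previous iclogs to complete) when it
 * is written into a different iclog to the start of the checkpoint.
 */
static inline bool
toy_must_wait_for_prior_iclogs(const struct toy_checkpoint *cp)
{
	return cp->commit_iclog != cp->start_iclog;
}

/*
 * The concern: the wait can only pay for itself through reduced
 * FUA/flush traffic, and flushes are a no-op on a device without a
 * volatile write cache. There the wait is a pure synchronization cost.
 */
static inline bool
toy_wait_reduces_flushes(const struct toy_checkpoint *cp,
			 bool has_volatile_cache)
{
	return toy_must_wait_for_prior_iclogs(cp) && has_volatile_cache;
}
```

In this model, a checkpoint spanning several iclogs on a device without a volatile cache is exactly the case where toy_must_wait_for_prior_iclogs() is true but toy_wait_reduces_flushes() is false.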