Thread (4 messages), 2 authors, 2021-01-26

Re: Musings over REQ_PREFLUSH and REQ_FUA in journal IO

From: Dave Chinner <david@fromorbit.com>
Date: 2021-01-26 05:21:17

On Mon, Jan 25, 2021 at 05:14:22PM +1100, Dave Chinner wrote:
Hi folks,

I've been thinking a little about the way we use cache flushes
recently and I was thinking about how we do journal writes and
whether we need to issue as many cache flushes as we currently do.
....
And then I wondered if we could apply the same logic to
post-journal write cache flushes (REQ_FUA) that guarantee that the
journal writes are stable before we allow writeback of the metadata
in that LSN range (i.e. once they are unpinned). Again, we have a
completion to submission ordering requirement here, only this time
it is journal IO completion to metadata IO submission.

IOWs, I think the same observation about the log head and the AIL
writeback mechanism can be made here: we only need to ensure a cache
flush occurs before we start writing back metadata at an LSN higher
than the journal head at the time of the last cache flush. The first
iclog write of the last CIL checkpoint will have ensured all
metadata lower than the LSN of the CIL checkpoint is stable, hence
we only need to concern ourselves about metadata at the same LSN as
that checkpoint. Checkpoint completion will unpin that metadata, but
we still need a cache flush to guarantee ordering at the stable
storage level.

Hence we can use an on-demand AIL traversal cache flush to ensure
we have journal-to-metadata ordering. This will be much rarer than
using FUA for every iclog write, and should be of similar
order of gains to the REQ_PREFLUSH optimisation.
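
The on-demand flush decision described above could be sketched roughly
as follows. This is only an illustration in plain userspace C, not XFS
code: struct flush_state, issue_cache_flush() and
flush_before_writeback() are hypothetical names, LSNs are modelled as
plain integers, and the "flush" just records the journal head instead
of talking to a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Hypothetical tracking state, not an actual XFS structure. */
struct flush_state {
	lsn_t		stable_lsn;	/* journal head at the last cache flush */
	unsigned int	flushes;	/* cache flushes issued so far */
};

/* Stand-in for issuing a device cache flush and waiting for it. */
static void issue_cache_flush(struct flush_state *fs, lsn_t journal_head)
{
	fs->stable_lsn = journal_head;
	fs->flushes++;
}

/*
 * Called before metadata at item_lsn is written back. A flush is only
 * needed when the item is newer than the journal head recorded at the
 * last flush. Returns true if a flush was issued.
 */
static bool flush_before_writeback(struct flush_state *fs, lsn_t item_lsn,
				   lsn_t journal_head)
{
	/* Assumption: unpinned items never sit beyond the journal head. */
	assert(item_lsn <= journal_head);

	if (item_lsn <= fs->stable_lsn)
		return false;	/* already covered by an earlier flush */

	issue_cache_flush(fs, journal_head);
	return true;
}
```

Under these assumptions, a run of writeback at or below the recorded
LSN costs no flushes at all, which is where the saving over per-iclog
FUA would come from.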

FWIW, because we use checksums to detect complete checkpoints in
the journal now, we don't actually need to use FUA writes to
guarantee they hit stable storage. We don't have a guarantee in what
order they will hit the disk (even with FUA), so the only thing that
the FUA write gains us is that on some hardware it elides the need
for a post-write cache flush. Hence I don't think we need REQ_FUA,
either.
I think that this can be greatly simplified. We simply use
REQ_PREFLUSH | REQ_FUA on all commit records that close off a
transaction. The pre-flush can be used to guarantee that all the
preceding log writes have completed to the journal, then the commit
record is written w/ FUA, guaranteeing the entire checkpoint is on
stable storage before we run the checkpoint completion callbacks
that unpin the dirty items and insert them into the AIL. This means
we don't need to modify the AIL at all, and all the metadata vs
journal ordering is still maintained entirely within the journal.

The only additional complexity is that we have to separate the
commit record into a new iclog from the rest of the checkpoint,
unless the checkpoint fits entirely inside a single iclog. I don't
think this is hard to do - we can probably do it once we've written
the commit record and hold a reference to the iclog the commit
record was written to that prevents it from being flushed until
we release the reference to it.
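
A rough sketch of the flag selection this implies, in plain userspace
C. Nothing here is kernel code: SKETCH_PREFLUSH and SKETCH_FUA merely
stand in for REQ_PREFLUSH and REQ_FUA, and struct write_plan and
plan_checkpoint() are hypothetical names. It models only which iclog
write carries the flags; it does not model waiting for the earlier
iclog writes to complete or the iclog reference counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for REQ_PREFLUSH and REQ_FUA; the values are arbitrary. */
enum {
	SKETCH_PREFLUSH	= 1u << 0,
	SKETCH_FUA	= 1u << 1,
};

#define MAX_WRITES	16

/* Hypothetical description of the iclog writes for one checkpoint. */
struct write_plan {
	unsigned int	nr_writes;
	unsigned int	flags[MAX_WRITES];
};

/*
 * body_iclogs: iclogs needed for the checkpoint without its commit record.
 * commit_fits_with_body: the commit record fits in the last body iclog.
 */
static struct write_plan plan_checkpoint(unsigned int body_iclogs,
					 bool commit_fits_with_body)
{
	struct write_plan plan = { 0 };
	unsigned int i;

	assert(body_iclogs >= 1 && body_iclogs < MAX_WRITES);

	/* Whole checkpoint in one iclog: a single flagged write. */
	if (body_iclogs == 1 && commit_fits_with_body) {
		plan.flags[plan.nr_writes++] = SKETCH_PREFLUSH | SKETCH_FUA;
		return plan;
	}

	/* Otherwise the body goes out unflagged... */
	for (i = 0; i < body_iclogs; i++)
		plan.flags[plan.nr_writes++] = 0;

	/* ...and the commit record gets its own iclog with both flags. */
	plan.flags[plan.nr_writes++] = SKETCH_PREFLUSH | SKETCH_FUA;
	return plan;
}
```

In this model a checkpoint spanning three body iclogs becomes four
writes, and only the last one pays for a flush and a FUA write.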
The only explicit ordering we really have are log forces. As long as
log forces issue a cache flush when they are left pending by CIL
transaction completion, we shouldn't require anything more here. The
situation is similar to the AIL requirement...
This won't be a concern with the above change, because the commit
mechanism provides the same guarantees about stable journal contents
as it does now...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com