Thread (86 messages) 86 messages, 3 authors, 2021-06-03

Re: [PATCH 33/39] xfs: Add order IDs to log items in CIL

From: "Darrick J. Wong" <djwong@kernel.org>
Date: 2021-06-03 00:49:16

On Thu, Jun 03, 2021 at 10:16:22AM +1000, Dave Chinner wrote:
On Thu, May 27, 2021 at 12:00:23PM -0700, Darrick J. Wong wrote:
quoted
On Wed, May 19, 2021 at 10:13:11PM +1000, Dave Chinner wrote:
quoted
From: Dave Chinner <redacted>

Before we split the ordered CIL up into per cpu lists, we need a
mechanism to track the order of the items in the CIL. We need to do
this because there are rules around the order in which related items
must physically appear in the log even inside a single checkpoint
transaction.

An example of this is intents - an intent must appear in the log
before it's intent done record so taht log recovery can cancel the
s/taht/that/
quoted
intent correctly. If we have these two records misordered in the
CIL, then they will not be recovered correctly by journal replay.

We also will not be able to move items to the tail of
the CIL list when they are relogged, hence the log items will need
some mechanism to allow the correct log item order to be recreated
before we write log items to the hournal.

Hence we need to have a mechanism for recording global order of
transactions in the log items  so that we can recover that order
from un-ordered per-cpu lists.

Do this with a simple monotonic increasing commit counter in the CIL
context. Each log item in the transaction gets stamped with the
current commit order ID before it is added to the CIL. If the item
is already in the CIL, leave it where it is instead of moving it to
the tail of the list and instead sort the list before we start the
push work.

XXX: list_sort() under the cil_ctx_lock held exclusive starts
hurting that >16 threads. Front end commits are waiting on the push
to switch contexts much longer. The item order id should likely be
moved into the logvecs when they are detacted from the items, then
the sort can be done on the logvec after the cil_ctx_lock has been
released. logvecs will need to use a list_head for this rather than
a single linked list like they do now....
...which I guess happens in patch 35 now?
Right. I'll just remove this from the commit message.
quoted
quoted
@@ -780,6 +780,26 @@ xlog_cil_build_trans_hdr(
 	tic->t_curr_res -= lvhdr->lv_bytes;
 }
 
+/*
+ * CIL item reordering compare function. We want to order in ascending ID order,
+ * but we want to leave items with the same ID in the order they were added to
When do we have items with the same id?
All the items in a single transaction have the same id. The order id
increments before we tag all the items in the transaction and insert
them into the CIL.
quoted
I guess that happens if we have multiple transactions adding items to
the cil at the same time?  I guess that's not a big deal since each of
those threads will hold a disjoint set of locks, so even if the order
ids are the same for a bunch of items, they're never going to be
touching the same AG/inode/metadata object, right?

If that's correct, then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

While true, it's not the way this works so I won't immediately
accept your RVB. The reason for not changing the ordering within a
single transaction is actually intent logging.  i.e. this:
quoted
quoted
+ * the list. This is important for operations like reflink where we log 4 order
+ * dependent intents in a single transaction when we overwrite an existing
+ * shared extent with a new shared extent. i.e. BUI(unmap), CUI(drop),
+ * CUI (inc), BUI(remap)...
There's a specific order of operations that recovery must run these
intents in, and so if we re-order them here in the CIL they'll be
out of order in the log and recovery will replay the intents in the
wrong order. Replaying the intents in the wrong order results in
corruption warnings and assert failures during log recovery, hence
the constraint of not re-ordering items within the same transaction.
<ding> lightbulb comes on.  I think I understood this better the last
time I read all these patches. :/

Basically, for each item that can be attached to a transaction, you're
assigning it an "order id" that is a monotonically increasing counter
that (roughly) records the last time the item was committed.  Certain
items (like inodes) can be relogged and committed multiple times in
rapid fire succession, in which case the order_id will get bumped
forward.

In the /next/ patch you'll change the cil item list to be per-cpu and
only splice the mess together at cil push time.  For that to work
properly, you have to re-sort that resulting list in commit order (aka
the order_id) to keep the items in order of commit.

For items *within* a transaction, you take advantage of the property
of list_sort that it won't reorder items with cmp(a, b) == 0, which
means that all the intents logged to a transaction will maintain the
same order that the author of higher level code wrote into the software.

Question: xlog_cil_push_work zeroes the order_id of pushed log items.
Is there any potential problem here when ctx->order_id wraps around to
zero?  I think the answer is that we'll move on to a new cil context
long before we hit 2^32-1 transactions?

--D
Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help