Thread (79 messages) 79 messages, 4 authors, 2024-12-20

Re: [PATCH 6/7] refs: allow multiple reflog entries for the same refname

From: karthik nayak <hidden>
Date: 2024-12-12 14:47:55

Patrick Steinhardt [off-list ref] writes:
On Mon, Dec 09, 2024 at 12:07:20PM +0100, Karthik Nayak wrote:
quoted
The reference transaction only allows a update for a given reference to
avoid conflicts. This, however, isn't an issue for reflogs. There are no
conflicts to be resolved in reflogs and when migrating reflogs between
backends we'd have multiple reflog entries for the same refname.

So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.

This is required to add reflog migration support to `git refs migrate`
which currently doesn't support it.
Nit: the second half of this sentence starting with "which currently..."
feels rather pointless, as it's implicit in the first half. I'd just
drop it.
Will do.
quoted
Signed-off-by: Karthik Nayak <redacted>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 16 +++++++++++++---
 2 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 32975e0fd7a03ab8ddf99c0a68af99921d3f5090..10fba1e97b967fbc04c62a0a6d7d9648ce1c51fb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2612,6 +2612,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,

 	update->backend_data = lock;

+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
Hm. Does this mean that we don't lock at all for REF_LOG_ONLY updates?
Reflogs themselves have no lockfile, so isn't it mandatory that we lock
the corresponding ref like we used to do? Otherwise I cannot see how we
avoid two concurrent writers to the same reflog.
No it doesn't, this is after the lock is obtained. We simply exit early
since for reflog only updates, there is nothing further to do.
quoted
@@ -3036,8 +3042,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,

 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
This on the other hand is sensible -- having multiple REF_LOG_ONLY
transactions queued is fine.
quoted
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..d9d2e28122a00ddd7f835c35a5851e390761885b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				}

 				fill_reftable_log_record(log, &c);
-				log->update_index = ts;
+
+				update_index = strintmap_get(&logs_ts, u->refname);
+				log->update_index = update_index;
+				strintmap_set(&logs_ts, u->refname, update_index+1);
So we're now tracking update indices via another map in order to ensure
that the update index will be increased if we have multiple reflog
entries for the same refname. Can we avoid that overhead by instead just
having a global update index counter that increases for every single
reflog entry, regardless of whether we have multiple ones queued up for
the same reference?

I guess the result would be kind of weird as a single transaction with
multiple ref updates would now always contain N different update
indices. Maybe there's an alternative that allows us to reduce the cost,
like only doing this for REF_LOG_ONLY updates?

I'm mostly being careful because this here is the hot loop of writing
refs, so I don't want to regress performance.
Thanks for bringing this up. I was thinking hard about this for a while.
I also did some local benchmarking, for 10000 atomic writes, I couldn't
find a note-worthy regression.

But I really like the point you made about how this could probably use a
counter. I think we can use `u->index`. The index field was added to
ensure that the logs stay in order when we sort inside
`write_transaction_table()`. But we can use the same here. We can simply
do

    log->update_index = ts + u->index;

This would increment the update_index as needed and also only kick in
when the `index` itself is set. Which would be only for reflog migration
between backends.

I think this would work well.

Karthik

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help