Thread: 43 messages, 3 authors, 2024-08-08

Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction

From: Patrick Steinhardt <hidden>
Date: 2024-08-01 08:41:08

On Wed, Jul 31, 2024 at 08:04:17PM -0500, Justin Tobler wrote:
On 24/07/31 04:15PM, Patrick Steinhardt wrote:
quoted
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjoint
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.
I'm not quite sure I understand at this point how it is possible for two
compaction operations to be performed concurrently. Wouldn't there
always be overlap between the two compaction segments thus causing one
of the operations to be unable to acquire all of the required locks and
abort?
In practice we cannot assume anything about how another process compacts
tables. While we can assume something about how a particular version of
Git compacts tables, we cannot assume anything about future versions of
Git or about alternate implementations of Git. The reftable backend
allows for compacting only a subset of tables, and the heuristic is not
mandated by the on-disk format except that the tables that we are about
to compact need to be next to each other in the stack.

Furthermore, with the next patch, we also handle it gracefully when some
parts of the stack are locked already. Thus, it can easily happen that
process A compacts tables 1 to 3, whereas process B will try to compact
tables 1 to 5, fail to acquire the lock for table 3, and then reduce the
range to compact to 4 to 5.
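
To make the fallback above concrete, here is a rough sketch of how a process might shrink its range once it hits a table it cannot lock. It is not the reftable code: the helper name, the `locked_by_other` array, and the choice to walk from the newest table towards the oldest are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of reftable. `locked_by_other[i]` says
 * whether another process already holds the lock for table i. We walk
 * from the newest table (`last`) towards the oldest (`first`) and stop
 * at the first table we cannot lock, so the result stays contiguous.
 *
 * Returns the number of tables in the shrunken range and stores its
 * first index in `*new_first`. A return value of 0 means nothing could
 * be locked.
 */
static size_t shrink_to_lockable(const bool *locked_by_other,
				 size_t first, size_t last,
				 size_t *new_first)
{
	size_t i = last + 1;

	assert(first <= last);

	while (i > first && !locked_by_other[i - 1])
		i--;

	*new_first = i;
	return last + 1 - i;
}
```

With tables 1 to 3 held by another process and a requested range of 1 to 5, this sketch yields a range of two tables starting at table 4.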
quoted
changed after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying respect to what the other process has written. This
scenario would then lead to data loss and corruption.
If a concurrent compaction happens though, it would mess up the indices
and cause problems when writing the "tables.list" file. That would not
be good.
Yup.
quoted
This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.
This is indeed a problem. Since we don't reload the stack, we are
unaware of any concurrently appended tables, causing them to not be
written to the new "tables.list" file. Scary.
Indeed.
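
One way to picture the remedy for both scenarios: after taking the "tables.list" lock, re-read the list from disk and locate the run of tables we compacted inside that fresh list, rather than trusting the cached indices. The following is only a sketch under that assumption. The function name and signature are hypothetical and do not mirror the actual reftable code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not the reftable implementation. `on_disk` is
 * the freshly re-read, NULL-terminated list of table names, and
 * `compacted` holds the `n` names of the tables we locked and merged.
 *
 * Returns 0 and sets `*offset` to the index where that run now starts,
 * or -1 if the run is no longer present in the on-disk list.
 */
static int find_compacted_run(const char *const *on_disk,
			      const char *const *compacted, size_t n,
			      size_t *offset)
{
	size_t len = 0, i, j;

	assert(n > 0);

	while (on_disk[len])
		len++;

	for (i = 0; i + n <= len; i++) {
		for (j = 0; j < n; j++)
			if (strcmp(on_disk[i + j], compacted[j]))
				break;
		if (j == n) {
			*offset = i;
			return 0;
		}
	}

	return -1;
}
```

Everything before and after the returned offset can then be copied from the fresh list as-is, so a table that another process appended or compacted in the meantime is kept instead of being silently dropped.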
quoted
+		/*
+		 * We have found the new range that we want to replace, so
+		 * let's update the range of tables that we want to replace.
+		 */
+		last_to_replace = last + (new_offset - first);
+		first_to_replace = new_offset;
+	} else {
+		REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
I was confused at first by the `stack_len` + 1. The extra element is
NULL which tells us there are no more tables to add to the list,
correct? It looks like `fd_read_lines()` also adds an extra element.
Yes, that's the reason why we have it. We end up passing `names` to
`free_names()`, which uses `NULL` as a sentinel value to know when to
stop iterating over the array's entries.

I'll add a comment.
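
For readers unfamiliar with the pattern, here is a minimal sketch of a NULL-terminated name array. The two functions are hypothetical stand-ins written for this illustration, not the reftable `free_names()` or `REFTABLE_CALLOC_ARRAY()` themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in: allocate room for `len` names plus one extra
 * slot that serves as the terminating NULL sentinel.
 */
static char **alloc_name_list(size_t len)
{
	char **names;

	assert(len < SIZE_MAX); /* len + 1 must not wrap around */

	names = calloc(len + 1, sizeof(*names));
	if (names)
		names[len] = NULL; /* explicit sentinel */
	return names;
}

/*
 * Hypothetical stand-in: free every entry up to the NULL sentinel, then
 * the array itself. Returns the number of entries that were freed.
 */
static size_t free_name_list(char **names)
{
	size_t n = 0;

	if (!names)
		return 0;

	for (char **p = names; *p; p++) {
		free(*p);
		n++;
	}
	free(names);
	return n;
}
```

Without the extra slot, the loop in the second function would have no way to tell where the array ends and would read past the allocation.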

Thanks for your review. I'll wait a bit longer for more feedback before
sending out another version of this patch series.

Patrick
