Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction

From: Patrick Steinhardt <hidden>
Date: 2024-08-01 08:41:08

On Wed, Jul 31, 2024 at 08:04:17PM -0500, Justin Tobler wrote:

On 24/07/31 04:15PM, Patrick Steinhardt wrote:

quoted

But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.

I'm not quite sure I understand at this point how it is possible for two
compaction operations to be performed concurrently. Wouldn't there
always be overlap between the two compaction segments thus causing one
of the operations to be unable to acquire all of the required locks and
abort?

In practice we cannot assume anything about how another process compacts
tables. While we can assume something about how a particular version of
Git compacts tables, we cannot assume anything about future versions of
Git or about alternate implementations of Git. The reftable backend
allows for compacting only a subset of tables, and the heuristic is not
mandated by the on-disk format except that the tables that we are about
to compact need to be next to each other in the stack.

Furthermore, with the next patch, we also handle it gracefully when some
parts of the stack are locked already. Thus, it can easily happen that
process A compacts tables 1 to 3, whereas process B will try to compact
tables 1 to 5, fail to acquire the lock for table 3, and then reduce the
range to compact to 4 to 5.
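
To make that scenario concrete, here is a minimal sketch of how a
best-effort compaction could narrow its range when some tables are
already locked. It assumes that tables are locked from the newest
towards the oldest and that the first lock failure ends the range.
`shrink_range()` and the `locked` array are hypothetical and only model
lock state; they are not reftable functions.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the reftable library: walk the
 * requested range [first, last] from the newest table towards the
 * oldest one and stop at the first table that is already locked.
 *
 * Returns the number of tables in the resulting range and stores its
 * first index in `*new_first`. Returns 0 if even `last` is locked, in
 * which case there is nothing to compact.
 */
static size_t shrink_range(const bool *locked, size_t first, size_t last,
			   size_t *new_first)
{
	size_t i = last + 1;

	/* `i - 1` is the next candidate; stop once it is locked. */
	while (i > first && !locked[i - 1])
		i--;

	*new_first = i;
	return last + 1 - i;
}
```

With tables 1 to 3 marked as locked by process A, a request for tables
1 to 5 yields a range that starts right after the last locked table.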

quoted

changed after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without regard for what the other process has written. This
scenario would then lead to data loss and corruption.

If a concurrent compaction happens though, it would mess up the indices
and cause problems when writing the "tables.list" file. That would not
be good.

Yup.

quoted

This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.

This is indeed a problem. Since we don't reload the stack, we are
unaware of any concurrently appended tables, causing them to not be
written to the new "tables.list" file. Scary.

Indeed.
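
As a rough illustration of why the list has to be re-read, the sketch
below rebuilds the table list from a fresh copy instead of the cached
one. It is a simplified model under stated assumptions, not the code
from this patch: `rewrite_list()` is a hypothetical helper, both lists
are plain NULL-terminated string arrays, and the compacted tables are
assumed to still sit next to each other in the fresh list.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the reftable implementation. `cached` is the
 * table list as loaded before compaction, of which entries [first, last]
 * were merged into `compacted`. `fresh` is the list as re-read from
 * "tables.list" after taking its lock. Both lists are NULL-terminated.
 *
 * Returns a newly allocated, NULL-terminated list that keeps every entry
 * of `fresh` except for the compacted range, which is replaced by
 * `compacted`. Returns NULL if the compacted tables are no longer found
 * as one contiguous run in `fresh`, or on allocation failure. The strings
 * are borrowed; only the returned array must be freed.
 */
static const char **rewrite_list(const char **cached, size_t first, size_t last,
				 const char **fresh, const char *compacted)
{
	size_t fresh_len = 0, range_len = last - first + 1;
	size_t new_offset, i, n = 0;
	const char **out;

	while (fresh[fresh_len])
		fresh_len++;

	/* Locate the first compacted table in the fresh list. */
	for (new_offset = 0; new_offset < fresh_len; new_offset++)
		if (!strcmp(fresh[new_offset], cached[first]))
			break;
	if (new_offset + range_len > fresh_len)
		return NULL;

	/* The whole range must still be present, in the same order. */
	for (i = 0; i < range_len; i++)
		if (strcmp(fresh[new_offset + i], cached[first + i]))
			return NULL;

	/* Everything but the range, plus the new table and the terminator. */
	out = calloc(fresh_len - range_len + 2, sizeof(*out));
	if (!out)
		return NULL;
	for (i = 0; i < new_offset; i++)
		out[n++] = fresh[i];
	out[n++] = compacted;
	for (i = new_offset + range_len; i < fresh_len; i++)
		out[n++] = fresh[i];
	return out;
}
```

Suppose the cached list is "a", "b", "c", "d" and we compacted "c" and
"d" into "cd". If another process meanwhile merged "a" and "b" into "ab"
and a writer appended "e", the fresh list is "ab", "c", "d", "e" and the
sketch produces "ab", "cd", "e". Writing from the cached list instead
would yield "a", "b", "cd", which refers to tables that no longer exist
and silently drops "e".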

quoted

+		/*
+		 * We have found the new range that we want to replace, so
+		 * let's update the range of tables that we want to replace.
+		 */
+		last_to_replace = last + (new_offset - first);
+		first_to_replace = new_offset;
+	} else {
+		REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);

I was confused at first by the `stack_len` + 1. The extra element is
NULL which tells us there are no more tables to add to the list,
correct? It looks like `fd_read_lines()` also adds an extra element.

Yes, that's the reason why we have it. We end up passing `names` to
`free_names()`, which uses `NULL` as a sentinel value to know when to
stop iterating over the array's entries.
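
For readers unfamiliar with the pattern, here is a minimal sketch of a
NULL-terminated array of names. The helpers below are hypothetical
stand-ins written for illustration; the real `free_names()` and
`REFTABLE_CALLOC_ARRAY()` may differ in their details.

```c
#include <stdlib.h>
#include <string.h>

/* Count entries by walking the array until the NULL sentinel. */
static size_t names_count(char **names)
{
	size_t n = 0;

	while (names[n])
		n++;
	return n;
}

/* Free every entry up to the NULL sentinel, then the array itself. */
static void free_names_sketch(char **names)
{
	char **p;

	if (!names)
		return;
	for (p = names; *p; p++)
		free(*p);
	free(names);
}

/*
 * Copy `len` strings into a new array. Allocating `len + 1`
 * zero-initialized slots leaves the last one as the NULL sentinel, so
 * no separate length has to be passed around.
 */
static char **copy_names(const char **src, size_t len)
{
	char **names = calloc(len + 1, sizeof(*names));
	size_t i;

	if (!names)
		return NULL;
	for (i = 0; i < len; i++) {
		size_t size = strlen(src[i]) + 1;

		names[i] = malloc(size);
		if (!names[i]) {
			/* Entries from `i` onwards are still NULL. */
			free_names_sketch(names);
			return NULL;
		}
		memcpy(names[i], src[i], size);
	}
	return names;
}
```

Without the extra slot, the loop in `free_names_sketch()` would read
past the end of the allocation.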

I'll add a comment.

Thanks for your review. I'll wait a bit longer for more feedback before
sending out another version of this patch series.

Patrick
