Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction

[PATCH 0/8] reftable: improvements and fixes for compaction · Patrick Steinhardt <hidden> · 2024-07-31
[PATCH 1/8] reftable/stack: refactor function to gather table sizes · Patrick Steinhardt <hidden> · 2024-07-31
[PATCH 2/8] reftable/stack: test compaction with already-locked tables · Patrick Steinhardt <hidden> · 2024-07-31
[PATCH 3/8] reftable/stack: update stats on failed full compaction · Patrick Steinhardt <hidden> · 2024-07-31
[PATCH 4/8] reftable/stack: simplify tracking of table locks · Patrick Steinhardt <hidden> · 2024-07-31
Re: [PATCH 4/8] reftable/stack: simplify tracking of table locks · Justin Tobler <hidden> · 2024-07-31
[PATCH 5/8] reftable/stack: do not die when fsyncing lock file files · Patrick Steinhardt <hidden> · 2024-07-31
[PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" · Patrick Steinhardt <hidden> · 2024-07-31
Re: [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" · Justin Tobler <hidden> · 2024-07-31
Re: [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" · Patrick Steinhardt <hidden> · 2024-08-01
[PATCH 7/8] reftable/stack: fix corruption on concurrent compaction · Patrick Steinhardt <hidden> · 2024-07-31
Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction · Justin Tobler <hidden> · 2024-08-01
Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction · Patrick Steinhardt <hidden> · 2024-08-01
[PATCH 8/8] reftable/stack: handle locked tables during auto-compaction · Patrick Steinhardt <hidden> · 2024-07-31
[PATCH v2 0/9] reftable: improvements and fixes for compaction · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 1/9] reftable/stack: refactor function to gather table sizes · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 2/9] reftable/stack: extract function to setup stack with N tables · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 3/9] reftable/stack: test compaction with already-locked tables · Patrick Steinhardt <hidden> · 2024-08-05
Re: [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables · Karthik Nayak <hidden> · 2024-08-08
[PATCH v2 4/9] reftable/stack: update stats on failed full compaction · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 5/9] reftable/stack: simplify tracking of table locks · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 6/9] reftable/stack: do not die when fsyncing lock file files · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 7/9] reftable/stack: use lock_file when adding table to "tables.list" · Patrick Steinhardt <hidden> · 2024-08-05
[PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction · Patrick Steinhardt <hidden> · 2024-08-05
Re: [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction · Karthik Nayak <hidden> · 2024-08-08
Re: [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction · Patrick Steinhardt <hidden> · 2024-08-05
Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction · Justin Tobler <hidden> · 2024-08-06
Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction · Patrick Steinhardt <hidden> · 2024-08-07
Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction · Justin Tobler <hidden> · 2024-08-07
Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction · Karthik Nayak <hidden> · 2024-08-08
Re: [PATCH v2 0/9] reftable: improvements and fixes for compaction · Karthik Nayak <hidden> · 2024-08-08
[PATCH v3 0/9] reftable: improvements and fixes for compaction · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 1/9] reftable/stack: refactor function to gather table sizes · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 2/9] reftable/stack: extract function to setup stack with N tables · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 3/9] reftable/stack: test compaction with already-locked tables · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 4/9] reftable/stack: update stats on failed full compaction · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 5/9] reftable/stack: simplify tracking of table locks · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 6/9] reftable/stack: do not die when fsyncing lock file files · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 7/9] reftable/stack: use lock_file when adding table to "tables.list" · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 8/9] reftable/stack: fix corruption on concurrent compaction · Patrick Steinhardt <hidden> · 2024-08-08
[PATCH v3 9/9] reftable/stack: handle locked tables during auto-compaction · Patrick Steinhardt <hidden> · 2024-08-08
Re: [PATCH v3 0/9] reftable: improvements and fixes for compaction · Karthik Nayak <hidden> · 2024-08-08

From: Karthik Nayak <hidden>
Date: 2024-08-08 12:25:29

Patrick Steinhardt [off-list ref] writes:

When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.

But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.

Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeded to lock two or

s/succeded/succeeded

more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.

This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.

Right. This is really well written.

[snip]

quoted hunk ↗ jump to hunk

@@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
 	/*
 	 * Lock all tables in the user-provided range. This is the slice of our
 	 * stack which we'll compact.
+	 *
+	 * Note that we lock tables in reverse order from last to first. The
+	 * intent behind this is to allow a newer process to perform best
+	 * effort compaction of tables that it has added in the case where an
+	 * older process is still busy compacting tables which are preexisting
+	 * from the point of view of the newer process.
 	 */
 	REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
-	for (i = first; i <= last; i++) {
-		stack_filename(&table_name, st, reader_name(st->readers[i]));
+	for (i = last + 1; i > first; i--) {
+		stack_filename(&table_name, st, reader_name(st->readers[i - 1]));

 		err = hold_lock_file_for_update(&table_locks[nlocks],
 						table_name.buf, LOCK_NO_DEREF);
 		if (err < 0) {
-			if (errno == EEXIST)
+			/*
+			 * When the table is locked already we may do a
+			 * best-effort compaction and compact only the tables
+			 * that we have managed to lock so far. This of course
+			 * requires that we have been able to lock at least two
+			 * tables, otherwise there would be nothing to compact.
+			 * In that case, we return a lock error to our caller.
+			 */
+			if (errno == EEXIST && last - (i - 1) >= 2 &&
+			    flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
+				err = 0;
+				/*
+				 * The subtraction is to offset the index, the
+				 * addition is to only compact up to the table
+				 * of the preceding iteration. They obviously
+				 * cancel each other out, but that may be
+				 * non-obvious when it was omitted.
+				 */
+				first = (i - 1) + 1;

I remember thinking about this and how to fix it, I'm happy how you've
done it. It's simple and sufficient. Kudos.

[snip]

Overall this patch looked great. I have nothing to add, I really like it
that we use a flag and this means this would only come into play for
auto-compaction.

Attachments

signature.asc [application/pgp-signature] 690 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help