Thread: 13 messages, 6 authors, 2023-11-11

Re: [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()`

From: Taylor Blau <hidden>
Date: 2023-11-09 19:26:14

On Wed, Nov 08, 2023 at 08:05:46AM +0100, Patrick Steinhardt wrote:
> > @@ -277,6 +278,18 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
> >  	return ret;
> >  }
> >
> > +int tmp_objdir_repack(struct tmp_objdir *t)
> > +{
> > +	struct child_process cmd = CHILD_PROCESS_INIT;
> > +
> > +	cmd.git_cmd = 1;
> > +
> > +	strvec_pushl(&cmd.args, "repack", "-a", "-d", "-k", "-l", NULL);
> > +	strvec_pushv(&cmd.env, tmp_objdir_env(t));
>
> I wonder what performance of this repack would be like in a large
> repository with many refs. Ideally, I would expect that the repacking
> performance should scale with the number of objects we have written into
> the temporary object directory. But in practice, the repack will need to
> compute reachability and thus also scales with the size of the repo
> itself, doesn't it?

Good question. We definitely do not want to be doing an all-into-one
repack as a consequence of running 'git replay' in a large repository
with lots of refs, objects, or both.

But since we push the result of calling `tmp_objdir_env(t)` into the
environment of the child process, we are only repacking the objects in
the temporary directory, not the main object store.
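
To illustrate the mechanism, here is a minimal standalone sketch (not git
code) of the kind of environment the child process ends up with. It
assumes the environment points GIT_OBJECT_DIRECTORY at the temporary
directory and lists the main object store under
GIT_ALTERNATE_OBJECT_DIRECTORIES, so that `repack -l` can skip objects
borrowed from the alternate. The helper names `env_entry()` and
`build_quarantine_env()` are hypothetical and exist only for this
example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of git): format one "NAME=value" entry. */
static char *env_entry(const char *name, const char *value)
{
	size_t len = strlen(name) + 1 + strlen(value) + 1;
	char *entry = malloc(len);

	if (!entry)
		return NULL;
	snprintf(entry, len, "%s=%s", name, value);
	return entry;
}

/*
 * Hypothetical stand-in for the environment handed to the child:
 * the temporary directory becomes the primary object directory and
 * the main object store is only visible as an alternate.
 *
 * Fills env[0] and env[1] with heap-allocated strings and terminates
 * the array with NULL. Returns 0 on success, -1 on allocation failure.
 */
static int build_quarantine_env(const char *tmp_dir, const char *main_dir,
				char *env[3])
{
	env[0] = env_entry("GIT_OBJECT_DIRECTORY", tmp_dir);
	env[1] = env_entry("GIT_ALTERNATE_OBJECT_DIRECTORIES", main_dir);
	env[2] = NULL;

	if (!env[0] || !env[1]) {
		free(env[0]);
		free(env[1]);
		env[0] = env[1] = NULL;
		return -1;
	}
	return 0;
}
```

A caller would append these entries to the child's environment before
spawning `git repack -a -d -k -l`. Under the assumption above, the `-l`
flag keeps objects that live only in the alternate out of the new pack.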

I have a test that verifies this: in a repository with some arbitrary
set of pre-existing packs, it checks that running 'replay' adds exactly
one pack to that set and that the pre-existing packs remain in place.

Thanks,
Taylor