Thread: 224 messages, 7 authors, 2018-04-06

Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-03-05 14:00:25

On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted:
> pack-objects could be a big memory hog, especially on large repos;
> everybody knows that. The suggestion to stick a .keep file on the
> largest pack to avoid this problem has also been known for a long time.
>
> Let's apply the suggestion automatically instead of waiting for people to
> come to the Git mailing list and get the advice. When a certain condition
> is met, gc --auto creates a .keep file temporarily before repack is run,
> then removes it afterward.
>
> gc --auto does this based on an estimate of pack-objects memory
> usage and whether that fits in one third of system memory (the
> assumption here is a desktop environment where many other
> applications are running).
>
> Since the estimate may be inaccurate and the 1/3 threshold is
> arbitrary, give the user finer control over this mechanism as well:
> if the largest pack is larger than gc.bigPackThreshold, it's kept.

This is very promising. It saved a lot of memory in my ad-hoc testing of
adding a *.keep file on an in-house repo.

> +	if (big_pack_threshold)
> +		return pack->pack_size >= big_pack_threshold;
> +
> +	/* First we have to scan through at least one pack */
> +	mem_want = pack->pack_size + pack->index_size;
> +	/* then pack-objects needs lots more for book keeping */
> +	mem_want += sizeof(struct object_entry) * nr_objects;
> +	/*
> +	 * internal rev-list --all --objects takes up some memory too,
> +	 * let's say half of it is for blobs
> +	 */
> +	mem_want += sizeof(struct blob) * nr_objects / 2;
> +	/*
> +	 * and the other half is for trees (commits and tags are
> +	 * usually insignificant)
> +	 */
> +	mem_want += sizeof(struct tree) * nr_objects / 2;
> +	/* and then obj_hash[], underestimated in fact */
> +	mem_want += sizeof(struct object *) * nr_objects;
> +	/*
> +	 * read_sha1_file() (either at delta calculation phase, or
> +	 * writing phase) also fills up the delta base cache
> +	 */
> +	mem_want += delta_base_cache_limit;
> +	/* and of course pack-objects has its own delta cache */
> +	mem_want += max_delta_cache_size;

I'm not familiar enough with this part to say, but isn't this assuming a
lot about the distribution of objects, in a way that will cause us not to
repack in some pathological cases?

Probably worth documenting...
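
To make the assumptions in the estimate easier to see, here is a minimal
standalone sketch of the same arithmetic. The struct est_sizes type and the
estimate_repack_memory() name are hypothetical, and the per-object byte
costs are passed in as parameters because the real sizeof() values of
Git's structs are not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-object costs in bytes. These stand in for
 * sizeof(struct object_entry), sizeof(struct blob), sizeof(struct tree)
 * and sizeof(struct object *); the values are supplied by the caller.
 */
struct est_sizes {
	uint64_t object_entry;
	uint64_t blob;
	uint64_t tree;
	uint64_t obj_hash_slot;
};

/* Mirrors the terms of the quoted estimate, one line per term. */
uint64_t estimate_repack_memory(uint64_t pack_size, uint64_t index_size,
				uint64_t nr_objects,
				const struct est_sizes *sz,
				uint64_t delta_base_cache_limit,
				uint64_t max_delta_cache_size)
{
	uint64_t mem_want;

	assert(sz != NULL);

	/* scanning at least one pack and its index */
	mem_want = pack_size + index_size;
	/* pack-objects bookkeeping, one entry per object */
	mem_want += sz->object_entry * nr_objects;
	/* assumed split: half of the objects are blobs ... */
	mem_want += sz->blob * nr_objects / 2;
	/* ... and the other half are trees */
	mem_want += sz->tree * nr_objects / 2;
	/* one obj_hash[] slot per object */
	mem_want += sz->obj_hash_slot * nr_objects;
	/* the two delta caches are counted at their configured limits */
	mem_want += delta_base_cache_limit;
	mem_want += max_delta_cache_size;
	return mem_want;
}
```

The blob and tree terms are fixed at one half of nr_objects each, so a
repository whose objects are mostly of one type gets the same estimate as
a balanced one with the same object count.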

> +	/* Only allow 1/3 of memory for pack-objects */
> +	mem_have = total_ram() / 3;

Would be great to have this be a configurable variable, so you could set
it to e.g. 33% (like here), 50% etc.
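
A rough sketch of what a percentage-based budget could look like. The
memory_budget() and should_keep_largest_pack() names and the percent
parameter are hypothetical; the patch as quoted hardcodes total_ram() / 3.
The comparison also assumes the pack is kept when the estimate meets or
exceeds the budget, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the share of total_ram that pack-objects may use.
 * `percent` is a hypothetical user setting in the range 0..100.
 */
uint64_t memory_budget(uint64_t total_ram, unsigned percent)
{
	assert(percent <= 100);
	/* divide first so very large total_ram values cannot overflow */
	return total_ram / 100 * percent;
}

/*
 * Assumed decision rule: keep the largest pack out of the repack when
 * the estimated memory need reaches the budget.
 */
int should_keep_largest_pack(uint64_t mem_want, uint64_t total_ram,
			     unsigned percent)
{
	return mem_want >= memory_budget(total_ram, percent);
}
```

With percent set to 33 this stays close to the current one-third rule,
while 50 would let the full repack run on machines with fewer competing
applications.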