Thread: 5 messages, 3 authors, 2024-12-28

Re: git gc does not clean tmp_pack* files

From: Boomman <hidden>
Date: 2024-12-21 01:18:07

For me, a second "git gc" on the same repo fails to run while one is
already in progress:
fatal: gc is already running on machine 'WIN-blah' pid 40304 (use
--force if not)

If you're already colliding on this, then I don't see why you can't
use a normal-looking name without a randomized string, such as
"tmp_garbagecollecting", so that each execution would at least
overwrite the same location. In that case, --force could probably
append "_1".
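
A minimal sketch of that naming idea, to make the overwrite behavior
concrete. This is not how Git names its temporary packs; the constant
and both function names are hypothetical, and the "_1" suffix simply
mirrors the --force suggestion above.

```python
import os

# Hypothetical fixed name taken from the suggestion above; Git itself
# uses randomized tmp_pack_* names.
FIXED_TMP_NAME = "tmp_garbagecollecting"


def tmp_pack_path(pack_dir, force=False):
    """Return the temporary path a gc run would write to under this scheme."""
    name = FIXED_TMP_NAME + ("_1" if force else "")
    return os.path.join(pack_dir, name)


def write_tmp_pack(pack_dir, data, force=False):
    """Write data to the fixed temporary path and return that path."""
    path = tmp_pack_path(pack_dir, force)
    # Mode "wb" truncates an existing file, so a leftover from a crashed
    # run is overwritten instead of accumulating next to the new one.
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With a fixed name, a repeatedly failing run leaves at most one leftover
file (two if --force is used), at the cost of relying entirely on the
"gc is already running" check to keep writers apart.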

-Vitaly


On Fri, Dec 20, 2024 at 1:05 AM Jeff King [off-list ref] wrote:
> On Thu, Dec 19, 2024 at 03:17:01AM -0800, Junio C Hamano wrote:
>> Boomman [off-list ref] writes:
>>> Yes, if the behavior in case of running out of disk space is to just
>>> leave the malformed file there, it stands to reason that cleaning up
>>> those malformed files should be the first operation to do for gc.
>>
>> It is misleading to call them malformed, isn't it?  When a Git
>> process creates a packfile (or loose object file for that matter),
>> they are written under these tmp_* names.  When the processes die
>> without finalizing these (either removing or renaming into their
>> final names), they are left behind, and it would be better if we can
>> remove it _before_ another process wants to consume more disk space.
>
> We usually automatically clean up our tempfiles if we encounter an
> error, but don't do so for partially written packs. I think this is
> mostly historical, though occasionally it can be useful for debugging
> (e.g., indexing a pack coming over the network).
>
> It might make sense to register them as tempfiles in the usual way,
> possibly with an environment variable option to ask for them to be kept
> (for debugging).
>
> That's not foolproof, since a process can die without cleaning up after
> itself (e.g., on system crash). But it would mean that a repeatedly
> failing "git repack -ad" does not fill up the disk. And the decision of
> when to clean up tempfiles in git-gc is less important.
>
>> But the issue is how you tell which one of these "malformed" files
>> are still being written and will be finalized, and which ones are
>> leftover ones.  You want to remove the latter without molesting the
>> former.  And you want to do so in a portable way, possibly even
>> across the network file systems.
>
> Yeah, I think there are two issues being discussed in this thread:
>
>   - when to clean up leftover tempfiles
>
>   - how to decide which tempfiles are leftover
>
> The second one is what the OP mentioned for locking. But not only does
> that have portability questions, I'm not sure it is sufficient. Would we
> ever write tmp_pack_*, complete our process, and then expect our caller
> to do something with it (meaning there's a race where no process is
> holding the lock)?
>
> I'm not sure. We definitely write "tmp" packfiles via pack-objects and
> expect git-repack to move them to their final names. I think we use a
> slightly different name ("tmp-<pid>-pack-*"), but arguably we should
> consider cleaning up stale versions of those, too.
>
> -Peff