Thread (2 messages) 2 messages, 2 authors, 2016-08-11

Re: fetching packs and storing them as packs

From: Nicolas Pitre <hidden>
Date: 2016-08-11 19:49:26


On Thu, 26 Oct 2006, Junio C Hamano wrote:
I'd almost say "heavy repository-wide operations like 'repack -a
-d' and 'prune' should operate under a single repository lock",
but historically we've avoided locks and instead tried to do
things optimistically and used compare-and-swap to detect
conflicts, so maybe that avenue might be worth pursuing.

How about (I'm thinking aloud and I'm sure there will be
holes -- I won't think about prune for now)...

* "repack -a -d":

 (1) initially run show-ref (or "ls-remote .") and store the
     result in .git/$ref_pack_lock_file;

 (2) enumerate existing packs;

 (3) do the usual "rev-list --all | pack-objects" thing; this
     may end up including more objects than what are reachable
     from the result of (1) if somebody else updates refs in the
     meantime;

 (4) enumerate existing packs; if there is difference from (2)
     other than what (3) created, that means somebody else added
     a pack in the meantime; stop and do not do the "-d" part;

 (5) run "ls-remote ." again and compare it with what it got in
     (1); if different, somebody else updated a ref in the
     meantime; stop and do not do the "-d" part;

 (6) do the "-d" part as usual by removing packs we saw in (2)
     but do not remove the pack we created in (3);

 (7) remove .git/$ref_pack_lock_file.
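
A minimal sketch of one optimistic "repack -a -d" attempt as listed in steps (1) to (6) above, written in Python. It is not git's implementation: the four callables and the FakeRepo class are hypothetical stand-ins for show-ref, pack enumeration, "rev-list --all | pack-objects", and pack deletion. The ref snapshot is held in memory, so writing and removing .git/$ref_pack_lock_file (steps (1) and (7)) is left out.

```python
from typing import Callable, Dict, Set


def repack_all_and_delete(
    read_refs: Callable[[], Dict[str, str]],
    list_packs: Callable[[], Set[str]],
    build_pack: Callable[[], str],
    delete_pack: Callable[[str], None],
) -> bool:
    """Run one optimistic "repack -a -d" attempt.

    Returns True if the old packs were deleted, False if a concurrent
    change was detected and the "-d" part was skipped.
    """
    refs_before = read_refs()             # (1) snapshot the refs
    packs_before = set(list_packs())      # (2) enumerate existing packs
    new_pack = build_pack()               # (3) rev-list --all | pack-objects

    # (4) any pack difference besides our own new pack means somebody
    #     else changed the set of packs in the meantime
    if set(list_packs()) != packs_before | {new_pack}:
        return False

    # (5) a changed ref means somebody else updated it in the meantime
    if read_refs() != refs_before:
        return False

    # (6) delete only the packs seen in (2), never the one made in (3)
    for name in sorted(packs_before - {new_pack}):
        delete_pack(name)
    return True


class FakeRepo:
    """Hypothetical in-memory stand-in for a repository (illustration only)."""

    def __init__(self) -> None:
        self.refs = {"refs/heads/master": "1111"}
        self.packs = {"pack-old-a", "pack-old-b"}

    def read_refs(self) -> Dict[str, str]:
        return dict(self.refs)

    def list_packs(self) -> Set[str]:
        return set(self.packs)

    def build_pack(self) -> str:
        self.packs.add("pack-new")
        return "pack-new"

    def delete_pack(self, name: str) -> None:
        self.packs.discard(name)


if __name__ == "__main__":
    repo = FakeRepo()
    done = repack_all_and_delete(
        repo.read_refs, repo.list_packs, repo.build_pack, repo.delete_pack
    )
    print(done, sorted(repo.packs))
```

When either check fails, the new pack is kept and nothing is deleted, so the worst outcome of a detected race in this sketch is redundant storage rather than lost objects.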

* "fetch --thin" and "index-pack --stdin":

 (1) check the .git/$ref_pack_lock_file, and refuse to operate
     if there is such (this is not strictly needed for
     correctness but only to give an early exit);

I don't think this is a good idea.  A fetch should always work
irrespective of any repack taking place.  The fetch really should have
priority over a repack since it is directly related to the user
experience.  The repack can fail or produce suboptimal results if a race
occurs, but the fetch must not fail for such a reason.

 (2) create a new pack under a temporary name, and when
     complete, make the pack/index pair .pack and .idx;

Actually this is what already happens if you don't specify a name to
git-index-pack --stdin.

 (3) update the refs.

So the actual race is the really small interval between the time the new
pack+index are moved to .git/objects/pack/ and the moment the refs are
updated.  In practice this is probably less than a second.  All that is
needed here is for the repack to somehow go back to its step (2) if that
interval falls between its steps (2) and (3).
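
To make the window concrete, here is a rough Python sketch of the fetch side: write the pack and index under temporary names, rename them into place, and only then update the ref. It is an illustration under stated assumptions, not what git-index-pack does internally. The function names, the temporary-name prefixes, and the plain dict standing in for the refs are all hypothetical, as is the assumption that readers discover packs by looking for .idx files.

```python
import os
import tempfile
from typing import Dict


def store_pack(pack_dir: str, name: str, pack_data: bytes, idx_data: bytes) -> None:
    """Write pack and index under temporary names, then rename into place."""
    temp_paths = {}
    for kind, data in (("pack", pack_data), ("idx", idx_data)):
        fd, tmp_path = tempfile.mkstemp(prefix="tmp_" + kind + "_", dir=pack_dir)
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        temp_paths[kind] = tmp_path

    # Rename the pack first and the index last, so that a reader which
    # looks for *.idx files (an assumption of this sketch) never finds
    # an index without its pack. os.replace is atomic when source and
    # destination are on the same filesystem.
    for kind in ("pack", "idx"):
        os.replace(temp_paths[kind], os.path.join(pack_dir, name + "." + kind))


def fetch_pack(
    pack_dir: str,
    refs: Dict[str, str],
    name: str,
    pack_data: bytes,
    idx_data: bytes,
    ref: str,
    new_value: str,
) -> None:
    """Store the fetched pack, then point the ref at the new objects."""
    store_pack(pack_dir, name, pack_data, idx_data)   # step (2)
    # Race window: the pack is now visible in pack_dir, but no ref
    # points at its objects until the next line runs.
    refs[ref] = new_value                             # step (3)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pack_dir:
        refs: Dict[str, str] = {}
        fetch_pack(pack_dir, refs, "pack-1234", b"PACK", b"IDX",
                   "refs/heads/master", "abcd")
        print(sorted(os.listdir(pack_dir)), refs)
```

Nothing in this path checks for a lock or waits on a repack, so the fetch cannot fail because a repack is running; the cost of a race lands on the repack side.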
