Thread (39 messages) 39 messages, 5 authors, 2023-11-14

Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`

From: Jeff King <hidden>
Date: 2023-11-14 22:04:37

On Mon, Nov 13, 2023 at 05:34:42PM -0500, Taylor Blau wrote:
quoted
It might not be too hard to just let in-process callers access the
objects we've written. A quick and dirty patch is below, which works
with the test you suggested (the test still fails because it finds a
conflict, but it gets past the "woah, I can't find that sha1" part).
That's a very slick idea, and I think that this series has some legs to
stand on now as a result.
I'm glad people seem to like it. ;)

I was mostly throwing it out there as an illustration, and I don't plan
on polishing it up further. But in case somebody else wants to work on
it, here are random extra thoughts on the topic:

  - rather than teach packfile.c about bulk checkin, I think it might be
    cleaner to let packed_git structs have a function pointer for
    accessing the index (and if NULL, naturally we'd open the .idx file
    in the usual way). And then bulk checkin could just register the
    "fake" packed_git

  - the flip side, though, is: would it be weird for other parts of the
    code to ever see the fake bulk packed_git in the list? It doesn't
    represent a real packfile the way the other ones do. So maybe it is
    better to keep it separate

  - I suspect this scheme may violate some bounds-checking of the
    packfiles. For example, we haven't written the hashfile trailer yet in
    the still-open packfile. But I think use_pack() assumes we always
    have an extra the_hash_algo->rawsz bytes of padding at the end. I'm
    not sure if that would ever cause us to accidentally read past the
    end of the file.

  - as you mentioned, an oidmap would be an obvious replacement for the
    existing unsorted list

  - the existing already_written() function faces the same problem. I
    don't think anybody cared because we'd usually only bulk checkin a
    few huge objects. But with --write-pack, we might write a lot of
    such objects, and the linear scan in already_written() makes it
    accidentally quadratic.

  - speaking of already_written(): it calls repo_has_object_file() to
    see if we can elide the write. It probably should be using
    freshen_*_object() in the usual way. Again, this is an existing bug
    but I suspect nobody noticed because the bulk checkin code path
    hardly ever kicks in.

-Peff
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help