Re: [PATCH v7 00/13] nd/pack-objects-pack-struct updates
From: Duy Nguyen <hidden>
Date: 2018-03-26 17:05:30
On Mon, Mar 26, 2018 at 5:13 PM, Jeff King wrote:
> On Sat, Mar 24, 2018 at 07:33:40AM +0100, Nguyễn Thái Ngọc Duy wrote:
>> +unsigned long oe_get_size_slow(struct packing_data *pack,
>> +			       const struct object_entry *e)
>> +{
>> +	struct packed_git *p;
>> +	struct pack_window *w_curs;
>> +	unsigned char *buf;
>> +	enum object_type type;
>> +	unsigned long used, avail, size;
>> +
>> +	if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
>> +		read_lock();
>> +		if (sha1_object_info(e->idx.oid.hash, &size) < 0)
>> +			die(_("unable to get size of %s"),
>> +			    oid_to_hex(&e->idx.oid));
>> +		read_unlock();
>> +		return size;
>> +	}
>> +
>> +	p = oe_in_pack(pack, e);
>> +	if (!p)
>> +		die("BUG: when e->type is a delta, it must belong to a pack");
>> +
>> +	read_lock();
>> +	w_curs = NULL;
>> +	buf = use_pack(p, &w_curs, e->in_pack_offset, &avail);
>> +	used = unpack_object_header_buffer(buf, avail, &type, &size);
>> +	if (used == 0)
>> +		die(_("unable to parse object header of %s"),
>> +		    oid_to_hex(&e->idx.oid));
>> +
>> +	unuse_pack(&w_curs);
>> +	read_unlock();
>> +	return size;
>> +}
>
> It took me a while to figure out why this treated deltas and
> non-deltas differently. At first I thought it was an optimization
> (since we can find non-delta sizes quickly by looking at the headers).
> But I think it's just that you want to know the size of the actual
> _delta_, not the reconstructed object. And there's no way to ask
> sha1_object_info() for that.
>
> Perhaps the _extended version of that function should learn an
> OBJECT_INFO_NO_DEREF flag or something to tell it to return the true
> delta type and size. Then this whole function could just become a
> single call.
>
> But short of that, it's probably worth a comment explaining what's
> going on.
I thought the elaboration on "size" in the big comment block in front of struct object_entry was enough. I was wrong. Will add something here.
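To make the delta/non-delta split concrete: every pack entry starts with a header holding a type and a size, but for OFS_DELTA and REF_DELTA entries that size is the length of the delta data itself. The size of the object the delta reconstructs is stored inside the (zlib-compressed) delta data, as the second of two leading varints. The sketch below decodes both; it assumes the delta bytes have already been inflated, and parse_entry_header(), parse_delta_size() and the PT_* names are hypothetical, not the patch's code or git's API.

```c
#include <assert.h>
#include <stddef.h>

/* Type codes stored in bits 6-4 of the first header byte; the values
 * match git's OBJ_* constants, the PT_* names are made up here. */
enum pack_type {
	PT_COMMIT = 1, PT_TREE = 2, PT_BLOB = 3, PT_TAG = 4,
	PT_OFS_DELTA = 6, PT_REF_DELTA = 7
};

/*
 * Decode an in-pack entry header. Byte 0: bit 7 = more bytes follow,
 * bits 6-4 = type, bits 3-0 = low 4 bits of the size. Each following
 * byte contributes 7 more size bits. Returns bytes consumed, 0 on
 * truncated or overlong input. For PT_OFS_DELTA/PT_REF_DELTA the size
 * is that of the delta data, not of the reconstructed object.
 */
static size_t parse_entry_header(const unsigned char *buf, size_t len,
				 int *type, unsigned long *size)
{
	size_t used = 0;
	unsigned int shift = 4;
	unsigned char c;

	assert(buf && type && size);
	if (len == 0)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;
	*size = c & 15;
	while (c & 0x80) {
		if (used >= len || shift >= 8 * sizeof(unsigned long))
			return 0;
		c = buf[used++];
		*size += (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}

/*
 * Decode one size varint from the start of inflated delta data:
 * 7 bits per byte, least significant group first, bit 7 = more
 * bytes follow. A delta begins with two of these: the base object's
 * size, then the size of the object the delta reconstructs.
 */
static size_t parse_delta_size(const unsigned char *buf, size_t len,
			       unsigned long *size)
{
	size_t used = 0;
	unsigned int shift = 0;
	unsigned char c;

	assert(buf && size);
	*size = 0;
	do {
		if (used >= len || shift >= 8 * sizeof(unsigned long))
			return 0;
		c = buf[used++];
		*size |= (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	} while (c & 0x80);
	return used;
}
```

In oe_get_size_slow() terms, the delta branch only needs what parse_entry_header() returns (that is the job unpack_object_header_buffer() does there); parse_delta_size() is what you would need for the reconstructed size, which is the value sha1_object_info() reports and the reason that function can't be used for deltas.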
>> +Running tests with special setups
>> +---------------------------------
>> +
>> +The whole test suite could be run to test some special features
>> +that cannot be easily covered by a few specific test cases. These
>> +could be enabled by running the test suite with the correct GIT_TEST_
>> +environment variables set.
>> +
>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>> +
>> +GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>> +path where there are more than 1024 packs even if the actual number of
>> +packs in the repository is below this limit.
>> +
>> +GIT_TEST_OE_SIZE_BITS=<bits> exercises the uncommon pack-objects
>> +code path where we do not cache object size in memory and read it
>> +from existing packs on demand. This normally only happens when the
>> +object size is over 2GB. This variable forces the code path on any
>> +object larger than 2^<bits> bytes.
>
> It's nice to have these available to test the uncommon cases. But I
> have a feeling nobody will ever run them, since it requires extra
> effort (and takes a full test run).
I know :) I also know that these do not interfere with GIT_TEST_SPLIT_INDEX, which is already being run on Travis. So the plan (after this series is merged) is to make the second Travis run do something like

    make test GIT_TEST_SPLIT...=1 GIT_TEST_FULL..=1 GIT_TEST_OE..=4

so we don't waste more CPU cycles, and we can make sure these code paths are always run (at least on one platform).
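Seen from the code, GIT_TEST_OE_SIZE_BITS amounts to a threshold: sizes that fit in the cached field stay in memory, anything else is re-read from the pack via oe_get_size_slow(). A minimal sketch, assuming a default of 31 bits (to match the "over 2GB" remark) and using hypothetical names oe_size_limit() and size_is_cached(); it is not the series' actual code, and it treats a size as uncached once it no longer fits in <bits> bits, i.e. size >= 2^<bits>.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed default width of the cached size field (hypothetical). */
#define DEFAULT_OE_SIZE_BITS 31UL

/*
 * Turn the value of GIT_TEST_OE_SIZE_BITS (or NULL when unset) into
 * the smallest size that is NOT cached. Non-numeric or too-large
 * values fall back to the default.
 */
static unsigned long oe_size_limit(const char *bits_str)
{
	unsigned long bits = DEFAULT_OE_SIZE_BITS;

	if (bits_str && *bits_str) {
		char *end;
		unsigned long n = strtoul(bits_str, &end, 10);

		if (*end == '\0' && n < 8 * sizeof(unsigned long))
			bits = n;
	}
	return 1UL << bits;
}

/* Sizes below the limit are kept in memory; the rest would be
 * re-read from the pack on demand. */
static int size_is_cached(unsigned long size, unsigned long limit)
{
	assert(limit != 0);
	return size < limit;
}
```

A caller would compute the limit once, e.g. oe_size_limit(getenv("GIT_TEST_OE_SIZE_BITS")); with a value of 4, as in the Travis line above, every object of 16 bytes or more takes the slow path.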
> I see there's a one-off test for GIT_TEST_FULL_IN_PACK_ARRAY, which I
> think is a good idea, since it makes sure the code is exercised in a
> normal test suite run. Should we do the same for GIT_TEST_OE_SIZE_BITS?
I think the problem with OE_SIZE_BITS is that it affects many different code paths (like reused deltas), and it is hard to make sure all of them are run. But yes, I think I could construct a pack that executes both code paths in oe_get_size_slow(). Will do in a reroll.
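Such a pack needs at least one non-delta entry (answered by sha1_object_info()) and one OFS_DELTA or REF_DELTA entry (read through unpack_object_header_buffer()), combined with a small GIT_TEST_OE_SIZE_BITS so that even tiny objects take the slow path. If the entries are assembled byte by byte, each one starts with the same header format; below is a hedged sketch of an encoder for it. encode_entry_header() is a hypothetical name, and a real test would more likely reuse existing test-suite tooling than C code like this.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write a pack entry header for an object of the given type (1-7)
 * and size: first byte = type in bits 6-4 plus the low 4 size bits,
 * then 7 size bits per byte, bit 7 set on every byte but the last.
 * Returns the number of bytes written, or 0 if `space` is too small.
 */
static size_t encode_entry_header(unsigned char *out, size_t space,
				  int type, unsigned long size)
{
	size_t n = 1;
	unsigned char c;

	assert(out && type >= 1 && type <= 7);
	if (space < 1)
		return 0;
	c = (unsigned char)((type << 4) | (size & 15));
	size >>= 4;
	while (size) {
		if (n >= space)	/* no room for another byte */
			return 0;
		*out++ = c | 0x80;	/* flush pending byte, mark "more" */
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
		n++;
	}
	*out = c;
	return n;
}
```

For example, a 100-byte blob (type 3) encodes as the two bytes 0xB4 0x06, while an OFS_DELTA (type 6) carrying 10 bytes of delta data fits in the single byte 0x6A.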
> I haven't done an in-depth read of each patch yet; this was just what
> jumped out at me from reading the interdiff.
I would really appreciate it if you could find some time to do it. The bugs I found in this round proved that I had no idea what's really going on in pack-objects. Sure, I know the big picture, but that's far from enough to make changes like this.
--
Duy