Re: [PATCH] pack-format.txt: more details on pack file format
From: Stefan Beller <hidden>
Date: 2018-05-08 18:58:24
+Deltified representation

Does this refer to OFS delta as well as REF deltas?

Yes. Both OFS and REF deltas have the same "body" which is what this part is about. The differences between OFS and REF deltas are not described (in fact I don't think we describe what OFS and REF deltas are at all).
Maybe we should?
is a sequence of one byte command optionally
+followed by more data for the command. The following commands are
+recognized:

So a deltified representation of an object is a 6 or 7 in the 3-bit type and then the length. Then a command is shown how to construct the object based on other objects. Can there be more commands?
+- If bit 7 is set, the remaining bits in the command byte specifies
+  how to extract copy offset and size to copy. The following must be
+  evaluated in this exact order:

So there are 2 modes, and the high bit indicates which mode is used. You start describing the more complicated mode first, maybe give names to both of them? "direct copy" (below) and "compressed copy with offset"?

I started to update this more because this text is hard to follow even for me. So let's get the background first. We have a source object somewhere (the object name comes from the ofs/ref delta's header), so basically we have its whole content. This delta thingy tells us how to use that source object to create a new (target) object. The delta is actually a sequence of instructions (of variable length).
The previous paragraph and this sentence are great for my understanding. Thanks! (Maybe keep it in a similar form around?)
One is for copying from the source object.
OK, that makes sense. I can think of it as an "HTTP range request", just optimized for packfiles, where the source is inside the same pack. So it would say "Go to object <sha1> and copy bytes 13-168 here".
The other copies from the delta itself
Does "itself" mean the same object that we are describing here, or does it mean other deltas?
(e.g. this is new data in the target which is not available anywhere in the source object to copy from).
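
A minimal Python sketch of the second kind of instruction may help with the "itself" question. It assumes the form I understand the pack format to use for it: a command byte with bit 7 clear whose low seven bits count the literal bytes that follow in the delta stream, with command byte 0 reserved. The name read_insert is hypothetical and this is not git's code.

```python
def read_insert(delta: bytes, pos: int):
    """Read one "copy from the delta itself" instruction at delta[pos].

    Returns (literal_bytes, next_pos). Hypothetical helper for illustration.
    """
    cmd = delta[pos]
    if cmd & 0x80:
        raise ValueError("bit 7 is set: this is a copy-from-source instruction")
    if cmd == 0:
        raise ValueError("command byte 0 is reserved")
    start = pos + 1
    data = delta[start:start + cmd]  # the new data lives in the delta stream
    if len(data) != cmd:
        raise ValueError("delta is truncated")
    return data, start + cmd
```

So the new data is stored inline, right after the command byte of the delta being applied; no other delta or object is consulted for it.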
The copy-from-source instruction looks like this:
    bit         0        1        2        3       4      5      6
+----------+--------+--------+--------+--------+------+------+------+
| 1xxxxxxx | offset | offset | offset | offset | size | size | size |
+----------+--------+--------+--------+--------+------+------+------+
Here you can see it in its full form, each box represents a byte. The
first byte has bit 7 set as mentioned. We can see here that the offset
(where to copy from in the source object) takes 4 bytes and the size (how
many bytes to copy) takes 3. Both offset and size are in LSB order.
The "xxxxxxx" part lets us shrink this down.

... by indicating how much prefix we can skip and assume to be all zero(?)
If the offset can fit in 16 bits, there's no reason to waste the last two bytes describing zero. Each 'x' marks whether the corresponding byte is present.
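
To make the layout concrete, here is a minimal Python sketch of decoding one such instruction. The name decode_copy and its return shape are my own, not git's code. It assumes that an absent byte is read as zero and that a decoded size of zero stands for 0x10000.

```python
def decode_copy(delta: bytes, pos: int):
    """Decode one copy-from-source instruction starting at delta[pos].

    Returns (offset, size, next_pos). Hypothetical helper for illustration.
    """
    cmd = delta[pos]
    if not cmd & 0x80:
        raise ValueError("bit 7 is clear: not a copy instruction")
    pos += 1
    offset = 0
    size = 0
    # Bits 0..3 flag offset bytes 0..3, least significant byte first.
    for i in range(4):
        if cmd & (1 << i):
            offset |= delta[pos] << (8 * i)
            pos += 1
    # Bits 4..6 flag size bytes 0..2, least significant byte first.
    for i in range(3):
        if cmd & (0x10 << i):
            size |= delta[pos] << (8 * i)
            pos += 1
    # Assumption: a size of zero means 0x10000 bytes.
    if size == 0:
        size = 0x10000
    return offset, size, pos
```

For the three bytes 0x91, 255, 1 this gives offset 255, size 1 and next position 3: bit 0 says the first offset byte is present and bit 4 says the first size byte is present.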
So for a full instruction (as above), we'd have to write

    1 1111 111 <4 bytes offset> <3 bytes size>

and for smaller instructions we have

    1 1100 100 <2 bytes offset> <1 byte size>

and here the offset is in range 0..64k and the size is 1-255 or 0x10000? Modes that skip bytes in between are not allowed, e.g.

    1 1101 101 <3 bytes of offset> <2 bytes of size>

where the missing bytes would be assumed to be 0?
The bit number is in the first row. So if you have offset 255 and size 1, the instruction is three bytes 10010001b, 255,
Oh, it is the other way round: the size will be just one byte, indicating we can have a range of 1-255 or 0x10000, and an offset of 0..255.
I think this is a corner case in this format. I think Nico meant to specify consecutive bytes: if size is 2 bytes then you have to specify _both_ of them even if the first byte could be zero and omitted.
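
A minimal Python sketch of an encoder that sidesteps this corner case: it writes every byte up to the highest non-zero one, so it never leaves a gap and should decode the same way under either reading. The name encode_copy and the argument limits are my own assumptions, not git's code, and size 0x10000 is assumed to be written with no size bytes at all.

```python
def encode_copy(offset: int, size: int) -> bytes:
    """Encode one copy-from-source instruction without gaps.

    Hypothetical helper for illustration, not git's implementation.
    """
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset must fit in 4 bytes")
    if not 1 <= size <= 0xFFFFFF:
        raise ValueError("size must be 1..0xFFFFFF")
    if size == 0x10000:
        size = 0  # assumption: an absent size is read back as 0x10000
    cmd = 0x80
    body = bytearray()
    # Emit offset bytes 0..n-1 (LSB first), including any zero bytes below
    # the highest non-zero one, and set bits 0..n-1 accordingly.
    for i in range((offset.bit_length() + 7) // 8):
        cmd |= 1 << i
        body.append((offset >> (8 * i)) & 0xFF)
    # Same for the size bytes, flagged by bits 4..6.
    for i in range((size.bit_length() + 7) // 8):
        cmd |= 0x10 << i
        body.append((size >> (8 * i)) & 0xFF)
    return bytes([cmd]) + bytes(body)
```

With offset 255 and size 1 this produces the three bytes 10010001b, 255, 1. With size 0x100 both size bytes are written (0x00, 0x01) even though the first one is zero, which is the "specify both of them" behaviour described above.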
So it is not a mutually exclusive group, but a sequence (similar to git-bisect), where we start with 0 and end with 1, with exactly one edge in between (sort of; we can also start with 1, but then we have to have all 1s).
The implementation detail is, if bit 6 is set but bit 4 is not, then the size value is pretty much random. It's only when bit 4 is set that we first clear out "size" and start adding bits to it.
That sounds similar to what I spelled out above. Thanks for taking on the documentation here. The box with numbers really helped me!

Stefan