Re: [PATCH 00/15] Refactor chunk-format into an API

From: Derrick Stolee <hidden>
Date: 2020-12-04 13:58:22

On 12/4/2020 7:48 AM, René Scharfe wrote:

Am 03.12.20 um 17:16 schrieb Derrick Stolee via GitGitGadget:

...

quoted

 Documentation/technical/chunk-format.txt      |  54 ++
 .../technical/commit-graph-format.txt         |   3 +
 Documentation/technical/pack-format.txt       |   3 +
 Makefile                                      |   1 +
 chunk-format.c                                | 105 ++++
 chunk-format.h                                |  69 +++
 commit-graph.c                                | 298 ++++++-----
 midx.c                                        | 466 ++++++++----------
 t/t5318-commit-graph.sh                       |   2 +-
 t/t5319-multi-pack-index.sh                   |   6 +-
 10 files changed, 623 insertions(+), 384 deletions(-)

623-384-54-3-3-1-69-2-6 = 101

So if we ignore changes to documentation, headers, tests and build
script this spends ca. 100 more lines of code than the current version.
That's roughly the size of the new file chunk-format.c -- from this
bird's-eye-view the new API seems to be pure overhead.

Overhead in terms of lines of code, but many of those are function
prototypes and single lines containing only "{" and "}". So yes,
the code files are a bit longer, but the amount of executed code is
not meaningfully different.

Extra lines of code are an expected cost of refactoring. The remaining
question is, "is it worth the cost?" I believe it is.

In the new code I see several magic numbers, use of void pointers and
casting as well as repetition -- is this really going in the right
direction?  I get the feeling that YAGNI.

void pointers are a cost of abstraction in C that we use all over the
codebase.

You (and Junio) are right to point out my magic numbers. Those should
be replaced with something better when possible.

As for YAGNI, I doubt that very much. First, we have already seen
extensions to the commit-graph that added several new chunks, and
plugging into this (documented) API should be easier than the previous
ad-hoc mechanism.

I've CC'd Abhishek to get his opinion, since he's recently added chunks
to the commit-graph file. Outside of the fact that this series conflicts
with his series (which I will fix), it would be good to see if he
appreciates this model.

quoted

I was thinking about file formats recently and realized that the "chunks"
that are common to the commit-graph and multi-pack-index could inform future
file formats. To make that process easier, let's combine the process of
writing and reading chunks into a common API that both of these existing
formats use.
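
For readers unfamiliar with the "chunks" idea, here is a minimal sketch of
a shared table-of-contents writer and reader. It assumes the layout as I
understand it from the commit-graph format documentation: 12-byte entries
(4-byte chunk ID, 8-byte file offset) followed by a terminating entry with
ID 0 whose offset marks the end of the last chunk. All names (toc_entry,
write_toc, find_chunk) are hypothetical and are not the functions added by
this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the API from the series. */
#define TOC_ENTRY_SIZE 12 /* 4-byte chunk id + 8-byte file offset */

struct toc_entry {
	uint32_t id;
	uint64_t offset;
};

static void put_be32(unsigned char *p, uint32_t v)
{
	int i;
	for (i = 0; i < 4; i++)
		p[i] = (unsigned char)(v >> (24 - 8 * i));
}

static void put_be64(unsigned char *p, uint64_t v)
{
	int i;
	for (i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (56 - 8 * i));
}

static uint32_t get_be32(const unsigned char *p)
{
	uint32_t v = 0;
	int i;
	for (i = 0; i < 4; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;
	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Serialize nr entries plus a terminator (id 0, offset = end of the
 * last chunk). Returns bytes written, or 0 if buf is too small.
 */
static size_t write_toc(unsigned char *buf, size_t len,
			const struct toc_entry *entries, size_t nr,
			uint64_t end_offset)
{
	size_t i, need = (nr + 1) * TOC_ENTRY_SIZE;

	if (len < need)
		return 0;
	for (i = 0; i < nr; i++) {
		put_be32(buf + i * TOC_ENTRY_SIZE, entries[i].id);
		put_be64(buf + i * TOC_ENTRY_SIZE + 4, entries[i].offset);
	}
	put_be32(buf + nr * TOC_ENTRY_SIZE, 0);
	put_be64(buf + nr * TOC_ENTRY_SIZE + 4, end_offset);
	return need;
}

/*
 * Find chunk `id` in a table of nr entries (plus terminator). Every
 * entry visited is checked for increasing offsets that stay inside
 * the file. Returns 0 and fills start/size, or -1 if the chunk is
 * missing or the table is malformed.
 */
static int find_chunk(const unsigned char *toc, size_t nr, uint32_t id,
		      uint64_t file_size, uint64_t *start, uint64_t *size)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		const unsigned char *e = toc + i * TOC_ENTRY_SIZE;
		uint64_t off = get_be64(e + 4);
		uint64_t next = get_be64(e + TOC_ENTRY_SIZE + 4);

		if (off > next || next > file_size)
			return -1;
		if (get_be32(e) == id) {
			*start = off;
			*size = next - off;
			return 0;
		}
	}
	return -1;
}
```

The point of the sketch is that both file formats could call the same
write_toc() and find_chunk() pair, so the offset sanity checks live in one
place rather than being reimplemented per format.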

And another point on YAGNI: I'm literally prototyping a new file format and
want to use this API to build it instead of repeating myself. Specifically,
I noticed that the commit-graph and multi-pack-index were inconsistent in
how they protected the file format during writes and reads.
This leads to...

quoted

There is some extra benefit immediately: the writing and reading code for
each gets a bit cleaner. Also, there were different checks in each that made
the process more robust. Now, these share a common set of checks.

...my point that combining these checks makes both codepaths slightly more
robust. I didn't even include the potential extension of storing the size
of each chunk in "struct commit_graph" and "struct multi_pack_index" for
run-time bound checks during lookups. That seemed like too much new
behavior for a series that intends to only refactor.
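
To illustrate what such a run-time bound check could look like, here is a
rough sketch. It assumes a chunk is kept as a pointer plus its size, and
that lookups read fixed-width big-endian values by position. The struct and
function names are hypothetical; nothing like this is part of the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a chunk pointer stored together with its size. */
struct sized_chunk {
	const unsigned char *data;
	size_t size; /* in bytes */
};

/*
 * Read the pos-th big-endian 32-bit value from the chunk.
 * Returns 0 on success, or -1 if pos lies outside the chunk.
 */
static int chunk_read_be32(const struct sized_chunk *c, size_t pos,
			   uint32_t *out)
{
	const unsigned char *p;

	if (!c->data || pos >= c->size / 4)
		return -1;
	p = c->data + pos * 4;
	*out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return 0;
}
```

With the size stored next to the pointer, a corrupt position turns into an
error return instead of a read past the end of the mapped file.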

Thanks,
-Stolee
