Thread (3 messages) 3 messages, 3 authors, 2024-09-27

Re: [PATCH v2 04/22] reftable/basics: handle allocation failures in `reftable_calloc()`

From: Patrick Steinhardt <hidden>
Date: 2024-09-27 05:28:39

On Thu, Sep 26, 2024 at 09:13:23AM -0700, Junio C Hamano wrote:
Patrick Steinhardt [off-list ref] writes:
quoted
quoted
But it illustrates why open coding is not necessarily an excellent
idea in the longer term, doesn't it?  When unsigned_mult_overflows()
is updated to avoid such a false positive, how would we remember
that we need to update this copy we?
I agree in general, but with the reftable library I'm stuck between a
rock and a hard place. My goal is to make it fully reusable by other
projects without first having to do surgery on their side. While having
something like `st_mult()` is simple enough to port over, the biggest
problem I have is that over time we sneak in more and more code from the
Git codebase. The result is death by a thousand cuts.
quoted
Now if we had a single header that exposes a small set of functions
without _anything_ else it could work alright. But I rather doubt that
we really want to have a standalone header for `st_mult()` that we can
include. But without such a standalone header it is simply too easy to
start building on top of Git features by accident.

So I'm instead aiming to go a different way and fully cut the reftable
code loose from Git. So even if we e.g. eventually want `struct strbuf`
return errors on failure, it would only address one part of my problem.
The dependency to "strbuf" (just as an example) was added initially
fairly early.  Soon after 27f7ed2a (reftable: add LICENSE,
2021-10-07) added the reftable/ hierarchy, e303bf22 (reftable:
(de)serialization for the polymorphic record type., 2021-10-07).  I
somehow had an impression that reftable "library" started without
any Git dependency and then use of our helper functions seemed
through from the shim layer, but it was pretty much part of the
library from day one, it seems.
I think the history goes a bit different. Initially, the reftable
library was developed completely outside of the Git tree in [1]. It did
not have any external dependencies and didn't use any of the Git code.

During the upstreaming process the discussion rather quickly swayed into
the direction of making Git the canonical upstream instead of using a
separate repository that hosts the code. See e.g. [2], which was still
tracking a VERSION file to keep track of the upstream version that the
code was pulled from. In [3] the discussion started about upstream
requiring a CLA, which was an (understandable) non-starter. I cannot
find the statement, but eventually we decided that canonical upstream
should be Git, and the VERSION file was never committed.

But with that decision, now that the reftable library was part of Git,
the reviews also started to note that it really should use functions
provided by Git. The initial version of the patch series didn't yet have
any of that. From my point of view that was a big mistake, as the result
is that Git is _not_ reusable by other projects in its current state.

[1]: https://github.com/google/reftable/tree/master/c
[2]: [ref]
[3]: [ref]
quoted
A couple months ago I said that I'll try to write something like shims
that wrap our types in reftable types such that other projects can
provide implementations for such shims themselves. I tried to make that
work, but the result was an unholy mess that really didn't make any
sense whatsoever. Most of the features that I need from the Git codebase
can be provided in a couple of lines of code (`struct strbuf` is roughly
50 lines for example), plus maybe a callback function that can be
provided to wire things up on the user side (`register_tempfiles()` for
example). So once I saw that those wrappers are harder to use and result
in roughly the same lines of code I decided to scrap that approach and
instead try to convert it fully.

So yeah, overall we shouldn't open-code things like this. But I don't
really see another way to do this for the reftable library.
But isn't all of the above what Libification ought to be about?  I
was hoping that the reftable polishing would not have to be done by
you alone, and you would recruit those who are into libification of
other parts of Git codebase to help cleaning up these fringe (from
the point of view of reftable) interfaces.

What do the libification folks feel about this (folks involved in
libgit-sys CC'ed; of course all others who are interested in the
libification topic are welcome to comment)?
I think it's orthogonal to that. The libification effort doesn't really
care about exposing low-level details like `st_mult()` as a library, or
even the `strbuf` interface. It cares about making Git functionality
available, which is on a much higher conceptual level.

Furthermore, to the best of my knowledge the intent wasn't ever to
provide the ability to take a couple of C files and have them be easy to
build as part of another project. The goal is to have proper interfaces
to our code, but whether the code itself has internal interdependencies
is not really much of a concern (globals are a different story).

My goal is different: I want to `cp -r` the "reftable/" directory into
libgit2 or any other project that wants to implement a reftable backend
without having to care about all the different dependencies that it may
pull in or having to carefully massage all of the includes.

Now if it was a single standalone file that one would have to copy in
addition to that I'd be fine. But "strbuf.c' isn't that at all: it ropes
in "gettext.h", "hex-ll.h", "string-list.h", "utf-8.h", "date.h". All of
which is not needed at all, but the person who does the porting now has
to carefully figure out what can and cannot be removed from both
"strbuf.h" and "strbuf.c".

So... it gets out of hand really fast. We would have to significantly
change the way we develop the Git codebase if we wanted to make this a
good experience. But enforcing very strict coding guidelines onto all of
our codebase doesn't feel reasonable to me, which is why I am instead
trying to reinstate the state where the reftable library was decoupled
from the rest of the Git codebase.

The difference here is roughly 100 to 200 lines of code, which I don't
think is much of a maintenance burden by itself. In fact, we'll even
start to share the maintenance burden with libgit2, because they would
be able to use the exact same copy as we do and thus contribute back
bugfixes and improvements for discoveries that their additional test
coverage brings us.

I hope that this all clarifies my point of view :) I've also Cc'd
Han-Wen in case I've made any mistakes regarding the original intent.

Patrick
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help