Thread: 4 messages, 2 authors, 2025-10-02

Re: [PATCH 0/6] odb: track commit graphs via object source

From: Patrick Steinhardt <hidden>
Date: 2025-10-02 11:36:01

On Thu, Oct 02, 2025 at 01:21:34PM +0200, Patrick Steinhardt wrote:
> On Thu, Sep 25, 2025 at 12:17:50PM -0700, Junio C Hamano wrote:
> > Patrick Steinhardt [off-list ref] writes:
> >
> > > There is no inherent reason why a new backend would not be able to use
> > > the existing commit-graph infrastructure indeed. But there are reasons
> > > that specific backends may not want to do so. If objects are already
> > > stored in a database table, then it may make way more sense to store
> > > additional metadata that is currently stored in the commit-graph in a
> > > secondary database table instead of in the commit graph.
> > > ...
> > > This is roughly what I have in my head right now. And I realize that
> > > this information really should be sitting in a design document. I'm
> > > working on that, but still need to land two more patch series before I
> > > want to send such a patch series to the list.
> >
> > So is everybody happy with this line of thought that makes it
> > mandatory for each backend to decide and implement the commit-graph
> > support if they want to?
> >
> > My reading of the later part of Taylor's message[*] tells me that at
> > least Taylor does not agree with that position, and I am not sure
> > about this design choice, either.  Surely, each backend can have its
> > own optimization, but looking at the way data from the commit-graph
> > and other auxiliary data files are used to optimize real operations
> > (like populating the essential fields of the commit object first
> > from the graph, only to read other things lazily from the object
> > database, or switching to completely different traversal machinery
> > when reachability bitmap is available), we cannot say that each
> > backend can store whatever side data they please and leave it at
> > that.  The code paths that are supposed to be generic need to be
> > aware of these side data used for optimization to some degree, so
> > conceptually it is much cleaner (well, at least to my eyes, that is)
> > to declare that the auxiliary data files like commit-graph and
> > reachability bitmaps are defined on the objects in the repository,
> > no matter what backend is used to store them.
>
> My intent here is mostly to allow us to swap out how exactly the data is
> being cached. During the Git Merge I heard from some JJ developer (I
> think) that they also have a pluggable cache, but they approach the
> issue differently: instead of making the cache a property of the object
> backend, they instead make the cache itself pluggable.
>
> I think that's a worthwhile angle to explore. The cache would still sit
> on the repository level, and it wouldn't have to care at all whether we
> use loose objects/packfiles or any other backend. But in theory, we can
> still swap it out for a different representation as desired.
>
> Which overall means that we can defer this to a later point in time, as
> we can make it pluggable independent from making the object database
> itself pluggable.

> So I'd propose to merge the first six patches, as everyone seemed to

Correction: first five patches, of course :)

Patrick

> agree that they improve the status quo, but drop the last patch that
> moves the commit-graph into the ODB sources.
>
> Does that seem reasonable to everyone? If so, I don't really see a
> reason to reroll at this point. But please let me know in case I miss
> anything that needs addressing.
>
> Thanks!
>
> Patrick
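
For illustration only: the "pluggable cache" idea discussed in the thread (a
cache that sits at the repository level and can be swapped out independently
of the object backend) could be sketched roughly as a small table of function
pointers. Everything below is hypothetical. None of these struct or function
names exist in Git, and the in-memory table merely stands in for whatever
representation (a commit-graph file, a database table) a real implementation
would use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: the subset of commit metadata a cache might serve. */
struct commit_meta {
	char oid[41];             /* hex object ID, NUL-terminated */
	unsigned long generation; /* generation number used to cut traversals short */
};

/* Hypothetical interface: generic code only ever calls through this. */
struct commit_cache_ops {
	/* Returns 0 and fills `out` on a hit, -1 on a miss. */
	int (*lookup)(void *state, const char *oid, struct commit_meta *out);
};

/* The cache hangs off the repository, not off any object backend. */
struct commit_cache {
	const struct commit_cache_ops *ops;
	void *state;
};

/* One interchangeable implementation: a fixed in-memory table. */
struct table_state {
	const struct commit_meta *entries;
	size_t nr;
};

static int table_lookup(void *state, const char *oid, struct commit_meta *out)
{
	const struct table_state *t = state;

	for (size_t i = 0; i < t->nr; i++) {
		if (!strcmp(t->entries[i].oid, oid)) {
			*out = t->entries[i];
			return 0;
		}
	}
	return -1;
}

static const struct commit_cache_ops table_ops = { table_lookup };

/*
 * Generic caller: asks the cache for a generation number and reports a
 * miss with -1, in which case the caller would fall back to reading the
 * commit from whichever object backend is in use.
 */
static int cache_generation(struct commit_cache *cache, const char *oid,
			    unsigned long *gen)
{
	struct commit_meta meta;

	if (!cache || !cache->ops ||
	    cache->ops->lookup(cache->state, oid, &meta) < 0)
		return -1;
	*gen = meta.generation;
	return 0;
}
```

In this sketch, swapping the cache representation means providing a different
`commit_cache_ops` table; the generic caller and the object backend stay
unchanged, which is the independence the thread argues for.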