Thread (34 messages) 34 messages, 3 authors, 2025-10-30

Re: [PATCH 2/8] packfile: move the MRU list into the packfile store

From: Taylor Blau <hidden>
Date: 2025-10-29 22:39:35

On Tue, Oct 28, 2025 at 12:08:32PM +0100, Patrick Steinhardt wrote:
Packfiles have two lists associated to them:

  - A list that keeps track of packfiles in the order that they were
    added to a packfile store.

  - A list that keeps track of packfiles in most-recently-used order so
    that packfiles that are more likely to contain a specific object are
    ordered towards the front.

Both of these lists are hosted by `struct packed_git` itself, So to
identify all packfiles in a repository you simply need to grab the first
packfile and then iterate the `->next` pointers or the MRU list. This
pattern has the problem that all packfiles are part of the same list,
regardless of whether or not they belong to the same object source.

With the upcoming pluggable object database effort this needs to change:
packfiles should be contained by a single object source, and reading an
object from any such packfile should use that source to look up the
object. Consequently, we need to break up the global lists of packfiles
s/lists/list/
into per-object-source lists.
How does this work for alternates? My understanding is that each
alternate now has its own object source. So to perform an object lookup
in a repository with alternate(s), I am assuming that at some layer we
need to iterate over those sources to then enumerate the packs in that
source looking for some object.

I would have imagined that packfile.c::find_pack_entry() would have to
be adjusted in a similar way as above, but I couldn't find the changes
in this series, so I feel like I must be missing something in my
understanding of how this all works together :-).

Are packs from different sources still connected somehow such that
iterating over the list of packs from one source will enumerate the list
of packs from all sources?
A first step towards this goal is to move those lists ouf of `struct
s/ouf/out/
packed_git` and into the packfile store. While the packfile store is
currently sitting on the `struct object_database` level, the intent is
to push it down one level into the `struct odb_source` in a subsequent
patch series.
Before sending, I was confused by "Consequently, we need to break up the
global lists of packfiles [...]", since it wasn't clear whether or not
this series realizes that goal, or pushes us in the direction towards
it.

But this clarifies things, and I think is the reason that we do not see
more invasive changes like needing to enumerate the MRU cache of each
store in order to find an object like I mentioned above.
Introduce a new `struct packfile_list` that is used to manage lists of
packfiles and use it to store the list of most-recently-used packfiles
in `struct packfile_store`. For now, the new list type is only used in a
single spot, but we'll expand its usage in subsequent patches.
I am a little curious why we need a new list type and implementation
here. Is it to avoid exposing the list as part of struct packed_git like
we are forced to do with list_head?

I could imagine that you might want to avoid exposing the "struct
list_head mru" part of packed_git to avoid the suggestion that all
packfiles (including those from different sources) are part of the same
list. But if that's the case, I wonder if we couldn't have kept the same
mru list and clarified via comment that it is per-store, not global.

I suppose that is a bit of a foot-gun, and perhaps that is what you are
trying to do here, but after reading the patch message a few times I
wasn't clear on what the motivation for the new type was.

Thanks,
Taylor
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help