Re: [PATCH 2/8] packfile: move the MRU list into the packfile store
From: Taylor Blau <hidden>
Date: 2025-10-29 22:39:35
On Tue, Oct 28, 2025 at 12:08:32PM +0100, Patrick Steinhardt wrote:
Packfiles have two lists associated to them:
- A list that keeps track of packfiles in the order that they were
added to a packfile store.
- A list that keeps track of packfiles in most-recently-used order so
that packfiles that are more likely to contain a specific object are
ordered towards the front.
Both of these lists are hosted by `struct packed_git` itself, So to
identify all packfiles in a repository you simply need to grab the first
packfile and then iterate the `->next` pointers or the MRU list. This
pattern has the problem that all packfiles are part of the same list,
regardless of whether or not they belong to the same object source.
With the upcoming pluggable object database effort this needs to change:
packfiles should be contained by a single object source, and reading an
object from any such packfile should use that source to look up the
object. Consequently, we need to break up the global lists of packfiless/lists/list/
into per-object-source lists.
How does this work for alternates? My understanding is that each alternate now has its own object source. So to perform an object lookup in a repository with alternate(s), I am assuming that at some layer we need to iterate over those sources to then enumerate the packs in that source looking for some object. I would have imagined that packfile.c::find_pack_entry() would have to be adjusted in a similar way as above, but I couldn't find the changes in this series, so I feel like I must be missing something in my understanding of how this all works together :-). Are packs from different sources still connected somehow such that iterating over the list of packs from one source will enumerate the list of packs from all sources?
A first step towards this goal is to move those lists ouf of `struct
s/ouf/out/
packed_git` and into the packfile store. While the packfile store is currently sitting on the `struct object_database` level, the intent is to push it down one level into the `struct odb_source` in a subsequent patch series.
Before sending, I was confused by "Consequently, we need to break up the global lists of packfiles [...]", since it wasn't clear whether or not this series realizes that goal, or pushes us in the direction towards it. But this clarifies things, and I think is the reason that we do not see more invasive changes like needing to enumerate the MRU cache of each store in order to find an object like I mentioned above.
Introduce a new `struct packfile_list` that is used to manage lists of packfiles and use it to store the list of most-recently-used packfiles in `struct packfile_store`. For now, the new list type is only used in a single spot, but we'll expand its usage in subsequent patches.
I am a little curious why we need a new list type and implementation here. Is it to avoid exposing the list as part of struct packed_git like we are forced to do with list_head? I could imagine that you might want to avoid exposing the "struct list_head mru" part of packed_git to avoid the suggestion that all packfiles (including those from different sources) are part of the same list. But if that's the case, I wonder if we couldn't have kept the same mru list and clarified via comment that it is per-store, not global. I suppose that is a bit of a foot-gun, and perhaps that is what you are trying to do here, but after reading the patch message a few times I wasn't clear on what the motivation for the new type was. Thanks, Taylor