Re: [PATCH 2/8] packfile: move the MRU list into the packfile store
From: Patrick Steinhardt <hidden>
Date: 2025-10-30 08:59:25
On Wed, Oct 29, 2025 at 06:39:32PM -0400, Taylor Blau wrote:
On Tue, Oct 28, 2025 at 12:08:32PM +0100, Patrick Steinhardt wrote:quoted
Packfiles have two lists associated to them: - A list that keeps track of packfiles in the order that they were added to a packfile store. - A list that keeps track of packfiles in most-recently-used order so that packfiles that are more likely to contain a specific object are ordered towards the front. Both of these lists are hosted by `struct packed_git` itself, So to identify all packfiles in a repository you simply need to grab the first packfile and then iterate the `->next` pointers or the MRU list. This pattern has the problem that all packfiles are part of the same list, regardless of whether or not they belong to the same object source. With the upcoming pluggable object database effort this needs to change: packfiles should be contained by a single object source, and reading an object from any such packfile should use that source to look up the object. Consequently, we need to break up the global lists of packfiless/lists/list/
It's actually two: the MRU-ordered and the mtime-ordered one.
quoted
into per-object-source lists.How does this work for alternates? My understanding is that each alternate now has its own object source. So to perform an object lookup in a repository with alternate(s), I am assuming that at some layer we need to iterate over those sources to then enumerate the packs in that source looking for some object. I would have imagined that packfile.c::find_pack_entry() would have to be adjusted in a similar way as above, but I couldn't find the changes in this series, so I feel like I must be missing something in my understanding of how this all works together :-). Are packs from different sources still connected somehow such that iterating over the list of packs from one source will enumerate the list of packs from all sources?
This whole mechanism isn't yet part of this series :) So right now we don't really change anything yet, and the list of packfiles is still a global list across all alternates. This series is thus basically only paving the path towards having per-alternate packfile stores. Moving the packfile store into the sources is going to be part of the next series.
quoted
A first step towards this goal is to move those lists ouf of `structs/ouf/out/quoted
packed_git` and into the packfile store. While the packfile store is currently sitting on the `struct object_database` level, the intent is to push it down one level into the `struct odb_source` in a subsequent patch series.Before sending, I was confused by "Consequently, we need to break up the global lists of packfiles [...]", since it wasn't clear whether or not this series realizes that goal, or pushes us in the direction towards it. But this clarifies things, and I think is the reason that we do not see more invasive changes like needing to enumerate the MRU cache of each store in order to find an object like I mentioned above.
Yup, exactly.
quoted
Introduce a new `struct packfile_list` that is used to manage lists of packfiles and use it to store the list of most-recently-used packfiles in `struct packfile_store`. For now, the new list type is only used in a single spot, but we'll expand its usage in subsequent patches.I am a little curious why we need a new list type and implementation here. Is it to avoid exposing the list as part of struct packed_git like we are forced to do with list_head? I could imagine that you might want to avoid exposing the "struct list_head mru" part of packed_git to avoid the suggestion that all packfiles (including those from different sources) are part of the same list. But if that's the case, I wonder if we couldn't have kept the same mru list and clarified via comment that it is per-store, not global. I suppose that is a bit of a foot-gun, and perhaps that is what you are trying to do here, but after reading the patch message a few times I wasn't clear on what the motivation for the new type was.
Yes, this is one of the reasons, it very much feels like a foot-gun to me. I found it significantly harder to work with the list embedded into the packfiles themselves, and it made it significantly harder to check whether the split really is done correctly. So by moving the list into the packfile store it now becomes obvious in our code's layout, and it becomes much easier to build an implicitly-correct mental model. The second reason though is that packfiles aren't only used in the context of our ODB, but also by other layers like our transport. I want to have a clean split so that a packfile is a completely separate entity that can exist without an object store. But if the list pointers used by the store are embedded into the packfile itself, then that boundary gets a lot more fuzzy. So it's basically separation of concerns: `struct packed_git` should only ever be concerned about a singular packfile. And the packfile store is then concerned with managing a set of packfiles. Thanks! Patrick