Re: [PATCH] fsck: do not loop infinitely when processing packs
From: Patrick Steinhardt <hidden>
Date: 2026-02-23 09:53:17
On Mon, Feb 23, 2026 at 09:43:41AM +0100, Patrick Steinhardt wrote:
On Sun, Feb 22, 2026 at 06:37:10PM +0000, brian m. carlson wrote:quoted
When we iterate over our packfiles in the fsck code, we do so twice. The first time, we count the number of objects in all of the packs together and later on, we iterate a second time, processing each pack and verifying its integrity. This would normally work fine, but if we have two packs and we're processing the second, the verification process will open the pack to read from it, which will place it at the beginning of the most recently used list. Since this same list is used for iteration, the pack we most recently processed before this will then be behind the current pack in the linked list, so when we next process the list, we will go back to the first pack again and then loop forever. This also makes our progress indicator loop up to many thousands of percent, which is not only nonsensical, but a clear indication that something has gone wrong. Solve this by skipping our MRU updates when we're iterating over packfiles, which avoids the reordering that causes problems.Right, this makes sense. We know that we cannot modify the list of packs in case we're iterating through them, so `repo_for_each_pack()` should indeed skip the MRU updates.quoted
Signed-off-by: brian m. carlson <redacted> --- I realize that t1050 may seem like a bizarre place to put this test. However, I was debugging my sha256-interop branch and why the final test calling `git fsck` was failing, so I placed a `git fsck` earlier in the test to double-check and discovered the problem. Since we already have a natural testcase here, I thought I'd just place the test where we already know it will trigger the problem.Makes me wonder though why none of the tests t1450-fsck exhibit this pattern. I cannot imagine that there is no test there that doesn't have multiple packs. *goes checking* We actually might not, but when trying to come up with a minimum reproducer I failed at first. This is because ultimately the root cause seems to be a bit more complex: we don't only care about there being multiple packfiles. We also care about "core.bigFileThreshold". Typically, we don't execute `find_pack_entry()` at all when verifying packfiles as we iterate through objects in packfile order. We thus don't have to look up objects via their object ID, but instead we do so by using their packfile offset. And this mechanism will not end up in `find_pack_entry()`, and thus we wouldn't update the MRU. But there's an exception: when the size of the object that is to be checked exceeds "core.bigFileThreshold" we won't read it directly, but we'll instead use `stream_object_signature()`, which eventually ends up calling `odb_read_stream_open()`. And that of course _will_ call `find_pack_entry()`, as we're now in the mode where we search by object ID, not by offset. And consequently, we'll update the MRU in this call path. With that knowledge it's kind of easy to reproduce the issue: we simply need two packfiles, and each of them must contain at least one blob that is larger than "core.bigFileThreshold". Now I agree that the below proposed fix would be a good change to make the code more solid while we still have `repo_for_each_pack()` (I plan to eventually get rid of it). But arguably, the above logic is kind of broken regardless of this: we are asked to verify objects in the current pack, but we may end up verifying the object via a different pack. So if the same object were to exist in multiple packs, we might end up only verifying one of its instances. I've got a couple patches in the making that'll fix this.
I've sent out the patches via [1]. Thanks! Patrick [1]: [ref]