Re: [PATCH] ref-filter: format iteratively with lexicographic refname sorting
From: Patrick Steinhardt <hidden>
Date: 2024-10-17 04:29:04
On Wed, Oct 16, 2024 at 10:48:28PM -0400, Jeff King wrote:
On Wed, Oct 16, 2024 at 06:11:47PM -0400, Taylor Blau wrote:quoted
On Wed, Oct 16, 2024 at 08:00:30AM +0200, Patrick Steinhardt wrote:quoted
But there is one exception here where we _can_ get away with sorting refs while streaming: ref backends sort references returned by their iterators in lexicographic order. So if the following conditions are all true we can do iterative streaming: - The caller uses at most a single name pattern. Otherwise we'd have to sort results from multiple invocations of the iterator. - There must be at most a single sorting specification, as otherwise we're not using plain lexicographic ordering. - The sorting specification must use the "refname". - The sorting specification must not be using any flags, like case-insensitive sorting.Perhaps a niche case, but what about ancient packed-refs files that were written before the 'sorted' capability was introduced?We should be OK there. In that case we actually read in and sort the packed-refs entries ourselves. We have to, since we do an in-order merge with the loose refs while iterating. I do think this optimization is worth doing, and not a problem with our current backends. The biggest worries would be: 1. Some new ref backend that doesn't return sorted results. I find this unlikely, and anyway it's easily caught by having coverage in the test suite (which I assume we already have, but I didn't look).
My assumption is that a ref iterator that _isn't_ sorted would lead to undesirable behaviour. I'd be surprised if git-show-ref(1) started to show refs in a random order. So we have essentially baked the requirement that ref iterators return refs in lexicographic order into our codebase already.
2. Some new flag combination that requires disabling the optimization,
and which must be dealt with in the code. This seems unlikely to me
but not impossible. I think enabling the optimization is worth it,
though.And this part is an issue with or without my patch. The logic we have in the ref-filter API is quite fragile, and everybody who wants to add some new flags must remember to update `can_do_iterative_format()`. I'm not really a huge fan of that, but on the other hand the subsystem does not change all that frequently anyway.
quoted
quoted
to sort results from multiple invocations of the iterator.I think this part is erring on the cautious side, as we can often collapse these into a single iteration due to the ref-prefix work. It may be OK to keep using the slower code here if multiple patterns aren't commonly used, but I'd suspect that: git for-each-ref --sort=refname refs/heads refs/tags could benefit.
Mh. So we do end up using `refs_for_each_fullref_in_prefixes()`, which may or may not end up collapsing the prefixes. We do sort and dedup the prefixes via `find_longest_prefixes()`, so we don't have to worry about e.g. `git for-each-ref refs/tags refs/heads refs/tags`. So... it should be fine to also use this with multiple patterns? None of our tests fail, either, which reassures me a bit. I'll send a v2 that loosens this requirement. Thanks for your input! Patrick