Re: [PATCH] ref-filter: format iteratively with lexicographic refname sorting

From: Patrick Steinhardt <hidden>
Date: 2024-10-17 04:29:04

On Wed, Oct 16, 2024 at 10:48:28PM -0400, Jeff King wrote:

On Wed, Oct 16, 2024 at 06:11:47PM -0400, Taylor Blau wrote:

quoted

On Wed, Oct 16, 2024 at 08:00:30AM +0200, Patrick Steinhardt wrote:

quoted

But there is one exception here where we _can_ get away with sorting
refs while streaming: ref backends sort references returned by their
iterators in lexicographic order. So if the following conditions are all
true we can do iterative streaming:

  - The caller uses at most a single name pattern. Otherwise we'd have
    to sort results from multiple invocations of the iterator.

  - There must be at most a single sorting specification, as otherwise
    we're not using plain lexicographic ordering.

  - The sorting specification must use the "refname".

  - The sorting specification must not be using any flags, like
    case-insensitive sorting.

Perhaps a niche case, but what about ancient packed-refs files that were
written before the 'sorted' capability was introduced?

We should be OK there. In that case we actually read in and sort the
packed-refs entries ourselves. We have to, since we do an in-order merge
with the loose refs while iterating.

I do think this optimization is worth doing, and not a problem with our
current backends. The biggest worries would be:

  1. Some new ref backend that doesn't return sorted results. I find
     this unlikely, and anyway it's easily caught by having coverage in
     the test suite (which I assume we already have, but I didn't look).

My assumption is that a ref iterator that _isn't_ sorted would lead to
undesirable behaviour. I'd be surprised if git-show-ref(1) started to
show refs in a random order. So we have essentially baked the
requirement that ref iterators return refs in lexicographic order into
our codebase already.

  2. Some new flag combination that requires disabling the optimization,
     and which must be dealt with in the code. This seems unlikely to me
     but not impossible. I think enabling the optimization is worth it,
     though.

And this part is an issue with or without my patch. The logic we have
in the ref-filter API is quite fragile, and everybody who wants to add
some new flags must remember to update `can_do_iterative_format()`. I'm
not really a huge fan of that, but on the other hand the subsystem does
not change all that frequently anyway.

quoted

    to sort results from multiple invocations of the iterator.

I think this part is erring on the cautious side, as we can often
collapse these into a single iteration due to the ref-prefix work. It
may be OK to keep using the slower code here if multiple patterns aren't
commonly used, but I'd suspect that:

  git for-each-ref --sort=refname refs/heads refs/tags

could benefit.

Mh. So we do end up using `refs_for_each_fullref_in_prefixes()`, which
may or may not end up collapsing the prefixes. We do sort and dedup the
prefixes via `find_longest_prefixes()`, so we don't have to worry about
e.g. `git for-each-ref refs/tags refs/heads refs/tags`.

So... it should be fine to also use this with multiple patterns? None of
our tests fail, either, which reassures me a bit.

I'll send a v2 that loosens this requirement.

Thanks for your input!

Patrick

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help