
Re: [GSoC] git-refs proposal draft

From: Patrick Steinhardt <hidden>
Date: 2025-03-31 09:42:46

On Sat, Mar 29, 2025 at 11:02:46PM +0800, Zheng Yuting wrote:
> ## Name and Contact Information
>
> - Full Name: Zheng Yuting
> - Email Address: 05ZYT30@gmail.com
> - Time Zone: UTC +8:00
>
> ---
>
> ## Abstract
>
> The current Git reference management functionality is fragmented across
> multiple independent commands (git-show-ref, git-for-each-ref,
> git-update-ref, git-pack-refs, git-check-ref-format, and
> git-symbolic-ref), leading to code redundancy and increased maintenance
> costs. Based on Patrick Steinhardt’s integration vision[1], this project
> aims to introduce 8 new subcommands (list, exists, show, resolve, pack,
> update, delete, check-format) under the existing git-refs command to
> achieve the following objectives:
I have a couple of opinions on the exact naming of the subcommands; more
on that below.

In any case, I don't think the naming and how exactly each of these
commands should look and work needs to be hashed out in this document.
It's nice to scope out _what_ we want to achieve and propose what this
could look like, but ultimately I think that most of the design should
happen during the project itself.
> - Feature Integration: Consolidate existing reference management
>   commands under git-refs, while maintaining backward compatibility.
> - Feature Enhancement: Introduce recursion depth control for git-refs
>   resolve.
> - Testing & Documentation: Add test cases ensuring consistency and
>   update relevant documentation.
>
> ---
>
> ## Implementation Plan
>
> ### Command Integration Strategy
>
> #### Design Goals
>
> The project will unify scattered reference management functionalities
> under the git-refs subcommand framework, ensuring:
>
> 1. Complete Feature Coverage: Each subcommand fully replaces its
>    corresponding legacy command.
> 2. Parameter Compatibility: Preserve the semantics and output behavior
>    of legacy command options.
This one is something that is up for debate. While I do expect that most
of the commands should retain their current semantics and options, we
could also use this as an opportunity to think about whether there are
any issues with the current design and improve upon it.
> 3. Code Reusability: Minimize redundancy by sharing underlying modules
>    (e.g., refs/files-backend.c).
>
> #### Subcommand Mapping
>
> - git-refs list
>   Replaces git-show-ref and git-for-each-ref, merging reference listing
>   functionalities with support for formatting (--format), filtering
>   (--heads, --tags), and sorting (--sort).
Yup. One thing to note is that git-show-ref(1) and git-for-each-ref(1)
are very similar, but not quite the same. One should find good arguments
for which of the two semantics is preferable to us and why.

For example, git-show-ref(1) outperforms git-for-each-ref(1) due to the
default format:

    Benchmark 1: git show-ref
      Time (mean ± σ):      99.0 ms ±   0.5 ms    [User: 55.6 ms, System: 43.0 ms]
      Range (min … max):    98.0 ms … 100.8 ms    100 runs

    Benchmark 2: git for-each-ref
      Time (mean ± σ):     134.0 ms ±   0.6 ms    [User: 82.3 ms, System: 50.8 ms]
      Range (min … max):   132.7 ms … 135.8 ms    100 runs

    Summary
      git show-ref ran
        1.35 ± 0.01 times faster than git for-each-ref
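
To make the format difference concrete, here is a minimal sketch that
only uses the existing commands in a throwaway repository. It is not a
benchmark. The branch name `main` and the identity passed to `git commit`
are assumptions made for illustration:

```shell
# Throwaway repository with a single commit on refs/heads/main.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m initial

# Default git-show-ref(1) output: "<oid> <refname>".
show_ref_out=$(git show-ref)

# Default git-for-each-ref(1) output: "<oid> <objecttype><TAB><refname>",
# so the object type has to be determined for every ref.
for_each_ref_out=$(git for-each-ref)

# With a format that only asks for the object ID and the name, the
# output is the same as that of git-show-ref(1).
plain_out=$(git for-each-ref --format='%(objectname) %(refname)')

printf '%s\n' "$show_ref_out" "$for_each_ref_out" "$plain_out"
```

So any comparison of the two commands should state which format is being
compared, as the defaults do not ask for the same information.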
> - git-refs exists
>   Replaces git-show-ref --exists, providing reference existence checks
>   with positive (<ref>) and exclusion-based (--exclude-existing)
>   verification.
It's not quite clear to me what an exclusion-based existence check is.
How do you check whether something exists when you exclude it? I don't
think that this option is relevant in the context of `git refs exists`.
> - git-refs show
>   Replaces git-show-ref --verify, validating reference correctness with
>   a strict mode (--strict).
Yup. In contrast to `git refs resolve` this command shouldn't resolve
the ref, but directly show what it's pointing to. And this should be
true for both symbolic and normal refs.
> - git-refs resolve
>   Replaces git-symbolic-ref, resolving symbolic references with added
>   recursion depth control (--max-depth), while retaining deletion (-d)
>   and quiet mode (-q) options.
Not quite. The difference to `git refs show` is that this command always
resolves the ref to an object. So it's rather more similar to `git
rev-parse --verify`, except that it only ever handles references.
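
The distinction can be sketched with today's commands. This is only an
analogy for the proposed subcommands, which don't exist yet; the
repository setup and the branch name `main` are assumptions:

```shell
# Throwaway repository where HEAD is a symbolic ref to refs/heads/main.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m initial

# Reading the ref itself: prints the ref that HEAD points to.
target=$(git symbolic-ref HEAD)

# Resolving the ref: follows HEAD all the way to an object ID.
resolved=$(git rev-parse --verify HEAD)

printf 'HEAD points to:   %s\nHEAD resolves to: %s\n' "$target" "$resolved"
```

The first command answers "what does this ref contain?", the second
"which object do I end up at?".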
> - git-refs pack
>   Replaces git-pack-refs, packing loose references with support for
>   filtering (--include, --exclude) and automatic cleanup (--prune).
I would probably call this `git refs optimize` or something like that.
git-pack-refs(1) is mostly called this way because it was introduced to
pack refs into the "packed-refs" file. But nowadays with the reftable
backend I think that the command name is somewhat inaccurate.
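
Whatever the name ends up being, the operation only changes how refs are
stored, not which refs exist. A small sketch with the existing command;
the repository setup and the tag name `v1` are assumptions, and the check
deliberately doesn't look at the on-disk layout so that it doesn't depend
on the backend:

```shell
# Throwaway repository with one branch and one lightweight tag.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m initial
git tag v1

# The set of refs as seen through the ref API, before and after.
before=$(git show-ref)
git pack-refs --all
after=$(git show-ref)

if [ "$before" = "$after" ]
then
    echo "refs are unchanged"
fi
```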
> - git-refs update
>   Replaces git-update-ref, providing transactional reference updates
>   with batch processing (--stdin) and atomic guarantees.
> - git-refs delete
>   Separates the delete functionality from git-update-ref, ensuring
>   explicit handling of reference removals with safety checks and batch
>   operations (--stdin).
It's up for debate whether we should even have something like `git refs
delete`. As you rightly note, `git refs update` already handles the
use case, so it feels like needless duplication.
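
For reference, this is roughly how git-update-ref(1) covers both the
single and the batched deletion today. The repository setup and the ref
names `topic-a` and `topic-b` are assumptions made for this sketch:

```shell
# Throwaway repository with two extra branches pointing at one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m initial
oid=$(git rev-parse HEAD)
git update-ref refs/heads/topic-a "$oid"
git update-ref refs/heads/topic-b "$oid"

# Single deletion. The trailing old value is a safety check: the ref is
# only deleted if it still points at that object.
git update-ref -d refs/heads/topic-a "$oid"

# Batched deletion via --stdin, executed as a single transaction.
printf 'delete refs/heads/topic-b %s\n' "$oid" | git update-ref --stdin

remaining=$(git for-each-ref --format='%(refname)')
printf '%s\n' "$remaining"
```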
> - git-refs check-format
>   Replaces git-check-ref-format, validating reference format with
>   support for normalized output (--normalize).
Ah, nice, this is a command I forgot about.
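
As a reminder of what the existing command does, a short sketch. The ref
names are made up for illustration:

```shell
# --normalize drops leading slashes and collapses runs of slashes, then
# prints the normalized name if it is valid.
normalized=$(git check-ref-format --normalize '/refs//heads/main')

# Without options, validity is reported through the exit status only.
# A name that contains ".." is rejected.
if git check-ref-format 'refs/heads/bad..name'
then
    verdict=valid
else
    verdict=invalid
fi

printf '%s %s\n' "$normalized" "$verdict"
```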
> #### Implementation Strategy
>
> 1. Option Parsing: Each subcommand will reuse the argument parsing
>    logic from legacy commands (e.g., git-pack-refs --prune).
We cannot and do not want to do this for every case. As mentioned above,
we may want to iterate on some of the subcommands to address historic
warts. But overall I agree, we should of course aim to reduce
duplication as far as it is sensible to do.
> 2. Shared Backend Logic: Calls to common functions in refs/ (e.g.,
>    reference traversal, locking mechanisms).
> 3. Error Consistency: Maintain the same error codes and message
>    formats as legacy commands.
Same reasoning here, we may want to adapt some of them. The old commands
won't go away as they are used everywhere, and that makes it more
reasonable for us to change behaviour in their newer equivalents.
> ---
>
> ### Example: Implementing git-refs pack
>
> #### Functional Implementation
>
> 1. Modify builtin/refs.c:
>    - Add cmd_refs_pack function implementing git-pack-refs logic.
>    - Update cmd_refs to include pack with
>      OPT_SUBCOMMAND("pack", &fn, cmd_refs_pack).
>    - Define REFS_PACK_USAGE:
>      git refs pack [--all] [--no-prune] [--auto] [--include <pattern>]
>      [--exclude <pattern>].
> 2. Register New Subcommand in git.c:
>    - Add { "refs-pack", cmd_refs_pack }, to the command array.
You don't actually have to change "git.c" to introduce new subcommands.
We don't want `git refs-pack`, but rather `git refs pack`, which is an
important distinction.
> 3. Reuse refs/files-backend.c Logic:
>    - Ensure cmd_refs_pack calls pack_refs correctly, adjusting as
>      necessary for new options.
We shouldn't have to touch any of the backends at all. You should rather
make sure to integrate with "refs.c", which wraps the backends and
provides a backend-agnostic interface to refs.
> #### Testing Plan
>
> - Test Cases:
>   Add t/txxx-refs-pack.sh, leveraging t/t0601-reffiles-pack-refs.sh
>   scenarios to verify:
>   - --prune removes obsolete references correctly.
>   - --include and --exclude apply filtering as expected.
>   - Packed references match legacy command outputs (diff .git/packed-refs).
> - Performance Benchmarking (if needed):
>   Add performance tests in t/perf to ensure no significant regression
>   in execution time or memory usage.
>
> #### Documentation Updates
>
> - User Manual:
>   Add a pack section to Documentation/git-refs.txt, mapping options to
>   legacy command equivalents.
> - Developer Notes:
>   Comment code to highlight functional parity between git-refs pack
>   and git-pack-refs.
>
> ---
>
> ### Timeline
>
> - May 8 - May 11 (4 days): Initial Testing & Subcommand Framework Setup
> - May 12 - May 28 (17 days): pack Subcommand Implementation
> - May 29 - June 14 (17 days): check-format Subcommand Development
> - June 15 - July 5 (21 days): update and delete Subcommands Development
> - July 6 - July 26 (21 days): show and exists Subcommands Development
> - July 27 - August 16 (21 days): resolve Subcommand Implementation
> - August 17 - September 6 (21 days): list Subcommand Implementation
> - September 7 - September 16 (10 days): Mid-term Review
> - September 17 - September 23 (7 days): Mentor Review & Final Adjustments
You probably underestimate the time to review and land a specific change
quite significantly. Landing new features in ~2 weeks is thus not quite
realistic and you should allocate a lot more time for each of the
specific subcommands.

That of course raises the question of how to squeeze all of the
subcommands into a single GSoC. And the answer is that you don't: it's
perfectly fine to implement only a subset of the proposed subcommands.
I'd rather you spend more time thinking about how to improve upon the
status quo for each subcommand than try to do everything in a hurry.

So: there isn't any expectation that you manage to implement all of
them. I'd recommend picking a subset of commands that you want to
implement as a realistic goal. You may define other commands as a
stretch goal in case you manage to speed through the implementation way
faster than I anticipate.

Thanks!

Patrick