Missing and omitted objects
From: Simon Richter <hidden>
Date: 2026-01-21 12:02:09
Hi,
we're having a bit of a discussion in Debian.
The goal is to move towards Git-based storage for source packages, away
from tarballs; ideally we'd like to reuse the upstream Git repository as
far as possible, so it is easy to check for differences.
However, some projects are shipping files that aren't redistributable,
or that we want to omit for other reasons (such as vendored
dependencies, when there is a perfectly working common version
available, and we really really want to make sure these don't get used
accidentally).
The goal here is to allow the recipient of such a reduced snapshot to
verify that any files received are unmodified, and to get a list of paths
that were removed (which may be entire subdirectories). Ideally, they
could also continue working on a clone of it and create commits on top,
as long as the affected paths aren't touched.
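To make the verification idea a bit more concrete, here is a rough Python
sketch of what I have in mind. It relies only on the fact that a tree
object records the ID of every entry, so a tree stays verifiable even
when some of the objects it names are absent. The helper names
(git_hash, tree_body, parse_tree, verify) and the in-memory "store"
dictionary are made up for illustration and are not an existing Git
interface; the sketch also assumes SHA-1 object IDs and ignores
submodules.

```python
import hashlib


def git_hash(kind: str, body: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given kind."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries in the tree object layout."""
    # Simplification: plain byte-wise sort by name. Real Git sorts a
    # directory as if its name ended in "/".
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return out


def parse_tree(body: bytes):
    """Yield (mode, name, hex_id) for each entry of a tree object body."""
    i = 0
    while i < len(body):
        sp = body.index(b" ", i)
        nul = body.index(b"\0", sp)
        yield body[i:sp].decode(), body[sp + 1:nul].decode(), body[nul + 1:nul + 21].hex()
        i = nul + 21  # 20 raw SHA-1 bytes follow the NUL


def verify(store, object_id, kind="tree", path=""):
    """Check all present objects below object_id; return the omitted paths.

    store maps a hex object ID to a (kind, body) pair (hypothetical layout).
    Raises ValueError if a present object does not match its ID.
    """
    if object_id not in store:
        return [path or "."]
    stored_kind, body = store[object_id]
    if stored_kind != kind or git_hash(kind, body) != object_id:
        raise ValueError(f"object for {path or '.'} does not match {object_id}")
    if kind == "blob":
        return []
    omitted = []
    for mode, name, child_id in parse_tree(body):
        # Assumption: anything that is not a directory is treated as a blob.
        child_kind = "tree" if mode == "40000" else "blob"
        suffix = "/" if child_kind == "tree" else ""
        omitted += verify(store, child_id, child_kind, path + name + suffix)
    return omitted
```

Calling verify(store, root_tree_id) on a store that lacks the tree for a
vendor/ directory would return that one path, while every file that is
present is still checked against the ID recorded in its parent tree.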
The minimal amount of data we'd want to archive is a single commit with
its tree and all objects reachable from that tree, plus optionally a
signed tag pointing at the commit if one exists (i.e. the same
information we get if we use git-archive, plus the signature on the tag,
plus the option to clone from such a snapshot). For the simple case where
nothing is removed, this already works well and covers most of the use
cases, but, sadly, not all of them.
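As an illustration of what such a snapshot would carry, the following
Python sketch reads the headers of a commit object and, optionally, of an
annotated tag object. The commit names its tree and its parents by ID, so
the commit itself stays verifiable even though the parents are not
shipped. The function names (parse_headers, snapshot_plan) and the
returned dictionary are hypothetical, chosen only for this example.

```python
def parse_headers(body: bytes):
    """Return the (key, value) header pairs of a commit or tag object."""
    headers = []
    head = body.split(b"\n\n", 1)[0].decode()
    for line in head.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation line of a multi-line header such as gpgsig.
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers


def snapshot_plan(commit_body: bytes, tag_body: bytes = None):
    """Summarize what a single-commit snapshot ships and what it omits."""
    commit = parse_headers(commit_body)
    plan = {
        # The root tree is the starting point for everything that is shipped.
        "tree": next(v for k, v in commit if k == "tree"),
        # Parents are named in the commit but their objects are left out.
        "parents_not_shipped": [v for k, v in commit if k == "parent"],
    }
    if tag_body is not None:
        tag = dict(parse_headers(tag_body))
        plan["tag_name"] = tag["tag"]
        plan["tag_target"] = tag["object"]
    return plan
```

The signature of a signed tag sits in the message part of the tag object,
after the blank line, so shipping the tag object unchanged keeps the
signature checkable.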
As a side effect, this could make recovery of a broken repository that
is missing objects more robust.
Right now, I'd like some feedback on whether someone has a better idea,
and on whether such a feature could ever work or would violate some
fundamental design principles.
Simon