Re: [PATCH] doc: add a explanation of Git's data model

From: Ben Knoble <hidden>
Date: 2025-10-10 00:42:55

On Oct 9, 2025, at 10:21, Julia Evans [off-list ref] wrote:

I collected some feedback from Git users on this v2 document. I'm expecting more
feedback, but here's an initial brain dump of my notes. I mostly wrote this for
my own use but I thought it might be interesting to other folks too.

[snip]

references:

- Two people pointed out that because references are often stored as files,
 you can't have two references named `julia/ticket-number` and
 `julia/ticket-number/task-name`.
 I'm not sure if this is a fundamental limit of the refs data model
 (does the reftable backend have the same limitation?), but it could be
 a good reason to mention that refs are often stored as files, because
 it makes it obvious that you can't have a file and a directory with
 the same name.
 Either way, this clearly affects people fairly often in practice,
 so I think it's worth mentioning in some way.
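
A minimal sketch of how the `julia/ticket-number` conflict described above can be reproduced in a throwaway repository. The temporary directory, the demo identity, and the `nested_created` variable are assumptions made for illustration. The second `git branch` call is expected to fail when refs are stored as files, since `refs/heads/julia/ticket-number` would have to be both a file and a directory.

```shell
# Sketch: reproduce the file/directory ref conflict in a scratch repository.
set -u
repo=$(mktemp -d)                      # throwaway location (assumption)
git init --quiet "$repo"
cd "$repo"
# An empty commit gives the new branches something to point at.
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "initial commit"

git branch julia/ticket-number

# Try to create a branch "inside" the existing branch name.
if git branch julia/ticket-number/task-name 2>/dev/null; then
    nested_created=yes
else
    nested_created=no
fi
echo "nested branch created: $nested_created"
```

Deleting or renaming `julia/ticket-number` first should make the nested name available again.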

I don’t think the reftable backend has this limitation (?), but it reminded me of another important one: on case-insensitive filesystems you cannot have both "julia" and "JULIA" branches!

This occasionally creates problems where someone cannot fetch/clone what has been pushed.
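
One way to spot such names before they bite is to look for refs that become identical once case is ignored. This is only a sketch: `find_case_collisions` is a hypothetical helper, not a Git command, and the sample ref names are made up to mirror the example above.

```shell
# Hypothetical helper: read ref names on stdin and print the ones that
# collide when upper/lower case is ignored (printed in lowercase).
find_case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Sample input mirroring the julia/JULIA example.
collisions=$(printf '%s\n' refs/heads/julia refs/heads/JULIA refs/heads/main \
    | find_case_collisions)
echo "$collisions"
```

In a real repository the input could come from `git for-each-ref --format='%(refname)'`, or from `git ls-remote --heads <remote> | cut -f2` to check a remote before fetching.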

Anyway: it’s worth mentioning the files for that purpose. It would be nice to improve the UI as you describe below to continue to be able to naturally interrogate Git without needing to know about all the storage formats (recall that cat-file works just fine with packs and MIDXs!).
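
As a small illustration of the cat-file point, here is a sketch that reads the same blob before and after it is moved into a packfile. The scratch repository, file name, and demo identity are assumptions for illustration only.

```shell
# Sketch: git cat-file returns the same content for a loose and a packed object.
set -eu
repo=$(mktemp -d)                      # throwaway location (assumption)
git init --quiet "$repo"
cd "$repo"
echo "hello" > greeting.txt
git add greeting.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "add greeting"

blob=$(git rev-parse HEAD:greeting.txt)

before=$(git cat-file -p "$blob")      # object is stored loose at this point
git gc --quiet                         # repack reachable objects into a packfile
after=$(git cat-file -p "$blob")       # same command, different storage

git count-objects -v                   # shows loose vs. packed object counts
echo "before: $before / after: $after"
```

The command line never changes, which is the appeal of querying Git through its own tools rather than reading files under `.git` directly.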

Overall: several people suggested mentioning more about where things
are stored in the `.git` directory, which I just removed.

I think I want to avoid this (not sure yet), but I'm going to think
about the underlying motivation for this suggestion and see if it can be
addressed in a different way.

Some ideas for what purposes discussing the `.git` directory serves:

1. Like I mentioned above with branches, sometimes the implementation causes
  some extra constraints like "you can't have branches `julia/ticket`
  and `julia/ticket/task`". So often people like to know a little
  about the implementation because it can help predict some of the
  holes in the abstractions you're using.
2. It lets you view the "raw" data, so you can be totally sure about
  what Git is storing. This is nice because Git's UI can be very
  inconsistent sometimes, so looking at the raw data gives a sense of
  certainty about what's actually there.

I tried to put together a list of ways to look at the "raw" data without
looking in the `.git` directory. The ways for objects and the index are great,
but for references and the reflog they involve pretty complex format
strings. I'm not confident I've gotten the format strings right, and IMO
they don't inspire a lot of confidence.

View an object with:

----
git cat-file -p <object-id>
----

View a reference with:

----
git for-each-ref <ref-name> --include-root-refs --format="%(refname) %(if)%(symref)%(then)%(symref)%(else)%(objectname:short)%(end)"
----

View the index with:

----
git ls-files --stage
----

View the reflog for a reference with:

----
git reflog show <refname> --format="%h | %gd | %gn <%ge> | %gs" --date=iso
----
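
For anyone who wants to try these, here is a sketch that runs all four views against a scratch repository. The repository, file name, and demo identity are assumptions for illustration. `--include-root-refs` is left out because, as far as I know, it needs a fairly recent Git (2.45 or later), and the options are placed before the ref pattern.

```shell
# Sketch: run the four "raw data" views in a throwaway repository.
set -eu
repo=$(mktemp -d)                      # throwaway location (assumption)
git init --quiet "$repo"
cd "$repo"
echo "hello" > greeting.txt
git add greeting.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "add greeting"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is

# Object: type and pretty-printed content of the commit HEAD points at.
commit_type=$(git cat-file -t HEAD)
git cat-file -p HEAD

# Reference: name plus symref target or abbreviated object ID.
ref_line=$(git for-each-ref \
    --format="%(refname) %(if)%(symref)%(then)%(symref)%(else)%(objectname:short)%(end)" \
    "refs/heads/$branch")
echo "$ref_line"

# Index: mode, object ID, stage number, and path for each entry.
index_line=$(git ls-files --stage)
echo "$index_line"

# Reflog: abbreviated hash, selector, identity, and message per entry.
reflog_line=$(git reflog show "$branch" \
    --format="%h | %gd | %gn <%ge> | %gs" --date=iso)
echo "$reflog_line"
```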

[kept for context]
