Re: [PATCH] doc: add a explanation of Git's data model
From: Ben Knoble <hidden>
Date: 2025-10-10 00:42:55
On Oct 9, 2025, at 10:21, Julia Evans [off-list ref] wrote:

I collected some feedback from Git users on this v2 document. I'm expecting more feedback, but here's an initial brain dump of my notes. I mostly wrote this for my own use, but I thought it might be interesting to other folks too.
[snip]
references:

- Two people pointed out that because references are often stored as files, you can't have two references named `julia/ticket-number` and `julia/ticket-number/task-name`. I'm not sure if this is a fundamental limit of the refs data model (does the reftable backend have the same limitation?), but it could be a good reason to mention that refs are often stored as files, because it makes it obvious that you can't have a file and a directory with the same name. This evidently affects people fairly often in practice, so I think it's worth mentioning in some way.
I don’t think the reftable backend has this limitation (?), but it reminded me of another important one: on case-insensitive filesystems you cannot have both "julia" and "JULIA" branches! This occasionally creates problems where someone cannot fetch or clone what has been pushed. Anyway: it’s worth mentioning the files for that purpose.

It would be nice to improve the UI as you describe below, so that people can keep interrogating Git naturally without needing to know about all the storage formats (recall that cat-file works just fine with packs and MIDXs!).
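To make the two collision cases above concrete, here is a rough sketch of a check over a list of ref names. The function name `find_ref_collisions` is hypothetical (it is not a Git command), and the check assumes refs are stored as loose files; it only compares full names, so it is an approximation of what a filesystem would actually reject.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): read ref names, one per line, on
# stdin and report pairs that could collide when refs are stored as files.
find_ref_collisions() {
    awk '
    { refs[NR] = $0; n = NR }
    END {
        for (i = 1; i <= n; i++) {
            for (j = 1; j <= n; j++) {
                if (i == j) continue
                a = refs[i]; b = refs[j]
                # Directory/file conflict: "a" is a file, but "b" needs "a/" as a directory.
                if (index(b, a "/") == 1)
                    print "D/F conflict: " a " vs " b
                # Names differing only by case, reported once per pair.
                if (i < j && a != b && tolower(a) == tolower(b))
                    print "case conflict: " a " vs " b
            }
        }
    }'
}
```

It could be fed from a real repository with something like `git for-each-ref --format='%(refname)' | find_ref_collisions`.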
Overall: several people suggested mentioning more about where things are stored in the `.git` directory, which I just removed. I think I want to avoid this (not sure yet), but I'm going to think about the underlying motivation for this suggestion and see if it can be addressed in a different way.

Some ideas for what purposes discussing the `.git` directory serves:

1. Like I mentioned above with branches, sometimes the implementation causes some extra constraints like "you can't have branches `julia/ticket` and `julia/ticket/task`". So often people like to know a little about the implementation because it can help predict some of the holes in the abstractions you're using.
2. It lets you view the "raw" data, so you can be totally sure about what Git is storing. This is nice because Git's UI can be very inconsistent sometimes, so looking at the raw data gives a sense of certainty about what's actually there.

I tried to put together a list of ways to look at the "raw" data without looking in the `.git` directory. The ways for objects and the index are great, but for references and the reflog they involve these pretty complex format strings. I'm not confident I've gotten the format strings right, and IMO they don't inspire a lot of confidence.

View an object with:

----
git cat-file -p <object-id>
----

View a reference with:

----
git for-each-ref <ref-name> --include-root-refs --format="%(refname) %(if)%(symref)%(then)%(symref)%(else)%(objectname:short)%(end)"
----

View the index with:

----
git ls-files --stage
----

View the reflog for a reference with:

----
git reflog show <refname> --format="%h | %gd | %gn <%ge> | %gs" --date=iso
----
[kept for context]