Re: Question: What's the best way to implement directory permission control in git?
From: ZheNing Hu <hidden>
Date: 2022-07-29 14:22:15
Elijah Newren [off-list ref] wrote on Fri, Jul 29, 2022 at 09:48:
quoted
But due to git's commits referring to a Merkle tree I can tell you that a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving you any of that content.

You *could* then manually construct a commit like:

    tree <NEW_TREE>
    ...

Where the "<NEW_TREE>" would be a tree like:

    100644 blob <NEW-BLOB-SHA1> UPDATED.md
    040000 tree <XYZ> secret-stuff

And send you a PACK with my three new objects (commit, blob & new top-level NEW_TREE). To the remote end & protocol it wouldn't be distinguishable from a "normal" push.

But nothing supports this already; as a practical matter most of git either hard dies if content is missing, or has other odd edge-case semantics (and I'm not up-to-date on the state of the art).

Actually, this is what sparse-index (as a sub-option in sparse-checkout) already basically does. See Documentation/technical/sparse-index.txt for details, and note that we're basically in Phase IV of that document.

In short, the sparse-index makes it so that common operations based on the index do not need and do not use information about some subtrees, so if someone has a partial clone starting with no blobs, they will only have to download a small subset of the repository blobs in order to handle most Git operations, and many operations become much faster since the index is so much smaller.
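To make the Merkle-tree point in the quoted text concrete, here is a minimal sketch in Python that computes the object IDs for such a hand-built commit without ever seeing the contents of the secret subtree. It assumes a SHA-1 repository; the stand-in value for <XYZ>, the file contents, the author identity, and the helper names are all made up for illustration, and this only computes IDs (it writes nothing to a repository or pack).

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID the way git computes it: sha1("<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries in git's binary tree format."""
    # Git orders entries by name, treating directory names as if they ended in "/".
    def sort_key(entry):
        mode, name, _ = entry
        return name.encode() + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


# Stand-in for <XYZ>: all we know about the secret subtree is its 40-hex ID.
secret_tree_id = hashlib.sha1(b"stand-in for XYZ").hexdigest()

# The new blob is the only content we actually hold (hypothetical contents).
new_blob = b"updated notes\n"
new_blob_id = git_object_id("blob", new_blob)

# New top-level tree: one blob we have, one subtree referenced by ID only.
# Trees store the directory mode as "40000"; tools display it as 040000.
new_tree = tree_body([
    ("100644", "UPDATED.md", new_blob_id),
    ("40000", "secret-stuff", secret_tree_id),
])
new_tree_id = git_object_id("tree", new_tree)

# A root commit pointing at the new tree (hypothetical identity and timestamp).
commit = (
    f"tree {new_tree_id}\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "committer A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "Update UPDATED.md\n"
).encode()
commit_id = git_object_id("commit", commit)

print("blob  ", new_blob_id)
print("tree  ", new_tree_id)
print("commit", commit_id)
```

The three printed IDs correspond to the three new objects mentioned above (commit, blob, and new top-level tree); the secret subtree contributes only its 20-byte ID to the tree entry.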
I think this is mainly due to sparse-checkout rather than sparse-index. Even without the sparse-index, we can run `git add` and `git commit` without fetching other blob objects. What the sparse-index adds is a smaller index.
However: * Users can run `git sparse-checkout reapply --no-sparse-index` at any time to force the index to be full again. This is documented, and even suggested that users remember in case they attempt to use external tools (jgit? libgit2? others?) that don't understand sparse directory entries. So, removing this ability would be problematic.
Or `git sparse-checkout disable`? Either way, when git finds that objects are missing, it fetches them from the remote, and we could do the ACL check at that point. jgit/libgit2/others would then simply fail to fetch the objects (in this special case?).
* It makes no guarantee whatsoever that the sparse directory entries are not expanded by less frequently used Git commands. Notice the "ensure_full_index()" calls sprinkled throughout the code. Some have been removed, one by one, as commands have been modified to better operate with a sparse index. The odds they'll all be removed in the future may well be close to 0%.
That's good...
* The `ort` merge strategy ignores the index altogether during operation. If it needs to walk into a tree to complete a merge/rebase/revert/cherry-pick/etc., it will. Further, it doesn't just look into those paths, it intentionally de-sparsifies paths involved in conflicts, so it can display it to the user.
So the user has to deal with a merge conflict in a directory that they "don't have access to"... It would be nice if users only had to care about conflicts in the directories/files they have permission for. I don't know how difficult that would be to design.
* Just because the index is sparse does not mean other commands can't walk into those directories. So `git grep` (when given a revision), `git diff`, `git log`, etc. will look in (old versions of) those paths.
Agree.
quoted
Anyway, just saying that for the longer term I'm not aware of an *intrinsic* reason for why we couldn't support this sort of thing, in case anyone's interested in putting in a *lot* of leg work to make it happen.

And on top of the technical leg work required, they would also need to somehow convince everyone else that it's worth accepting the increased maintenance effort. Right now, even if someone had already done the work to implement it, I'd say it's not worth the maintenance costs.

However, there are two alternative choices I can think of here: You can use submodules if you want a fixed part of the repository to only be available to a subset of folks, or use josh (https://github.com/josh-project/josh) if you need it to be more dynamic.
Thanks, I will take a look.

ZheNing Hu