Thread: 13 messages, 7 authors, 2022-08-01

Re: Question: What's the best way to implement directory permission control in git?

From: ZheNing Hu <hidden>
Date: 2022-07-29 14:22:15

Elijah Newren [off-list ref] wrote on Fri, Jul 29, 2022 at 09:48:
quoted
But due to git's commits referring to a Merkle tree I can tell you that
a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving
you any of that content.

You *could* then manually construct a commit like:

        tree <NEW_TREE>
        ...

Where the "<NEW_TREE>" would be a tree like:

        100644 blob <NEW-BLOB-SHA1>     UPDATED.md
        040000 tree <XYZ>       secret-stuff

And send you a PACK with my three new objects (commit, blob &
new top-level NEW_TREE). To the remote end & protocol it wouldn't be
distinguishable from a "normal" push.
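
As a rough illustration of the Merkle-tree point above, here is a minimal Python sketch that computes the id of such a "<NEW_TREE>" while knowing only the id of the secret subtree, not its content. The object header and tree entry layout follow git's SHA-1 object format; the value used for <XYZ> and the blob content are made-up placeholders.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of this kind."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries in git's tree format.

    Git sorts tree entries by name, comparing directories as if their
    name ended in "/". Inside the object, a tree's mode is stored as
    "40000"; the "040000" seen in listings is display padding.
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


# Placeholder for <XYZ>: all we know about "secret-stuff" is its tree id.
SECRET_TREE_ID = "ab" * 20  # hypothetical value, not a real object

# Hypothetical content for the new file.
new_blob_id = git_object_id("blob", b"# Updated\n")

# The new top-level tree refers to the secret subtree by id only.
new_tree_id = git_object_id(
    "tree",
    tree_body([
        ("100644", "UPDATED.md", new_blob_id),
        ("40000", "secret-stuff", SECRET_TREE_ID),
    ]),
)
print(new_tree_id)
```

The sketch only shows that the ids can be computed; it does not build a commit or a pack, and it says nothing about whether the rest of git copes with the missing subtree.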

But nothing supports this yet; as a practical matter, most of git
either hard dies if content is missing, or has other odd edge-case
semantics (and I'm not up-to-date on the state of the art).
Actually, this is what sparse-index (as a sub-option in
sparse-checkout) already basically does.  See
Documentation/technical/sparse-index.txt for details, and note that
we're basically in Phase IV of that document.  In short, the
sparse-index makes it so that common operations based on the index do
not need and do not use information about some subtrees, so if someone
has a partial clone starting with no blobs, they will only have to
download a small subset of the repository blobs in order to handle
most Git operations, and many operations become much faster since the
index is so much smaller.
I think this is mainly due to sparse-checkout rather than sparse-index.
Even without the sparse-index, we can run `git add` and `git commit` without
fetching other blob objects.

But sparse-index can help reduce the size of indexes.
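
To make the index-size point concrete, here is a toy Python model of the idea, not git's actual implementation: paths outside the sparse-checkout cone are collapsed into a single directory entry at the highest possible level. The function name `collapse_index` and the sample paths are invented for illustration.

```python
def collapse_index(paths, cone_dirs):
    """Toy model of a sparse index.

    paths     -- the full index, one file path per entry
    cone_dirs -- directories selected in cone mode
    Returns the entries that remain, with every directory outside the
    cone replaced by one "dir/" entry.
    """
    cone = [d.strip("/") for d in cone_dirs]

    def needed(directory):
        # A directory stays expanded if it is a cone directory, lies
        # inside one, or is an ancestor of one.
        return any(
            directory == c
            or directory.startswith(c + "/")
            or c.startswith(directory + "/")
            for c in cone
        )

    entries, seen = [], set()
    for path in sorted(paths):
        parents = path.split("/")[:-1]
        entry = path
        # Find the top-most parent directory that is outside the cone.
        for depth in range(1, len(parents) + 1):
            directory = "/".join(parents[:depth])
            if not needed(directory):
                entry = directory + "/"
                break
        if entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


full_index = [
    "README",
    "docs/a.md",
    "docs/img/b.png",
    "secret/deep/k2",
    "secret/k1",
    "src/lib/x.c",
    "src/main.c",
]
print(collapse_index(full_index, ["src"]))
# ['README', 'docs/', 'secret/', 'src/lib/x.c', 'src/main.c']
```

Seven entries shrink to five here; in a large repository with a small cone the reduction is far larger, which is where the speedup comes from.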
However:

* Users can run `git sparse-checkout reapply --no-sparse-index` at any
time to force the index to be full again.  This is documented, and
users are even advised to remember it in case they attempt to use
external tools (jgit? libgit2? others?) that don't understand sparse
directory entries.  So, removing this ability would be problematic.
Or `git sparse-checkout disable`? Either way, when git finds that objects are
missing, it fetches them from the remote, and we could do the ACL check there.
We could simply let jgit/libgit2/others fail to fetch the objects (in this
special case?).
* It makes no guarantee whatsoever that the sparse directory entries
are not expanded by less frequently used Git commands.  Notice the
"ensure_full_index()" calls sprinkled throughout the code.  Some have
been removed, one by one, as commands have been modified to better
operate with a sparse index.  The odds they'll all be removed in the
future may well be close to 0%.
That's good...
* The `ort` merge strategy ignores the index altogether during
operation.  If it needs to walk into a tree to complete a
merge/rebase/revert/cherry-pick/etc., it will.  Further, it doesn't
just look into those paths, it intentionally de-sparsifies paths
involved in conflicts, so it can display them to the user.
So the user has to deal with a merge conflict in a directory that they
"don't have access to"...

It would be nice if the user only had to care about conflicts in
directories/files they have permission for. I don't know how difficult that
would be to design.
* Just because the index is sparse does not mean other commands can't
walk into those directories.  So `git grep` (when given a revision),
`git diff`, `git log`, etc. will look in (old versions of) those
paths.
Agree.
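
The idea raised earlier in this reply, checking permissions at the moment a missing object is fetched on demand, could be sketched roughly as follows. Everything here is hypothetical: `PathAcl`, `filter_fetch`, and the assumption that the server can associate each requested object with a path are illustrative only and do not correspond to any existing git server feature.

```python
from dataclasses import dataclass, field


@dataclass
class PathAcl:
    """Hypothetical server-side ACL: user -> readable directory prefixes."""
    allowed: dict = field(default_factory=dict)

    def can_read(self, user: str, path: str) -> bool:
        prefixes = self.allowed.get(user, [])
        # An empty prefix grants access to the whole repository.
        return any(
            prefix == "" or path == prefix or path.startswith(prefix + "/")
            for prefix in prefixes
        )


def filter_fetch(acl, user, wanted):
    """Split requested (object_id, path) pairs into served and denied ids.

    Assumes the server already knows which path each object belongs to.
    """
    served, denied = [], []
    for object_id, path in wanted:
        (served if acl.can_read(user, path) else denied).append(object_id)
    return served, denied


acl = PathAcl({"alice": ["src", "docs"], "admin": [""]})
request = [("1" * 40, "src/main.c"), ("2" * 40, "secret/key.pem")]
served, denied = filter_fetch(acl, "alice", request)
print(len(served), len(denied))
```

As the points above suggest, such a check would be hit not only by checkout but also by `git log`, `git diff`, `git grep` on a revision, and merges, all of which may ask for old versions of the restricted paths.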
quoted
Anyway, just saying that for the longer term I'm not aware of an
*intrinsic* reason for why we couldn't support this sort of thing, in
case anyone's interested in putting in a *lot* of leg work to make it
happen.
And on top of the technical leg work required, they would also need to
somehow convince everyone else that it's worth accepting the increased
maintenance effort.  Right now, even if someone had already done the
work to implement it, I'd say it's not worth the maintenance costs.

However, there are two alternative choices I can think of here: You
can use submodules if you want a fixed part of the repository to only
be available to a subset of folks, or use josh
(https://github.com/josh-project/josh) if you need it to be more
dynamic.
Thanks, I will take a look.

ZheNing Hu