Thread (6 messages) 6 messages, 3 authors, 2020-06-18

Re: [PATCH] unpack-trees: do not set SKIP_WORKTREE on submodules

From: Matheus Tavares Bernardino <hidden>
Date: 2020-06-17 17:58:58

On Tue, Jun 16, 2020 at 10:21 PM Elijah Newren via GitGitGadget
[off-list ref] wrote:
From: Elijah Newren <redacted>

As noted in commit e7d7c73249 ("git-sparse-checkout: clarify
interactions with submodules", 2020-06-10), sparse-checkout cannot
remove submodules even if they don't match the sparsity patterns,
because doing so would risk data loss -- unpushed, uncommitted, or
untracked changes could all be lost.  That commit also updated the
documentation to point out that submodule initialization state was a
parallel, orthogonal reason that entries in the index might not be
present in the working tree.

However, sparsity and submodule initialization weren't actually fully
orthogonal yet.  The SKIP_WORKTREE handling in unpack_trees would
attempt to set the SKIP_WORKTREE bit on submodules when the submodule
did not match the sparsity patterns.  This resulted in innocuous but
potentially alarming warning messages:

    warning: unable to rmdir 'sha1collisiondetection': Directory not empty

It could also make things confusing since the entry would be marked as
SKIP_WORKTREE in the index but actually still be present in the working
tree:

    $ git ls-files -t | grep sha1collisiondetection
    S sha1collisiondetection
    $ ls -al sha1collisiondetection/ | wc -l
    13

Submodules have always been their own form of "partial checkout"
behavior, with their own "present or not" state determined by running
"git submodule [init|deinit|update]".  Enforce that separation by having
the SKIP_WORKTREE logic not touch submodules and allow submodules to
continue using their own initialization state for determining if the
submodule is present.
Makes sense to me.

I'm just thinking about the possible implications in grep (with
mt/grep-sparse-checkout). As you mentioned in [1], users might think
of "git grep $rev $pat" as an optimized version of "git checkout $rev
&& git grep $pat". And, in this sense, they probably expect the output
of these two operations to be equal. But if we don't set the
SKIP_WORKTREE bit for submodules they might diverge.

As an example, if we have a repository like:
.
|-- A
|   `-- sub
`-- B

And the [cone-mode] sparsity rules:
/*
!/*/
/B/

Then, "git grep --recurse-submodules $rev $pat" would search only in B
(as A doesn't match the sparsity patterns and thus, is not recursed
into). But "git checkout $rev && git grep --recurse-submodules $pat"
would search in both B and A/sub (as the latter would not have the
SKIP_WORKTREE bit set).

This might be a problem for git-grep, not git-sparse-checkout. But I'm
not sure how we could solve it efficiently, as the submodule might be
deep down in a path whose first dir was already ignored for not
matching the sparsity patterns. Is this a problem we should consider,
or is it OK if the outputs of these two operations diverge?

[1]: https://lore.kernel.org/git/CABPp-BFKHivKffBPO0M_s-JtpiLyEMLZr+sX9_yZk9ZX7ULtbw@mail.gmail.com/ (local)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help