Thread (87 messages) 87 messages, 5 authors, 2021-02-09

Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns

From: Derrick Stolee <hidden>
Date: 2020-03-24 17:02:09

On 3/24/2020 12:16 PM, Elijah Newren wrote:
On Tue, Mar 24, 2020 at 8:12 AM Derrick Stolee [off-list ref] wrote:
quoted
On 3/24/2020 3:15 AM, Elijah Newren wrote:
quoted
Hi Matheus,

On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares
...
quoted
quoted
quoted
Something I'm not entirely sure in this patch is how we implement the
mechanism to honor sparsity for the `git grep <commit-ish>` case (which
is treated in the grep_tree() function). Currently, the patch looks for
an index entry that matches the path, and then checks its skip_worktree
As you discuss below, checking the index is both wrong _and_ costly.
I'm not sure why checking the index is _wrong_, but I agree about the
performance cost.
Let's say there are two directories, dir1 and dir2.  Over time, there
have existed a total of six files:
   dir1/{a,b,c}
   dir2/{d,e,f}
At the current time, there are only four files in the index:
   dir1/{a,b}
   dir2/{d,e}
And the user has done a `git sparse-checkout set dir2` and then at
some point later run `git grep OTHERCOMMIT foobar`.  What happens?

Well, since we're in a sparse checkout, we should only search the
relevant paths within OTHERCOMMIT for "foobar".  Let's say we attempt
to figure out the "relevant paths" using the index.  We can tell that
dir1/a and dir2/a are marked as SKIP_WORKTREE so we don't search them.
dir1/c is untracked -- what do we do with it?  Include it?  Exclude
it?  Carrying on with the other files, dir2/d and dir2/e are tracked
and !SKIP_WORKTREE so we search them.  dir2/f is untracked -- what do
we do with it?  Include it?  Exclude it?

We're left without the necessary information to tell whether we should
search OTHERCOMMIT's dir1/c and dir2/f if we consult the index.  Any
decision we make is going to be wrong for one of the two paths.

If we instead do not attempt to consult the index (which corresponds
to a version close to HEAD) in order to ask questions about the
completely different OTHERCOMMIT, but instead use the sparsity
patterns to query whether those files/directories are interesting,
then we get the right answer.  The index can only be consulted for the
right answer in the case of --cached; in all other cases (including
OTHERCOMMIT == HEAD), we should use the sparsity patterns.  In fact,
we could also use the sparsity patterns in the case of --cached, it's
just that for that one particular case consulting the index will also
give the right answer.
Thanks! This helps a lot.

-Stolee
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help