Thread (51 messages) 51 messages, 6 authors, 2022-09-23

Re: [PATCH v1 1/2] builtin/grep.c: add --sparse option

From: Shaoxuan Yuan <hidden>
Date: 2022-08-24 18:20:34

Hi reviewrs,

I came back from busying with relocation :)

On 8/17/2022 10:12 PM, Derrick Stolee wrote:
 > On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
 >> Add a --sparse option to `git-grep`. This option is mainly used to:
 >>
 >> If searching in the index (using --cached):
 >>
 >> With --sparse, proceed the action when the current cache_entry is
 >
 > This phrasing is awkward. It might be better to reframe to describe the
 > _why_ before the _what_
 >
 >   When the '--cached' option is used with the 'git grep' command, the
 >   search is limited to the blobs found in the index, not in the worktree.
 >   If the user has enabled sparse-checkout, this might present more 
results
 >   than they would like, since the files outside of the 
sparse-checkout are
 >   unlikely to be important to them.
 >
 >   Change the default behavior of 'git grep' to focus on the files within
 >   the sparse-checkout definition. To enable the previous behavior, add a
 >   '--sparse' option to 'git grep' that triggers the old behavior that
 >   inspects paths outside of the sparse-checkout definition when paired
 >   with the '--cached' option.

Good suggestion!

 > Or something like that. The documentation updates will also help clarify
 > what happens when '--cached' is not included. I assume '--sparse' is
 > ignored, but perhaps it _could_ allow looking at the cached files outside
 > the sparse-checkout definition, this could make the simpler invocation of
 > 'git grep --sparse <pattern>' be the way that users can search after 
their
 > attempt to search the worktree failed.

This simpler version was in my earlier local branch, but later I
decided not to go with it. I found the difference between these two
approaches, is that "--cached --sparse" is more correct in terms of
how Git actually works (because sparsity is a concept in the index);
and "--sparse" is more comfortable for the end user.

I found the former one better here, because it is more self-explanatory,
and thus more info for the user, i.e. "you are now looking at the
index, and Git will also consider files outside of sparse definition."

To be honest, I don't know which one is "better", but I think I'll
keep the current implementation unless something more convincing shows
up later.

 >> marked with SKIP_WORKTREE bit (the default is to skip this kind of
 >> entry). Before this patch, --cached itself can realize this action.
 >> Adding --sparse here grants the user finer control over sparse
 >> entries. If the user only wants to peak into the index without
 >
 > s/peak/peek/
 >
 >> caring about sparse entries, --cached should suffice; if the user
 >> wants to peak into the index _and_ cares about sparse entries,
 >> combining --sparse with --cached can address this need.
 >>
 >> Suggested-by: Victoria Dye [off-list ref]
 >> Signed-off-by: Shaoxuan Yuan [off-list ref]
 >> ---
 >>  builtin/grep.c                  | 10 +++++++++-
 >>  t/t7817-grep-sparse-checkout.sh | 12 ++++++------
 >>  2 files changed, 15 insertions(+), 7 deletions(-)
 >
 > You mentioned in Slack that you missed the documentation of the --sparse
 > option. Just pointing it out here so we don't forget.

Will do.

 >>
 >> diff --git a/builtin/grep.c b/builtin/grep.c
 >> index e6bcdf860c..61402e8084 100644
 >> --- a/builtin/grep.c
 >> +++ b/builtin/grep.c
 >> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
 >>
 >>  static int skip_first_line;
 >>
 >> +static int grep_sparse = 0;
 >> +
 >
 > I initially thought it might be good to not define an additional global,
 > but there are many defined in this file outside of the context and they
 > are spread out with extra whitespace like this.
 >
 >>  static void add_work(struct grep_opt *opt, struct grep_source *gs)
 >>  {
 >>      if (opt->binary != GREP_BINARY_TEXT)
 >> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
 >>      for (nr = 0; nr < repo->index->cache_nr; nr++) {
 >>          const struct cache_entry *ce = repo->index->cache[nr];
 >>
 >> -        if (!cached && ce_skip_worktree(ce))
 >
 > This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
 > was provided.
 >
 >> +        /*
 >> +         * If ce is a SKIP_WORKTREE entry, look into it when both
 >> +         * --sparse and --cached are given.
 >> +         */
 >> +        if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 >>              continue;
 >
 > The logic of this if statement is backwards from the comment because a
 > true statement means "skip the entry" _not_ "look into it".
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless both --sparse and
 >      * --cached are given.
 >      */

Got it.

 > But again, we might want to consider this alternative:
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless --sparse is given.
 >      */
 >     if (!grep_sparse && ce_skip_worktree(ce))
 >         continue;
 >
 > This will require further changes below, specifically this bit:
 >
 >             /*
 >              * If CE_VALID is on, we assume worktree file and its
 >              * cache entry are identical, even if worktree file has
 >              * been modified, so use cache version instead
 >              */
 >             if (cached || (ce->ce_flags & CE_VALID)) {
 >                 if (ce_stage(ce) || ce_intent_to_add(ce))
 >                     continue;
 >                 hit |= grep_oid(opt, &ce->oid, name.buf,
 >                          0, name.buf);
 >             } else {
 >
 > We need to activate this grep_oid() call also when ce_skip_worktree(c) is
 > true. That is, if we want 'git grep --sparse' to extend the search beyond
 > the worktree and into the sparse entries.
 >
 >>
 >>          strbuf_setlen(&name, name_base_len);
 >> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const 
char *prefix)
 >>                 PARSE_OPT_NOCOMPLETE),
 >>          OPT_INTEGER('m', "max-count", &opt.max_count,
 >>              N_("maximum number of results per file")),
 >> +        OPT_BOOL(0, "sparse", &grep_sparse,
 >> +             N_("search sparse contents and expand sparse index")),
 >
 > This "and expand sparse index" is an internal implementation detail, 
not a
 > heplful item for the help text. Instead, perhaps:
 >
 >     "search the contents of files outside the sparse-checkout definition"

Sounds good!

 > (Also, while the sparse index is being expanded right now, I would expect
 > to not expand the sparse index by the end of the series.)
 >
 >> -test_expect_success 'grep --cached searches entries with the 
SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --cached and --sparse searches entries 
with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >>      dir/c:text
 >>      EOF
 >> -    git grep --cached "text" >actual &&
 >> +    git grep --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Please add a test that demonstrates the change of behavior when only 
--cached
 > is provided, not --sparse.

Sure!

 > (If you take my suggestion to allow 'git grep --sparse' to do something
 > different, then also add a test for that case.)
 >
 >>
 >> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules 
honors sparse checkout in submodu
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --recurse-submodules --cached searches 
entries with the SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --recurse-submodules --cached and 
--sparse searches entries with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules 
--cached searches entries with th
 >>      sub/B/b:text
 >>      sub2/a:text
 >>      EOF
 >> -    git grep --recurse-submodules --cached "text" >actual &&
 >> +    git grep --recurse-submodules --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not 
search the index with CE_VALID a
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --cached searches index entries with both 
CE_VALID and SKIP_WORKTREE' '
 >> +test_expect_success 'grep --cached and --sparse searches index 
entries with both CE_VALID and SKIP_WORKTREE' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches 
index entries with both CE_VALID and
 >>      EOF
 >>      test_when_finished "git update-index --no-assume-unchanged b" &&
 >>      git update-index --assume-unchanged b &&
 >> -    git grep --cached text >actual &&
 >> +    git grep --cached --sparse text >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Same with these two tests. Add additional commands that show the 
change of
 > behavior when only using '--cached'.

--
Thanks,
Shaoxuan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help