Thread (63 messages), 4 authors, 2017-06-29

Re: [PATCH 5/5] grep: remove regflags from the public grep_opt API

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2017-06-29 18:16:27

On Thu, Jun 29 2017, Stefan Beller jotted:
On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason
[off-list ref] wrote:
Refactor calls to the grep machinery to always pass opt.ignore_case &
opt.extended_regexp_option instead of setting the equivalent regflags
bits.
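
A minimal sketch of the idea in this paragraph, assuming the regcomp() flags are derived from the two option fields at compile time instead of being stored on the public struct. The names `mini_grep_opt` and `derive_regflags` are hypothetical and are not taken from git's source; only the field names and the REG_* constants come from the patch and from POSIX <regex.h>.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical stand-in for the two grep_opt fields the patch relies on. */
struct mini_grep_opt {
	int ignore_case;
	int extended_regexp_option;
};

/*
 * Hypothetical helper: build the cflags for regcomp() from the option
 * fields. REG_NEWLINE is used as the baseline, mirroring the default
 * that the patch removes from init_grep_defaults().
 */
static int derive_regflags(const struct mini_grep_opt *opt)
{
	int regflags = REG_NEWLINE;

	assert(opt);
	if (opt->ignore_case)
		regflags |= REG_ICASE;
	if (opt->extended_regexp_option)
		regflags |= REG_EXTENDED;
	return regflags;
}
```

With this shape, callers only ever set `ignore_case` and `extended_regexp_option`, and no caller needs to know which REG_* bits they map to.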

The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
really just plastering over the code smell which this change fixes.

See my "Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with
--perl-regexp"[1] for the discussion leading up to this.

The reason for adding the extensive commentary here is that I
discovered some subtle complexity in implementing this that really
should be called out explicitly to future readers.

Before this change we'd rely on the difference between
`extended_regexp_option` and `regflags` to serve as a membrane between
our preliminary parsing of grep.extendedRegexp and grep.patternType,
and what we decided to do internally.

Now that those two are the same thing, it's necessary to unset
`extended_regexp_option` just before we commit in cases where both of
those config variables are set. See 84befcd0a4 ("grep: add a
grep.patternType configuration setting", 2012-08-03) for the code and
documentation related to that.

The explanation of why the if/else branches in
grep_commit_pattern_type() are ordered the way they are exists in that
commit message, but I think it's worth calling this subtlety out
explicitly with a comment for future readers.
Up to here the commit message is inspiring confidence.
Thanks.
Unrelated to that: I could have factored out the default REG_NEWLINE
flag into some custom GIT_GREP_H_DEFAULT_REGFLAGS or something, but
since it's just used in two places I didn't think it was worth the
effort.

As an aside, we're really lacking test coverage for regflags being
initialized as 0 instead of as REG_NEWLINE. Tests will fail if it's
removed from compile_regexp(), but not if it's removed from
compile_fixed_regexp(). I have not dug to see if it's actually needed
in the latter case or if the test coverage is lacking.
This sounds as if extra careful review is needed.
Note though (since I didn't say this explicitly) nothing about this
commit changes the semantics of what we pass to regcomp, I'm just noting
this caveat with REG_NEWLINE as an aside since I'm moving it around.
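
For readers who have not used the flag, here is a small sketch of what REG_NEWLINE changes in POSIX regcomp(). It is independent of git's code; `matches_with_flags` is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from git): compile `pattern` with `cflags`
 * and report whether it matches anywhere in `text`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 */
static int matches_with_flags(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int matched;

	assert(pattern && text);
	/* REG_NOSUB: we only need a yes/no answer, not match offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB))
		return -1;
	matched = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return matched;
}
```

Against the two-line buffer "foo\nbar", the pattern "^bar" matches only when REG_NEWLINE is passed, because only then does '^' anchor after an embedded newline. Conversely "foo.bar" matches only without REG_NEWLINE, because with the flag '.' no longer matches a newline.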
1. [ref]
   (https://public-inbox.org/git/CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <redacted>
---
 builtin/grep.c |  2 --
 grep.c         | 43 ++++++++++++++++++++++++++++++++++---------
 grep.h         |  1 -
 revision.c     |  2 --
 4 files changed, 34 insertions(+), 14 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index f61a9d938b..b682966439 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1169,8 +1169,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)

        if (!opt.pattern_list)
                die(_("no pattern given."));
-       if (!opt.fixed && opt.ignore_case)
-               opt.regflags |= REG_ICASE;

        /*
         * We have to find "--" in a separate pass, because its presence
diff --git a/grep.c b/grep.c
index 736e1e00d6..51aaad9f03 100644
--- a/grep.c
+++ b/grep.c
@@ -35,7 +35,6 @@ void init_grep_defaults(void)
        memset(opt, 0, sizeof(*opt));
        opt->relative = 1;
        opt->pathname = 1;
-       opt->regflags = REG_NEWLINE;
        opt->max_depth = -1;
        opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
        color_set(opt->color_context, "");
@@ -154,7 +153,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
        opt->linenum = def->linenum;
        opt->max_depth = def->max_depth;
        opt->pathname = def->pathname;
-       opt->regflags = def->regflags;
        opt->relative = def->relative;
        opt->output = def->output;
@@ -170,6 +168,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)

 static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
 {
+       /*
+        * When committing to the pattern type by setting the relevant
+        * fields in grep_opt it's generally not necessary to zero out
+        * the fields we're not choosing, since they won't have been
+        * set by anything. The extended_regexp_option field is the
+        * only exception to this.
+        *
+        * This is because in the process of parsing grep.patternType
+        * & grep.extendedRegexp we set opt->pattern_type_option and
+        * opt->extended_regexp_option, respectively. We then
+        * internally use opt->extended_regexp_option to see if we're
+        * compiling an ERE. It must be unset if that's not actually
+        * the case.
+        */
+       if (pattern_type != GREP_PATTERN_TYPE_ERE &&
+           opt->extended_regexp_option)
+               opt->extended_regexp_option = 0;
+
        switch (pattern_type) {
        case GREP_PATTERN_TYPE_UNSPECIFIED:
                /* fall through */
@@ -178,7 +194,7 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
                break;

        case GREP_PATTERN_TYPE_ERE:
-               opt->regflags |= REG_EXTENDED;
+               opt->extended_regexp_option = 1;
                break;

        case GREP_PATTERN_TYPE_FIXED:
@@ -208,6 +224,11 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
        else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
                grep_set_pattern_type_option(opt->pattern_type_option, opt);
        else if (opt->extended_regexp_option)
+               /*
+                * This branch *must* happen after setting from the
+                * opt->pattern_type_option above,
I do not quite understand this. Are you saying

  opt->pattern_type_option takes precedence over
  opt->extended_regexp_option if the former is not _UNSPECIFIED ?
I mean this "else if" code *must* be in that order, i.e.:

	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
		grep_set_pattern_type_option(opt->pattern_type_option, opt);
	else if (opt->extended_regexp_option)
		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);

Not:

	else if (opt->extended_regexp_option)
		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);
	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
		grep_set_pattern_type_option(opt->pattern_type_option, opt);

Since we only want to pay attention to grep.extendedRegexp if
grep.patternType is not set. If grep.patternType is set then the
pattern_type_option will not be GREP_PATTERN_TYPE_UNSPECIFIED (but
e.g. GREP_PATTERN_TYPE_BRE).
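
The ordering argument can be checked with a simplified, self-contained model. This is a sketch, not git's code: the enum, the struct, and both functions are hypothetical stand-ins that keep only the two config-derived fields, and falling back to BRE when neither is set is an assumption of the sketch.

```c
#include <assert.h>

/* Simplified stand-ins; names echo the patch but this is not git's code. */
enum pattern_type { PT_UNSPECIFIED = 0, PT_BRE, PT_ERE, PT_FIXED, PT_PCRE };

struct mini_opt {
	enum pattern_type pattern_type_option; /* from grep.patternType */
	int extended_regexp_option;            /* from grep.extendedRegexp */
};

/* Order used by the patch: grep.patternType is consulted first. */
static enum pattern_type resolve_from_config(const struct mini_opt *opt)
{
	assert(opt);
	if (opt->pattern_type_option != PT_UNSPECIFIED)
		return opt->pattern_type_option;
	else if (opt->extended_regexp_option)
		return PT_ERE;
	return PT_BRE; /* assumed default for this sketch */
}

/* Swapped order, shown only to illustrate the failure mode. */
static enum pattern_type resolve_wrong_order(const struct mini_opt *opt)
{
	assert(opt);
	if (opt->extended_regexp_option)
		return PT_ERE;
	else if (opt->pattern_type_option != PT_UNSPECIFIED)
		return opt->pattern_type_option;
	return PT_BRE; /* assumed default for this sketch */
}
```

With grep.patternType set to basic and grep.extendedRegexp set to true, the first function yields PT_BRE while the swapped one yields PT_ERE, which is exactly the case where grep.extendedRegexp would wrongly win.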
As grep_set_pattern_type_option is only called from here,
I wondered if we can put the long comment (and the code)
here in this function grep_commit_pattern_type to have it less
subtle? I have no proposal how though.
Ah you mean the whole "When committing to the pattern type by" comment +
code. Yeah I think that makes sense. I'll try that for v2 and see if
it's better.
I think I grokked this patch and it makes sense, though the commit
message strongly hints at asking for tests. ;)
*Points up at "moving it around" comment above*