Re: [PATCH 5/5] grep: remove regflags from the public grep_opt API
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2017-06-29 18:16:27
On Thu, Jun 29 2017, Stefan Beller jotted:
On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason [off-list ref] wrote:
quoted
Refactor calls to the grep machinery to always pass opt.ignore_case & opt.extended_regexp_option instead of setting the equivalent regflags bits.

The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log: make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was really just plastering over the code smell which this change fixes.

See my "Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp"[1] for the discussion leading up to this.

The reason for adding the extensive commentary here is that I discovered some subtle complexity in implementing this that really should be called out explicitly to future readers.

Before this change we'd rely on the difference between `extended_regexp_option` and `regflags` to serve as a membrane between our preliminary parsing of grep.extendedRegexp and grep.patternType, and what we decided to do internally.

Now that those two are the same thing, it's necessary to unset `extended_regexp_option` just before we commit in cases where both of those config variables are set. See 84befcd0a4 ("grep: add a grep.patternType configuration setting", 2012-08-03) for the code and documentation related to that.

The explanation of why the if/else branches in grep_commit_pattern_type() are ordered the way they are exists in that commit message, but I think it's worth calling this subtlety out explicitly with a comment for future readers.

Up to here the commit message is inspiring confidence.
Thanks.
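For readers following along, here is a minimal sketch of the idea in the commit message: the regcomp() flags are derived from the option fields at compile time instead of being stored in the options struct. The struct and function names (toy_grep_opt, toy_regflags) are hypothetical and only illustrate the approach; the part of the patch that actually does this is not quoted in this thread.

```c
#include <regex.h>

/* Hypothetical stand-in for the two grep_opt fields discussed above. */
struct toy_grep_opt {
	int ignore_case;            /* e.g. set by -i */
	int extended_regexp_option; /* set when an ERE was chosen */
};

/*
 * Derive the regcomp() flags from the option fields. REG_NEWLINE is
 * the default mentioned in the commit message; the other two bits are
 * added only when the corresponding field is set.
 */
static int toy_regflags(const struct toy_grep_opt *opt)
{
	int regflags = REG_NEWLINE;

	if (opt->ignore_case)
		regflags |= REG_ICASE;
	if (opt->extended_regexp_option)
		regflags |= REG_EXTENDED;
	return regflags;
}
```

With this shape callers only ever touch the two fields, so there is no second copy of the same information that can drift out of sync.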
quoted
Unrelated to that: I could have factored out the default REG_NEWLINE flag into some custom GIT_GREP_H_DEFAULT_REGFLAGS or something, but since it's just used in two places I didn't think it was worth the effort.

As an aside we're really lacking test coverage regflags being initiated as 0 instead of as REG_NEWLINE. Tests will fail if it's removed from compile_regexp(), but not if it's removed from compile_fixed_regexp(). I have not dug to see if it's actually needed in the latter case or if the test coverage is lacking.

This sounds as if extra careful review is needed.
Note though (since I didn't say this explicitly) that nothing about this commit changes the semantics of what we pass to regcomp. I'm just noting this caveat with REG_NEWLINE as an aside since I'm moving it around.
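As background on why the REG_NEWLINE default matters at all, the following sketch shows what the flag changes in POSIX regcomp(): with it, "^" can match after an embedded newline and "." stops matching a newline. The helper name toy_matches is made up for this illustration and is not part of git.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if pattern matches somewhere in text, 0 if it does not,
 * and -1 if the pattern fails to compile. cflags is passed through to
 * regcomp() so the caller can toggle REG_NEWLINE.
 */
static int toy_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, cflags | REG_NOSUB))
		return -1;
	ret = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return ret == 0;
}
```

For example, toy_matches("^bar", "foo\nbar", 0) finds nothing because "^" only anchors at the start of the whole buffer, while the same call with REG_NEWLINE matches the second line.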
quoted
1. https://public-inbox.org/git/CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <redacted>
---
 builtin/grep.c |  2 --
 grep.c         | 43 ++++++++++++++++++++++++++++++++++---------
 grep.h         |  1 -
 revision.c     |  2 --
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index f61a9d938b..b682966439 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1169,8 +1169,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!opt.pattern_list)
 		die(_("no pattern given."));
-	if (!opt.fixed && opt.ignore_case)
-		opt.regflags |= REG_ICASE;

 	/*
 	 * We have to find "--" in a separate pass, because its presence
diff --git a/grep.c b/grep.c
index 736e1e00d6..51aaad9f03 100644
--- a/grep.c
+++ b/grep.c
@@ -35,7 +35,6 @@ void init_grep_defaults(void)
 	memset(opt, 0, sizeof(*opt));
 	opt->relative = 1;
 	opt->pathname = 1;
-	opt->regflags = REG_NEWLINE;
 	opt->max_depth = -1;
 	opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
 	color_set(opt->color_context, "");
@@ -154,7 +153,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
 	opt->linenum = def->linenum;
 	opt->max_depth = def->max_depth;
 	opt->pathname = def->pathname;
-	opt->regflags = def->regflags;
 	opt->relative = def->relative;
 	opt->output = def->output;

@@ -170,6 +168,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)

 static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
 {
+	/*
+	 * When committing to the pattern type by setting the relevant
+	 * fields in grep_opt it's generally not necessary to zero out
+	 * the fields we're not choosing, since they won't have been
+	 * set by anything. The extended_regexp_option field is the
+	 * only exception to this.
+	 *
+	 * This is because in the process of parsing grep.patternType
+	 * & grep.extendedRegexp we set opt->pattern_type_option and
+	 * opt->extended_regexp_option, respectively. We then
+	 * internally use opt->extended_regexp_option to see if we're
+	 * compiling an ERE. It must be unset if that's not actually
+	 * the case.
+	 */
+	if (pattern_type != GREP_PATTERN_TYPE_ERE &&
+	    opt->extended_regexp_option)
+		opt->extended_regexp_option = 0;
+
 	switch (pattern_type) {
 	case GREP_PATTERN_TYPE_UNSPECIFIED:
 		/* fall through */
@@ -178,7 +194,7 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		break;

 	case GREP_PATTERN_TYPE_ERE:
-		opt->regflags |= REG_EXTENDED;
+		opt->extended_regexp_option = 1;
 		break;

 	case GREP_PATTERN_TYPE_FIXED:
@@ -208,6 +224,11 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
 	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
 		grep_set_pattern_type_option(opt->pattern_type_option, opt);
 	else if (opt->extended_regexp_option)
+		/*
+		 * This branch *must* happen after setting from the
+		 * opt->pattern_type_option above,

I do not quite understand this. Are you saying opt->pattern_type_option takes precedence over opt->extended_regexp_option if the former is not _UNSPECIFIED?
I mean this "else if" code *must* be in that order, i.e.:

	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
		grep_set_pattern_type_option(opt->pattern_type_option, opt);
	else if (opt->extended_regexp_option)
		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);

Not:

	else if (opt->extended_regexp_option)
		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);
	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
		grep_set_pattern_type_option(opt->pattern_type_option, opt);

This is because we only want to pay attention to grep.extendedRegexp if grep.patternType is not set. If grep.patternType is set then the pattern_type_option will not be GREP_PATTERN_TYPE_UNSPECIFIED (but e.g. GREP_PATTERN_TYPE_BRE).
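The ordering argument above can be modelled in a few lines. This is only a toy sketch: the enum, struct and function names are invented for illustration, the leading command-line branch is an assumption implied by the "else if" chain, and falling back to BRE when nothing is set is likewise an assumption rather than something shown in the quoted hunks.

```c
/* Toy stand-ins for the pattern types discussed in the thread. */
enum toy_pattern_type {
	TOY_UNSPECIFIED = 0,
	TOY_BRE,
	TOY_ERE,
	TOY_FIXED,
	TOY_PCRE
};

struct toy_opt {
	enum toy_pattern_type pattern_type_option; /* from grep.patternType */
	int extended_regexp_option;                /* from grep.extendedRegexp */
};

/*
 * Pick the pattern type. grep.patternType is consulted before
 * grep.extendedRegexp, so the latter only matters when the former is
 * unspecified. Afterwards extended_regexp_option is made to agree with
 * the choice, which is the "unset it just before we commit" step.
 */
static enum toy_pattern_type toy_commit(enum toy_pattern_type cmdline,
					struct toy_opt *opt)
{
	enum toy_pattern_type chosen;

	if (cmdline != TOY_UNSPECIFIED)
		chosen = cmdline;
	else if (opt->pattern_type_option != TOY_UNSPECIFIED)
		chosen = opt->pattern_type_option;
	else if (opt->extended_regexp_option)
		chosen = TOY_ERE;
	else
		chosen = TOY_BRE; /* assumed default */

	opt->extended_regexp_option = (chosen == TOY_ERE);
	return chosen;
}
```

With both config variables set, say pattern_type_option = TOY_BRE and extended_regexp_option = 1, the sketch settles on BRE and clears the flag. Swapping the two config branches would pick ERE instead, which is the misordering the comment in the patch warns about.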
As grep_set_pattern_type_option() is only called from here, I wondered if we could move the long comment (and the code) into grep_commit_pattern_type() to make it less subtle. I have no proposal for how, though.
Ah you mean the whole "When committing to the pattern type by" comment + code. Yeah I think that makes sense. I'll try that for v2 and see if it's better.
I think I grokked this patch and it makes sense, though the commit message strongly hints at asking for tests. ;)
*Points up at "moving it around" comment above*