Re: [PATCH v3] grep: fix multibyte regex handling under macOS
From: Junio C Hamano <hidden>
Date: 2022-08-26 00:20:36
Diomidis Spinellis [off-list ref] writes:
quoted hunk ↗ jump to hunk
diff --git a/Makefile b/Makefile index 04d0fd1fe6..d1a9825715 100644 --- a/Makefile +++ b/Makefile@@ -1427,7 +1427,6 @@ ifeq ($(uname_S),Darwin) APPLE_COMMON_CRYPTO = YesPlease COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO endif - NO_REGEX = YesPlease PTHREAD_LIBS = endif@@ -2970,6 +2969,7 @@ GIT-BUILD-OPTIONS: FORCE @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+ @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+ @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+ + @echo NO_REGEX=\''$(subst ','\'',$(subst ','\'',$(NO_REGEX)))'\' >>$@+ @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+ @echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+ @echo DC_SHA1=\''$(subst ','\'',$(subst ','\'',$(DC_SHA1)))'\' >>$@+
Build part looks good to me.
quoted hunk ↗ jump to hunk
diff --git a/common-main.c b/common-main.c index c531372f3f..0a22861f1c 100644 --- a/common-main.c +++ b/common-main.c@@ -40,6 +40,7 @@ int main(int argc, const char **argv) git_resolve_executable_dir(argv[0]); + setlocale(LC_CTYPE, ""); git_setup_gettext(); initialize_the_repository();diff --git a/gettext.c b/gettext.c index bb5ba1fe7c..f139008d0a 100644 --- a/gettext.c +++ b/gettext.c@@ -10,7 +10,6 @@ #include "config.h" #ifndef NO_GETTEXT -# include <locale.h> # include <libintl.h> # ifdef GIT_WINDOWS_NATIVE@@ -80,7 +79,6 @@ static int test_vsnprintf(const char *fmt, ...) static void init_gettext_charset(const char *domain) { - setlocale(LC_CTYPE, ""); charset = locale_charset(); bind_textdomain_codeset(domain, charset);diff --git a/git-compat-util.h b/git-compat-util.h index 58d7708296..c6fa3c7469 100644 --- a/git-compat-util.h +++ b/git-compat-util.h@@ -212,6 +212,7 @@ #endif #include <errno.h> #include <limits.h> +#include <locale.h> #ifdef NEEDS_SYS_PARAM_H #include <sys/param.h> #endif
I'll let others more familiar with the locale support to comment on these changes. We are unconditionally including <locale.h> now; before platforms that lack locale.h can set NO_GETTEXT but that will no longer work as a "workaround" for them. I do not know if thta is a practical downside to anybody, but it could be a problem.
quoted hunk ↗ jump to hunk
diff --git a/t/t7818-grep-multibyte.sh b/t/t7818-grep-multibyte.sh new file mode 100755 index 0000000000..a3889f9822 --- /dev/null +++ b/t/t7818-grep-multibyte.sh
Do we need a new test script for this?
quoted hunk ↗ jump to hunk
@@ -0,0 +1,34 @@ +#!/bin/sh + +test_description='grep multibyte characters' + +. ./test-lib.sh + +# Multibyte regex search is only supported with a native regex library +# that supports it. +# (The supplied compatibility library is compiled with NO_MBSUPPORT.)
This file is not specific to Darwin; "... with a native regex library" etc. is not something we want to see here.
+test -z "$NO_REGEX" && + LC_ALL=en_US.UTF-8 test-tool regex '^.$' '¿' && + test_set_prereq MB_REGEX
We can safely drop 'test -z "$NO_REGEX" &&' part here, no? Even if we omit it, those who built with NO_REGEX would fail "test-tool regex" step above. And by omitting $NO_REGEX check, we do not have to look for and update the condition when the fallback regex engine we use starts supporting MB_REGEX.
+if ! test_have_prereq MB_REGEX +then + skip_all='multibyte grep tests; Git compiled with NO_REGEX, NO_MBSUPPORT' + test_done +fi
I do not think if we need a single use prereq here. We can just use whatever condition that is used to set MB_REGEX above and do the skip-all thing here.
+test_expect_success 'setup' ' + test_write_lines "¿" >file && + git add file && + LC_ALL="en_US.UTF-8" && + export LC_ALL +'
Missing inter-test blank line.
+test_expect_success 'grep exactly one char in single-char multibyte file' ' + git grep "^.$" +'
I am not sure how much value we are getting out of this test, which is identical to what we already tested earlier above with "test-tool regex".
+test_expect_success 'grep two chars in single-char multibyte file' ' + test_expect_code 1 git grep ".." +' + +test_done