Re: [PATCH] tests: remove the GETTEXT_POISON compile-time option to test i18n marking
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2017-04-26 14:28:03
On Wed, Apr 26, 2017 at 11:17 AM, Michael J Gruber [off-list ref] wrote:
[Turns out I still can't operate gmail's web interface. Sorry for the dupe.] 2017-04-24 13:04 GMT+02:00 Ævar Arnfjörð Bjarmason [off-list ref]:quoted
Remove the GETTEXT_POISON=YesPlease compile-time which turns all of git's LC_*=C output into strings like "# GETTEXT POISON #" instead of gettext(msgid). See commit bb946bba76 ("i18n: add GETTEXT_POISON to simulate unfriendly translator", 2011-02-22) for what this was originally intended for. This facility has been broken for quite a while and has been subjected to frequent bitrot. The initial idea behind it back when it was added in 2011 was to prevent the accidental translation of plumbing messages. This isn't a big concern anymore as git isn't mass-adding i18n messages for a newly developed i18n facility as it was back then, maintaining this facility incurs a burden, and in actuality this has often been broken long enough for potential plumbing messages to be translated & make their way into major releases anyway. Most of this patch consists of search/replacing the test suite for: test_i18ngrep ! -> ! grep test_i18ngrep -> grep test_i18ncmp -> test_cmp 1. [ref] (https://public-inbox.org/git/AANLkTi=5MrU-JyeQ3UVNbVwzn-8FbstUXafgcQaLWXDB@mail.gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <redacted> --- On Mon, Apr 24, 2017 at 3:18 AM, Junio C Hamano [off-list ref] wrote:quoted
Michael J Gruber [off-list ref] writes:quoted
Ævar Arnfjörð Bjarmason venit, vidit, dixit 20.04.2017 23:58:quoted
As a refresh of everyone's memory (because mine needed it). This is a feature I added back in 2011 when the i18n support was initially added. There was concern at the time that we would inadvertently mark plumbing messages for translation, particularly something in a shared code path, and this was a way to hopefully smoke out those issues with the test suite. However compiling with it breaks a couple of dozen tests, I stopped digging when I saw some broke back in 2014. What should be done about this? I think if we're going to keep them they need to be run regularly by something like Travis (Lars CC'd), however empirical evidence suggests that not running them is just fine too, so should we just remove support for this test mode? I don't care, but I can come up with the patch either way, but would only be motivated to write the one-time fix for it if some CI system is actually running them regularly, otherwise they'll just be subject to bitrotting again.I use that switch when I change something that involves l10n, but usually I run specific tests only. To be honest: I have to make sure not to get confused by (nor forget one of) the build flag GETTEXT_POISON and the environment variable GIT_GETTEXT_POISON. I'm not sure I always tested what I meant to test...To be quite honest, I have always felt that we are just as likely inadvertently use test_i18ncmp when we should use test_cmp (and vice versa) as we would mark plumbing messages with _() by mistake with this approach, and even with constant monitoring by something like Travis, GETTEXT_POISON may be able to catch mistakes only some of the time (i.e. when we do not make mistakes in writing our tests). Without constant monitoring, I agree that the mechanism does not work well to catch our mistakes.Here's an alternate patch to just remove it entirely. I think we should apply this instead, the only reason I sent the patch to fix it up was because of Michael's comment that he was occasionally using it.Yes, I think test_i18ngrep and test_i18ncmp gave the impression that they are i18n-aware grep and cmp, whereas in fact they turned off these tese test lines completely. Combined with the fact that GETTEXT_POSON builds turned on GIT_GETTEXT_POISON, this sounds somewhat dangerous - we test more aspects of plumbing commands but turn off (some) tests for porcelain.
It gives us no less test coverage because we still run the tests without GETTEXT_POSON. Thus running first without that & then with gives 100% coverage. The entire point of the GETTEXT_POSON mode is that we've already manually gone over things that broke in the past, and are skipping them with C_LOCALE_OUTPUT or one of these i18n functions, and when adding new translations we keep an eye out for new breakages that haven't been marked yet. These are then a candidate either for marking to skip, if they're porcelain commands using our new gettext()-utilizing code, or we've made a mistake and translated some plumbing, in which case that would hopefully be caught by list review or Travis. I still think it's better to just get rid of it, but this aspect of it is working exactly as intended.
I'm still wondering wether we couldn't generate a test locale automatically by mangling the english strings (but preserving format specifiers). That way, tests that test porcelain output could require LANG=C while others could run in the mangled locale. (tt_TT is taken, though ;) )
If we wanted to keep GETTEXT_POSON this would be going further in the
wrong direction. The reason the message starts with a "#", has no
trailing \n & most importantly ignores any format specifiers the real
translation would have is exactly because we'd like as many things as
can break break under this mode.
If anything we should consider changing "# GETTEXT_POISON #" to
"<random binary garbage>" if we keep it.
Including format specifiers would just reduce the set of things that
would break, consider e.g. a test that merely greps for some substring
that would be emitted by a "%s branch %d commits ahead" format, now we
change that to "%s GIBBER %d ISH", but the test still passes because
it just does $(awk '{print $1}'), but it might be a plumbing message
that could be translated as e.g. "%d WORD1 %s WORD2 WORD3", breaking
the API.