Re: [PATCH 1/1] http: don't send C or POSIX in Accept-Language
From: Justin Tobler <hidden>
Date: 2025-07-11 15:29:17
On 25/07/10 10:16PM, brian m. carlson wrote:
The LANGUAGE environment variable is not specified by POSIX, but a variety of programs using GNU gettext accept it. The Linux manpages state that it can contain a colon-separated list of locales. However, not all locales are valid as languages. The C and POSIX locales, for instance, are not languages and are not registered with IANA, nor are they a part of ISO 639. In fact, "C" is too short to match the ABNF production for a language, which must be at least two characters in length. Nonetheless, many users provide these values in the LANGUAGE environment variable for unknown reasons and if they do, we do not want to send a malformed Accept-Language header to the server. If there are no other valid language tags, then send no header; otherwise, send only the valid tags, ignoring "C" and "POSIX" wherever they may appear, as well as any variants (such as the "C.UTF-8" locale found on some Linux systems).
OK, so the languages returned by `get_preferred_languages()` are used to write the Accept-Language header when making requests. Looking at `get_preferred_languages()` when NO_GETTEXT is defined, we already filter out "C" and "POSIX". So filtering them out of the LANGUAGE environment variable when writing the header also makes sense.
We do not reject all possible invalid language tags since doing so would require bundling a copy of the IANA database and would risk poor behavior in the face of uncommon languages or values that are not registered but meet the production for private use or other restricted interchange. However, these two values are widely used in the LANGUAGE header, are well-known and widely used non-language locales, and have been seen in the wild on the server side.

Signed-off-by: brian m. carlson <redacted>
---
 http.c                     |  8 ++++++++
 t/t5541-http-push-smart.sh | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/http.c b/http.c
index d88e79fbde..a96df4fcdb 100644
--- a/http.c
+++ b/http.c
@@ -2022,6 +2022,14 @@ static void write_accept_language(struct strbuf *buf)
 			s++;
 
 		if (tag.len) {
+			/*
+			 * These are not valid languages: do not send them to
+			 * the server.
+			 */
+			if (!strcmp(tag.buf, "C") || !strcmp(tag.buf, "POSIX")) {
+				strbuf_reset(&tag);
+				continue;
+			}
From my understanding, each language is expected to be defined in the following form:

    language[_territory][.codeset][@modifier]

When we parse the list of languages, we only care about the `language[_territory]` part, though. Looking at the ISO 639 language codes, only codes with two or three characters are valid. If we wanted to be a bit stricter, we could check the length of the language code (everything before the first '_') and filter out anything outside of those limits. This would naturally filter out "C" and "POSIX" without having to mention them explicitly.

I'm not sure being stricter adds much value here in practice, though, so it may be fine to keep it as-is. :)
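For illustration only, the stricter length check could look roughly like the sketch below. The helper name `has_plausible_language_code()` is hypothetical and not something that exists in http.c. It also assumes the tag may use either '_' or '-' as the separator after the language code, depending on where in the parsing it is called.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of http.c: return 1 if the language
 * code of `tag` (everything before the first '_' or '-') consists of
 * two or three ASCII letters, and 0 otherwise.
 */
static int has_plausible_language_code(const char *tag)
{
	size_t len = 0;

	while (tag[len] && tag[len] != '_' && tag[len] != '-') {
		char c = tag[len];

		/* Only plain ASCII letters count, independent of locale. */
		if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
			return 0;
		len++;
	}
	return len == 2 || len == 3;
}
```

With this, "C" (one letter) and "POSIX" (five letters) are both rejected without being named. One trade-off: a check like this would also reject private-use tags that start with a single "x", which the commit message wants to keep working, so the explicit comparison in the patch is the more conservative choice.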
 			num_langs++;
 			REALLOC_ARRAY(language_tags, num_langs);
 			language_tags[num_langs - 1] = strbuf_detach(&tag, NULL);
diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index 538b603f03..96a6833e67 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -86,6 +86,24 @@ test_expect_success 'push to remote repository (standard) with sending Accept-La
 	GIT_TRACE_CURL=true LANGUAGE="ko_KR.UTF-8" git push -v -v 2>err &&
 	! grep "Expect: 100-continue" err &&
 
+	grep "=> Send header: Accept-Language:" err >err.language &&
+	test_cmp exp err.language &&
+
+	test_commit C-is-not-a-language &&
+	GIT_TRACE_CURL=true LANGUAGE="C" git push -v -v 2>err &&
+
+	! grep "=> Send header: Accept-Language:" err >err.language &&
+	test_must_be_empty err.language &&
+
+	test_commit POSIX-is-not-a-language-either &&
+	GIT_TRACE_CURL=true LANGUAGE="POSIX" git push -v -v 2>err &&
+
+	! grep "=> Send header: Accept-Language:" err >err.language &&
+	test_must_be_empty err.language &&
The above two tests demonstrate that the Accept-Language header is not sent if no valid languages are found.
+
+	test_commit ignore-C-and-POSIX-as-languages-wherever-provided &&
+	GIT_TRACE_CURL=true LANGUAGE="C.UTF-8:ko_KR.UTF-8:POSIX" git push -v -v 2>err &&
+
 	grep "=> Send header: Accept-Language:" err >err.language &&
 	test_cmp exp err.language
 '
And here we see only the valid languages sent in the header. Looks good!

-Justin
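As a rough, stand-alone illustration of the behavior the tests exercise, the filtering can be sketched as below. This is not git's `write_accept_language()`: the function name `collect_language_tags()`, the fixed-size buffers, and the `MAX_TAGS`/`MAX_TAG_LEN` limits are all assumptions made to keep the example self-contained.

```c
#include <string.h>

#define MAX_TAGS 8
#define MAX_TAG_LEN 32

/*
 * Hypothetical helper: split a colon-separated LANGUAGE value, keep
 * only the language[_territory] part of each entry, turn '_' into
 * '-', and drop "C" and "POSIX". Returns the number of tags kept.
 */
static int collect_language_tags(const char *langs, char tags[][MAX_TAG_LEN])
{
	int n = 0;
	const char *s = langs;

	while (*s && n < MAX_TAGS) {
		size_t len = 0;
		char tag[MAX_TAG_LEN];

		/* Copy up to the codeset, the modifier, or the next entry. */
		while (*s && *s != '.' && *s != '@' && *s != ':') {
			if (len < MAX_TAG_LEN - 1)
				tag[len++] = (*s == '_') ? '-' : *s;
			s++;
		}
		tag[len] = '\0';

		/* Skip ".codeset" and "@modifier", then the ':' separator. */
		while (*s && *s != ':')
			s++;
		if (*s == ':')
			s++;

		/* Empty entries and the non-language locales are ignored. */
		if (!len || !strcmp(tag, "C") || !strcmp(tag, "POSIX"))
			continue;
		strcpy(tags[n++], tag);
	}
	return n;
}
```

A caller would send no Accept-Language header at all when the function returns 0, which corresponds to the LANGUAGE="C" and LANGUAGE="POSIX" cases in the test. Because the codeset is stripped before the comparison, "C.UTF-8" is dropped as well.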