Thread: 9 messages, 5 authors, 2025-07-15

Re: [PATCH 1/1] http: don't send C or POSIX in Accept-Language

From: Carlo Marcelo Arenas Belón <hidden>
Date: 2025-07-11 20:57:05

On Fri, Jul 11, 2025 at 10:02:01AM -0800, Collin Funk wrote:
> Justin Tobler writes:
>
>> From my understanding, each language is expected to be defined in the
>> following form:
>>
>>   language[_territory][.codeset][@modifier]
>>
>> When we parse the list of languages we only care about the
>> `language[_territory]` part though.
>>
>> From looking at ISO 639 language codes, only codes with two or three
>> characters are valid. If we wanted to be a bit more strict, we could
>> check the length of the language code (everything before the first '_')
>> and filter out anything outside of those limits. This would naturally
>> filter out "C" and "POSIX" without having to mention them explicitly.
>
> Filtering out anything that isn't 2-3 letters seems like a good
> heuristic to me.

except that it would be incorrect, as language tags are defined in RFC 5646
and can be longer than that.

most importantly, deriving language tags from locales provides some very
useful tags when including the characters after the _, because zh_CN and
zh_HK use different scripts (Simplified and Traditional Chinese), for example.

Carlo
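
To make the two positions in this thread concrete, here is a minimal sketch
of turning a locale name into a language tag while skipping "C" and "POSIX"
by name instead of by length. It is not taken from the patch or from git's
http code; the function name locale_to_language_tag is hypothetical, and the
sketch only assumes the language[_territory][.codeset][@modifier] form
quoted above.

```python
def locale_to_language_tag(locale_name):
    """Return a language tag for a POSIX locale name, or None to skip it.

    Hypothetical helper for illustration only; not git's implementation.
    """
    # Drop the optional @modifier and .codeset parts, keeping only
    # language[_territory].
    base = locale_name.split("@", 1)[0].split(".", 1)[0]

    # "C" and "POSIX" name the default locale, not a language, so they are
    # rejected explicitly rather than through a length check.
    if base in ("", "C", "POSIX"):
        return None

    # Keep the territory: zh_CN and zh_HK must not both collapse to "zh".
    # Language tags separate subtags with '-' where locales use '_'.
    return base.replace("_", "-")
```

Matching "C" and "POSIX" by name leaves every other language code untouched,
whatever its length, and keeping the territory preserves the distinction
between zh_CN and zh_HK.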