Re: b4: unicode control characters -- warn or remove?

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2021-11-01 20:05:55
Also in: tools

On Mon, Nov 01 2021, Eric Wong wrote:

> Konstantin Ryabitsev [off-list ref] wrote:

>> Hi, all:
>>
>> Per exhibit A, what should we do in the situation where we discover Unicode
>> control characters in an email?
>>
>> 1. Warn and strip these chars out, because they are extremely unlikely to be
>>    doing anything legitimate in the context of a patch (unless someone is
>>    sending patches for docs actually written in RTL languages)
>> 2. Warn and error out, refusing to produce an mbox
>> 3. Just warn and produce an mbox anyway
>>
>> I'd normally do #3, but with many people piping things to git-am, I'm not sure
>> if it's the safest choice.
>>
>> Exhibit A: https://lwn.net/Articles/874546/
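The three options Konstantin lists can be expressed as a single filter with a policy switch. This is a minimal Python sketch, not b4 code. It assumes that "Unicode control characters" means the bidirectional embedding, override, isolate and mark characters. The names `Policy`, `find_controls`, `apply_policy` and `BidiControlError` are hypothetical.

```python
import sys
from enum import Enum

# Assumed set of Unicode bidirectional control characters; b4's real list
# (if it has one) may differ.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
)


class Policy(Enum):
    STRIP = 1  # option 1: warn and strip the characters
    ERROR = 2  # option 2: warn and refuse to produce an mbox
    WARN = 3   # option 3: warn and pass the text through unchanged


class BidiControlError(Exception):
    """Raised under Policy.ERROR when control characters are present."""


def find_controls(text):
    """Return (line_number, 'U+XXXX') pairs for each control character."""
    hits = []
    # split("\n") rather than splitlines(), so U+2028 etc. don't shift numbering
    for lineno, line in enumerate(text.split("\n"), start=1):
        hits.extend((lineno, f"U+{ord(ch):04X}") for ch in line if ch in BIDI_CONTROLS)
    return hits


def apply_policy(text, policy, warn=lambda msg: print(msg, file=sys.stderr)):
    """Warn about every hit, then strip, raise, or pass through per policy."""
    hits = find_controls(text)
    for lineno, cp in hits:
        warn(f"line {lineno}: Unicode control character {cp}")
    if not hits or policy is Policy.WARN:
        return text
    if policy is Policy.ERROR:
        raise BidiControlError(f"{len(hits)} control character(s) found")
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)  # Policy.STRIP
```

With `Policy.ERROR`, nothing is returned when a control character is found, so a downstream `git am` in a pipeline never receives the mbox. `Policy.STRIP` silently changes the patch content, which leaves the emitted warnings as the only record of what was removed.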

> +Cc: git@vger
>
> IMHO, defense for this belongs in git-am (which already checks
> things like whitespace).
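Eric's suggestion could be approximated by checking only the lines a patch adds. This follows the same spirit as the whitespace checks that `git apply` (and therefore `git am`) performs on new or modified lines. The sketch below is a hypothetical standalone filter, not git code, and `check_added_lines` is an illustrative name. Its hunk tracking is simplified: it assumes `git diff`-style output in which every file section begins with a `diff ` line.

```python
# Assumed set of bidi control characters (embeddings, overrides, isolates, marks).
BIDI = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069\u200e\u200f\u061c"
)


def check_added_lines(patch):
    """Return (patch_line_number, 'U+XXXX') pairs for bidi controls on '+' lines.

    Context (' ') and removed ('-') lines are ignored, as are the '+++'
    file headers, which appear before the first '@@' of each file section.
    """
    problems = []
    in_hunk = False
    for lineno, line in enumerate(patch.split("\n"), start=1):
        if line.startswith("diff "):
            in_hunk = False  # new file section: '---'/'+++' headers follow
        elif line.startswith("@@"):
            in_hunk = True   # hunk body starts after this line
        elif in_hunk and line.startswith("+"):
            problems.extend(
                (lineno, f"U+{ord(ch):04X}") for ch in line[1:] if ch in BIDI
            )
    return problems
```

Limiting the scan to added lines roughly mirrors how whitespace errors in untouched lines are not reported. As a result, a pre-existing control character in a context line does not block an unrelated fix to nearby code.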

It checks whitespace because that's something that's commonly a source
of patch corruption. I'm not averse to adding this to core.whitespace,
but trying to catch malicious injected code seems like a rather big
expansion of its scope, particularly since:

    "[...]sending patches for docs actually written in RTL languages[...]"

Or just code? People write comments, and even code, in their native languages,
and not all projects are as Anglo-centric as those hosted on kernel.org.

I haven't checked what the overlap is between solving this issue and i18n
support, but we definitely should not assume that git is only used
by kernel.org users and similar projects, even for something as relatively
obscure as git-am.
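Ævar's i18n concern suggests that a detector should separate invisible formatting characters from ordinary right-to-left text. The sketch below assumes Python's `unicodedata` tables are an acceptable source of truth. It flags only characters whose general category is `Cf` (format) and that are either explicit bidi controls or one of the three implicit marks, so comments or code written in Hebrew or Arabic pass untouched. `is_bidi_control` and `has_rtl_letters` are hypothetical helper names.

```python
import unicodedata

# Bidi classes assigned to the explicit embedding, override and isolate controls.
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}
# LRM, RLM and ALM are format characters whose bidi classes are L, R and AL,
# so they are matched by codepoint instead of by class.
IMPLICIT_MARKS = {"\u200e", "\u200f", "\u061c"}


def is_bidi_control(ch):
    """True for invisible bidi formatting characters only."""
    if unicodedata.category(ch) != "Cf":
        return False  # letters, digits, punctuation, spaces are never controls
    return ch in IMPLICIT_MARKS or unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES


def has_rtl_letters(text):
    """True if the text contains right-to-left letters (e.g. Hebrew, Arabic)."""
    return any(
        unicodedata.category(ch).startswith("L")
        and unicodedata.bidirectional(ch) in ("R", "AL")
        for ch in text
    )
```

For example, a comment such as `/* שלום עולם */` contains RTL letters but no control characters, while U+202E (RIGHT-TO-LEFT OVERRIDE) is flagged. Zero-width joiners used in emoji are format characters with bidi class BN, so they are not flagged. Genuine RTL documents sometimes need LRM/RLM to render correctly, so a rule like this narrows false positives rather than eliminating them.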
