Re: b4: unicode control characters -- warn or remove?
From: Konstantin Ryabitsev <hidden>
Date: 2021-11-01 20:22:25
On Mon, Nov 01, 2021 at 09:02:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
It checks whitespace because that's something that's commonly a source
of patch corruption. I'm not averse to adding this to core.whitespace,
but trying to catch malicious injected code seems like a rather big
expansion of its scope, particularly since:
"[...]sending patches for docs actually written in RTL languages[...]"
Or just code? People write comments and even code in their native languages,
and not all projects are as anglo-centric as those hosted on kernel.org.

My comment about docs was purely within the scope of the Linux kernel.

I think the following would be a sane check:

1. are there unicode control characters (CCs) present?
2. are there other characters from RTL languages present in the same line?

if both 1 && 2 are true, this is a legitimate use of Unicode CCs. If only 1
is true, then it's likely worth a warning. Maybe even relax #2 to just check
for unicode characters above a certain barrier where RTL languages live.

I think everyone will agree that if there are unicode CCs and no other
unicode characters in that same line, it's likely not a legitimate use of
control characters.

-K
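
The two-step check described above could be sketched roughly as follows. This is only an illustration, not b4's actual code: the names BIDI_CONTROLS, has_rtl_text and suspicious_lines are hypothetical, the list of control characters is an assumption (the explicit bidirectional formatting characters), and "characters from RTL languages" is approximated by the Unicode bidi classes R and AL.

```python
import unicodedata

# Assumed set of "unicode control characters": the explicit bidirectional
# formatting characters (marks, embeddings, overrides, isolates).
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"              # ALM, LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def has_rtl_text(line):
    """Check #2: does the line contain real RTL letters (bidi class R or AL)?

    The control characters themselves are skipped, because some of them
    (RLM, ALM) carry an RTL bidi class of their own.
    """
    return any(
        ch not in BIDI_CONTROLS
        and unicodedata.bidirectional(ch) in ("R", "AL")
        for ch in line
    )


def suspicious_lines(text):
    """Return 1-based numbers of lines where check #1 holds but #2 does not."""
    flagged = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        has_controls = any(ch in BIDI_CONTROLS for ch in line)
        if has_controls and not has_rtl_text(line):
            flagged.append(lineno)
    return flagged
```

With this sketch, a line such as `x = "a\u202eb"` would be flagged, while a comment that mixes Hebrew or Arabic text with the same control characters would pass. The relaxed variant of #2 would replace has_rtl_text with a simple code point threshold.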