Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
From: Mauro Carvalho Chehab <mchehab@kernel.org>
Date: 2021-05-07 08:52:22
On Thu, 6 May 2021 20:06:25 +0200, Michal Suchánek [off-list ref] wrote:
On Thu, May 06, 2021 at 07:53:25PM +0200, Markus Heiser wrote:
quoted
Hi Mauro, it is not comfortable, but is it mad? Most languages (or applications) do not handle the encoding of strings at all; they just pipe a binary stream, while Python decodes/encodes strings. "The Zen of Python" [1] says: Explicit is better than implicit.
This was taken to an extreme with regard to charsets: "better" should never be translated to "crash" ;-)
quoted
If a stream can't encode symbols and these symbols should be ignored, you have to explicitly set the encoding of the stream to ignore such symbols.
The problem is this part never happened. Loggers are supposed to tell you about the error in your application, not crash it.
It is insane to crash the error log due to a charset issue ;-)
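To make the point about explicit stream settings concrete, here is a minimal Python sketch. It is not taken from Sphinx; the logger name and the simulated latin-1 "terminal" are assumptions for illustration only. It shows a log stream whose error handler is set to "replace", so an unencodable character degrades to "?" instead of raising UnicodeEncodeError.

```python
import io
import logging

# Simulate an output whose encoding is latin-1 (as a LANG/LC_* setting might
# select). errors="replace" turns unencodable characters into "?".
# newline="\n" keeps the line terminator identical on every platform.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="latin-1", errors="replace", newline="\n")

logger = logging.getLogger("charset_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep the demo output out of the root logger

# U+2192 (right arrow) is not in latin-1; U+00E9 (e with acute) is.
logger.warning("bad reference: \u2192 caf\u00e9")
stream.flush()
output = raw.getvalue()
```

For a stream that already exists, Python 3.7 and later also offer `sys.stdout.reconfigure(errors="replace")`, which applies the same policy without rewrapping the stream.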
But the problem with Sphinx may be that the output file is also assumed to be in the locale encoding, and the output encoding is never set. It's HTML, so it could be encoded with entities, too. The idea of handling encoding precisely is not mad in itself; the mad part is everybody working with just ASCII and never testing that their software works in the cases where explicit handling is needed.
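A rough Python sketch of both ideas, setting the output encoding explicitly and falling back to HTML entities, might look like this. The helper name `write_html` and the sample text are hypothetical, and this is not how Sphinx writes its output.

```python
import os
import tempfile

def write_html(path, body, encoding="utf-8"):
    """Write text with an explicit encoding instead of the locale default.

    Hypothetical helper: characters the chosen encoding cannot represent
    are emitted as numeric character references (&#NNNN;).
    """
    with open(path, "w", encoding=encoding, errors="xmlcharrefreplace") as f:
        f.write(body)

# "Suchánek → 中文", written with escapes so the example is ASCII-only.
text = "Such\u00e1nek \u2192 \u4e2d\u6587"

# Pure ASCII output: every non-ASCII character becomes an entity.
entity_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# latin-1 output: U+00E1 is encoded directly, the rest become entities.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.html")
    write_html(path, text, encoding="latin-1")
    with open(path, "rb") as f:
        latin1_bytes = f.read()
```

Either way the result does not depend on LANG/LC_*: the encoding is whatever the caller passed in.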
True. The machine's locale shouldn't affect the produced documents *at all*. See, there's a whole set of non-Latin families of charsets supported on Linux:

https://man7.org/linux/man-pages/man7/charsets.7.html

Nothing prevents someone using a machine whose default encoding is KOI8-R/BIG-5/GB 2312/JIS X 0208/... from using Sphinx to produce UTF-8 [1] documents.

[1] or whatever other output encoding

OK, the logger may not be able to display certain chars correctly, but it would be perfectly fine and sane to use //TRANSLIT (or something similar) to do a charset conversion. Even just printing a <?> for every char that isn't printable at the logger's output, using the charset set by LANG/LC_*, is better/saner than crashing.

Thanks,
Mauro