Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
From: Markus Heiser <hidden>
Date: 2021-05-07 09:51:50
On 07.05.21 at 11:14, Mauro Carvalho Chehab wrote:
> On Fri, 7 May 2021 10:56:39 +0200, Markus Heiser [off-list ref] wrote:
>> On 07.05.21 at 10:35, Michal Suchánek wrote:
>>> So the bottom line is that UTF-8 in the files will stay, and Sphinx cannot handle UTF-8 when the locale is not UTF-8. In the long run it might be nice to fix Sphinx to properly set the encoding of the files it reads and writes. Or maybe there is some parameter that specifies it?
>> Let's not mix things up. The Unicode error is not related or limited to the log or to Sphinx; it is related to the fact that we (you) try to run a UTF-8 application in an environment which is not fully UTF-8 functional.
> No. The application itself is not UTF-8. The application input files are.
Maybe we have different views on this; for me, an application which reads UTF-8 in and spits UTF-8 out is a UTF-8 application. Hint: HTML is just one Sphinx writer; there are other writers as well, e.g. LaTeX.
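To make the failure mode concrete, here is a minimal sketch (not taken from Sphinx; the sample text and the helper name `try_encode` are made up for illustration). It shows that the same string encodes fine as UTF-8 and raises the error from the subject line as soon as the target codec is Latin-1:

```python
# Valid Unicode text containing characters that do not exist in Latin-1.
# The sample string is arbitrary and only serves this illustration.
text = "UTF-8 input: ✓ → ☃"


def try_encode(s, encoding):
    """Return None if `s` can be encoded, else the error message."""
    try:
        s.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


print("utf-8:  ", try_encode(text, "utf-8"))
print("latin-1:", try_encode(text, "latin-1"))
```

The string never changes between the two calls; only the encoding imposed by the environment does.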
> The big issue with the way Python works with charsets is due to exactly that: it does a very poor job in that regard.
This is your POV; the Python developers have a different view on handling strings. There are epic discussions about it. But all these discussions won't help, since we can't change the principles of Python. Personally, I think I can't ignore the principles of a language, and I'm comfortable with setting up a UTF-8 environment.
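For anyone who wants to see what their environment actually gives Python, a small sketch like the following may help. The helper name `describe_encodings` is hypothetical; the calls themselves are standard library. The values depend on the locale settings and the Python version, so no particular output is assumed here:

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings this interpreter derived from its environment."""
    return {
        # Encoding Python uses by default for text files it opens.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used when printing to standard output (may be None
        # if stdout was replaced by an object without that attribute).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```

If any of these does not report a UTF-8 encoding, writing UTF-8 content through that channel can fail as described in this thread.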
> I remember that in the past I had to use this quite often (before UTF-8 became the default on the distros I was using at the time):
>
>     LANG=C <some_python_script>
>
> just to keep such scripts from crashing. If I'm not mistaken, older Fedora/Mandrake distros had some bugs with Python-written scripts: if the machine's language was not English, such scripts crashed, as the i18n-translated messages were in a different charset than the one the Python script expected.
For me, "i18n translated messages" are a good example that I'm not wrong in my opinion. This is not true for all devices, but on those devices you won't run an application like Sphinx.
>>> For the short term I think it is reasonable to run a Python test script that prints fancy Unicode characters before running Sphinx and bail if the test script fails.
>> To be sure, I recommend setting a UTF-8 locale environment in the Makefile. My experience shows that this is the default with almost all containers (images); there are only a few where this is not the case (maybe SUSE?).
> That may not be true in certain parts of the globe.
Sorry, I was speaking about common LXC images.
I've no idea what charsets the most-used distributions in Asian countries use ;-)
I guess these days they will most often use UTF-8, since ASCII hasn't helped since the 80s ;-)

-- Markus --
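For reference, the short-term check suggested earlier in the thread (print some non-ASCII characters before running Sphinx, bail out on failure) could look roughly like this. It is only a sketch: the probe string and the names `PROBE`, `can_write_unicode` and `main` are made up here, and nothing like it exists in the kernel tree as far as this thread shows.

```python
import sys

# Characters outside Latin-1; the selection is arbitrary for this sketch.
PROBE = "✓ → ☃ 日本語"


def can_write_unicode(stream, probe=PROBE):
    """Return True if `probe` can be encoded with the stream's encoding."""
    # Streams without a usable encoding are treated as plain ASCII.
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        probe.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def main():
    """Return 0 if stdout can carry the probe text, 1 otherwise."""
    if not can_write_unicode(sys.stdout):
        sys.stderr.write(
            "stdout encoding %r cannot represent the test characters; "
            "please set up a UTF-8 locale\n"
            % getattr(sys.stdout, "encoding", None)
        )
        return 1
    print(PROBE)
    return 0
```

A wrapper script would end with `sys.exit(main())`, so that the build can stop early when the exit status is non-zero instead of failing somewhere in the middle of a parallel Sphinx run.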