Thread: 41 messages, 5 authors, 2021-05-17

Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)

From: Michal Suchánek <hidden>
Date: 2021-05-06 17:48:52

On Thu, May 06, 2021 at 07:04:44PM +0200, Markus Heiser wrote:
On 06.05.21 at 18:46, Mauro Carvalho Chehab wrote:
On Thu, 6 May 2021 17:57:15 +0200,
Markus Heiser [off-list ref] wrote:
On 06.05.21 at 12:39, Michal Suchánek wrote:
When building HTML documentation I get this output:
...
[  412s] UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
...
It does not say which input file contains the offending character so I can't tell which file is broken.

Any idea how to debug?
I guess the build host is a very simple container, what does

    echo $LC_ALL
    echo $LANG
It's actually set to en_US just before the build.
prompt?  If it is latin, change it to something using utf-8 (I recommend
'en_US.utf8').

A UnicodeEncodeError can occur anywhere characters are
encoded from (internal) Unicode to the encoding of the stream.

For example:

A print or log statement that writes to stdout needs to encode
from Unicode to stdout's encoding.  If there is a single Unicode symbol
that cannot be encoded in the stream's encoding, a UnicodeEncodeError
is raised.
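
A minimal sketch of the mechanism described above, not taken from the thread: it writes the same three characters used in the test further down to an in-memory text stream, once with a latin-1 encoding and once with UTF-8. The helper name `try_write` is hypothetical, and the in-memory stream only stands in for stdout.

```python
import io


def try_write(text, encoding):
    """Write text to an in-memory text stream that uses the given encoding.

    The stream is a stand-in for stdout; the real stdout encoding is
    chosen by Python from the environment.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        # exc.end is exclusive, so the last offending index is end - 1.
        return f"failed: {exc.reason} at {exc.start}-{exc.end - 1}"
    return f"ok: {len(raw.getvalue())} bytes"


if __name__ == "__main__":
    for enc in ("latin-1", "utf-8"):
        print(enc, "->", try_write("\u2191\u16cf\u4e2a", enc))
```

The error carries only the character positions within the string being written, which is consistent with the original report not naming an input file.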
Hi Markus,

The builder's locale shouldn't matter when building the Kernel
documentation (or any other documents built from other git trees
on other open source projects), as the Kernel's *.rpm document charset
won't change, no matter in what part of the globe it was built.

I vaguely remember a change we made a couple of years ago
in order to address this issue.
Hi Mauro :)

Sure? What if the logger wants to log some symbols from the
Chinese-translated parts to stdout and the encoding of stdout is
latin?
[  127s] + cd linux-5.12-next-20210506
[  127s] + export LANG=en_US
[  127s] + LANG=en_US
[  127s] + mkdir -p html
[  127s] + python3 -c 'print("↑ᛏ个")'
[  127s] ↑ᛏ个
[  127s] + echo 'print("↑ᛏ个")'
[  127s] + python3 test.py
[  127s] Traceback (most recent call last):
[  127s]   File "test.py", line 1, in <module>
[  127s]     print("\u2191\u16cf\u4e2a\uf8f9")
[  127s] UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

It certainly does not look like Python can print Unicode in this
environment. It tells me where the problem is, though.

Thanks

Michal

[  127s] + :
[  127s] + locale
[  128s] LANG=en_US
[  128s] LC_CTYPE="en_US"
[  128s] LC_NUMERIC="en_US"
[  128s] LC_TIME="en_US"
[  128s] LC_COLLATE="en_US"
[  128s] LC_MONETARY="en_US"
[  128s] LC_MESSAGES="en_US"
[  128s] LC_PAPER="en_US"
[  128s] LC_NAME="en_US"
[  128s] LC_ADDRESS="en_US"
[  128s] LC_TELEPHONE="en_US"
[  128s] LC_MEASUREMENT="en_US"
[  128s] LC_IDENTIFICATION="en_US"
[  128s] LC_ALL=
[  128s] + echo LC_ALL=
[  128s] LC_ALL=
[  128s] + echo LANG=en_US
[  128s] LANG=en_US
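
A small diagnostic sketch, not part of the original message, that could be run in the build environment to see which encodings Python derives from the locale shown above. The function names `describe_encodings` and `can_encode` are hypothetical. The assumption here is that a bare `en_US` locale (no `.utf8` suffix) resolves to a Latin-1 charset on this host, which would match the 'latin-1' codec named in the traceback.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python picked up from the environment."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def can_encode(text, encoding):
    """Return True if every character of text fits in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "\u2191\u16cf\u4e2a"
    for name, enc in describe_encodings().items():
        # ascii() keeps the report printable even on a latin-1 stdout.
        fits = can_encode(sample, enc) if enc else None
        print(f"{name}: {enc} (sample {ascii(sample)} encodable: {fits})")
```

If stdout is reported as a Latin-1 variant, switching LANG to a UTF-8 locale such as the 'en_US.utf8' suggested earlier in the thread, or setting the PYTHONIOENCODING environment variable to utf-8, would be expected to change the result.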