Thread: 41 messages, 5 authors, 2021-05-17

Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)

From: Michal Suchánek <hidden>
Date: 2021-05-12 07:02:01

On Wed, May 12, 2021 at 08:22:38AM +0200, Mauro Carvalho Chehab wrote:
Hi Michal,

On Thu, 6 May 2021 19:48:49 +0200
Michal Suchánek [off-list ref] wrote:
[  127s] + :
[  127s] + locale
[  128s] LANG=en_US
[  128s] LC_CTYPE="en_US"
[  128s] LC_NUMERIC="en_US"
[  128s] LC_TIME="en_US"
[  128s] LC_COLLATE="en_US"
[  128s] LC_MONETARY="en_US"
[  128s] LC_MESSAGES="en_US"
[  128s] LC_PAPER="en_US"
[  128s] LC_NAME="en_US"
[  128s] LC_ADDRESS="en_US"
[  128s] LC_TELEPHONE="en_US"
[  128s] LC_MEASUREMENT="en_US"
[  128s] LC_IDENTIFICATION="en_US"
[  128s] LC_ALL=
[  128s] + echo LC_ALL=
[  128s] LC_ALL=
[  128s] + echo LANG=en_US
[  128s] LANG=en_US
Were those the locale settings that you used when the build
failed?

I tried to reproduce the bug here, with the parallel run disabled (as
it masks the real error), using both:

	$ for i in LANG LC_ALL LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE LC_TIME; do echo $i=en_US; done
	$ make cleandocs && make SPHINXOPTS=-j1 htmldocs

(this one caused lots of warnings on Debian, due to the
 settings at /etc/locale.gen)

and:

	$ for i in LANG LC_ALL LC_ADDRESS LC_IDENTIFICATION LC_MEASUREMENT LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE LC_TIME; do echo $i=en_US.ISO-8859-1; done
	$ make cleandocs && make SPHINXOPTS=-j1 htmldocs

Without any success.

Could you please provide more details about the build VM and the git 
changeset that caused the issue?
It depends on what character set your en_US locale implements.

~> cat test.py 
print("↑ᛏ个")
~> locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
~> python3 test.py 
↑ᛏ个
~> LANG=en_US python3 test.py 
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    print("\u2191\u16cf\u4e2a\uf8f9")
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
~> LANG=C python3 test.py 
↑ᛏ个

You can easily test whether your Python version can print UTF-8 in a
specific locale and, if necessary, define an ISO-8859-1 locale for testing.
On some systems the situation is reversed: the C locale is ASCII-only and
en_US is UTF-8, and some systems may not ship an 8-bit locale at all.
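
A minimal sketch of such a test, in case it helps: it reports which
encoding Python selected for stdout under the current locale and whether
the sample string from test.py can be encoded with it. The helper name
can_encode and the file layout are hypothetical, not part of the kernel
documentation build.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "↑ᛏ个"
    # Encoding Python picked for stdout under the current locale settings;
    # fall back to the locale's preferred encoding if stdout has none.
    stdout_encoding = sys.stdout.encoding or locale.getpreferredencoding(False)
    # Only ASCII is printed here, so the check itself cannot trip the error.
    print("stdout encoding:", stdout_encoding)
    print("can print sample:", can_encode(sample, stdout_encoding))
```

Running it as, for example, LANG=en_US python3 check.py (file name assumed)
shows what a given locale resolves to on the build host without touching
the Sphinx build itself.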

Thanks

Michal