Thread: 38 messages, 4 authors, 2025-01-23

Re: [PATCH v2] man/man7/path-format.7: Add file documenting format of pathnames

From: Jason Yundt <hidden>
Date: 2025-01-15 16:21:10

On Wed, Jan 15, 2025 at 12:06:10AM +0100, Alejandro Colomar wrote:
> Hmmm, yep, let's make it pathname(7).

OK, I’ll submit a new version that uses pathname(7) as the title.

> Makes sense.  How about a null-terminated string?

The term null-terminated string still has some of the problems that I
mentioned earlier.  Specifically, people think of null-terminated
strings as sequences of characters.  It’s easier to understand how the
kernel handles paths if you think of paths as sequences of bytes, not as
sequences of characters.
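
To illustrate the byte-sequence view, here is a minimal sketch, assuming
Python 3 on a POSIX system.  The path is hypothetical and was chosen only
because one component is valid UTF-8 and another is not.  Splitting on the
"/" byte works without knowing any character encoding, and the raw bytes
survive a round trip through os.fsdecode() and os.fsencode().

```python
import os

# Hypothetical path, chosen only for illustration.  The second component is
# three bytes that form valid UTF-8; the third contains 0xFF, which can never
# appear in valid UTF-8.
raw_path = b"/music/\xe6\x9b\xb2/track\xff.flac"

# Splitting on the "/" byte (0x2F) needs no knowledge of any encoding.
components = [part for part in raw_path.split(b"/") if part]

# On POSIX, os.fsdecode() turns undecodable bytes into lone surrogates and
# os.fsencode() turns them back, so the original bytes survive the round trip.
as_text = os.fsdecode(raw_path)
round_tripped = os.fsencode(as_text)

print(components)
print(round_tripped == raw_path)
```

Treating the path as text first and bytes second would only work here
because of the surrogate escaping; keeping it as bytes avoids the question
entirely.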

Also, people typically make assumptions about the encoding of
null-terminated strings in the C programming language.  It’s reasonable
to assume that a char * is encoded in the execution character set, that
a wchar_t * is encoded in the wide execution character set, that a
char8_t * is encoded in UTF-8, that a char16_t * is encoded in UTF-16
and that a char32_t * is encoded in UTF-32.  Paths don’t necessarily
have one character encoding, and their character encoding may not be any
of those.
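
As a rough illustration of why no single encoding can be assumed, the
sketch below (Python 3; the title and the choice of Shift_JIS as the second
encoding are assumptions made for the example) encodes one hypothetical
song title two ways.  Both byte sequences are acceptable file names, but a
program that assumes UTF-8 cannot decode the second one.

```python
title = "音楽"  # hypothetical song title

# The same title stored by two programs that assume different encodings.
utf8_name = title.encode("utf-8")
sjis_name = title.encode("shift_jis")

# Decoding the Shift_JIS bytes as UTF-8 fails outright.
try:
    sjis_name.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Decoding them as Latin-1 never fails, but the result is not the title.
mojibake = sjis_name.decode("latin-1")

print(utf8_name, sjis_name, decoded_as_utf8, mojibake == title)
```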
> > I have a concern about programs failing hard when paths contain
> > non-ASCII characters.  I have a lot of songs and medleys saved on my
> > computer.  The paths for over 10,000 of them contain non-ASCII
> > characters.  Most of those non-ASCII characters come from Chinese,
> > Japanese or Korean characters that are in the titles of songs or
> > medleys.  If programs failed hard on paths that contain non-ASCII
> > characters, what impact would that have on my music collection?
>
> The core utils (e.g., rm(1) et al.) are nice and work well for arbitrary
> characters, to allow you to fix them.  But yeah, most high level
> programs and (especially) scripts aren't so nice.  Think for example of
> makefiles, where handling files with spaces correctly is almost
> impossible.

I agree that the core utils work well with arbitrary paths.  I’m not so
sure that most high-level programs and scripts don’t work well with
spaces and non-ASCII characters.  Most of the high-level programs and
scripts that I personally use work fine with paths that contain spaces
and non-ASCII characters, but I don’t know whether most programs and
scripts in general work that well.  I also agree that handling spaces
correctly in makefiles is almost impossible, which is why I don’t use
makefiles for my own personal projects.

That being said, I think that you misunderstood my two questions.  You
told me the current state of things.  I’m not asking about the current
state of things; I’m asking about a hypothetical future where programs
start to “assume the Portable Filename Character Set (or at most some
subset of ASCII), and fail hard outside of that”.  If we start making
that recommendation and programs start following that recommendation,
then it sounds like I wouldn’t be able to do anything with a large part
of my music collection, and it sounds like I wouldn’t be able to use the
symbolic links that are in my /dev/disk/by-partlabel directory.  Am I
understanding your recommendation correctly?
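
To make the hypothetical concrete, here is a minimal sketch in Python 3 of
the kind of check such a program might perform.  The function name and the
sample file names are invented for illustration.  The allowed set is the
POSIX Portable Filename Character Set: A-Z, a-z, 0-9, period, underscore
and hyphen.

```python
import string

# Byte values of the POSIX Portable Filename Character Set.
PORTABLE_BYTES = frozenset(
    (string.ascii_letters + string.digits + "._-").encode("ascii")
)


def is_portable_component(name: bytes) -> bool:
    """Return True if a single path component uses only portable bytes.

    Hypothetical helper: it is not part of any existing library.
    """
    return len(name) > 0 and all(byte in PORTABLE_BYTES for byte in name)


# Hypothetical file names: plain ASCII, ASCII with a space, and a CJK title.
samples = [b"track01.flac", b"my song.flac", "音楽.flac".encode("utf-8")]
for sample in samples:
    print(sample, is_portable_component(sample))
```

A program that failed hard whenever this check returned False would reject
the second and third names, which is the scenario the questions above are
about.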