Thread: 16 messages, 3 authors, 2025-01-28

Re: man/man7/pathname.7: Correct handling of pathnames

From: Jason Yundt <hidden>
Date: 2025-01-27 17:14:45

On Mon, Jan 27, 2025 at 04:53:10PM +0100, Alejandro Colomar wrote:
Right.  But then, when do you need to do encoding?
Personally, my preference is that programs use the locale’s codeset
because I can override the locale codeset in the rare event that UTF-8
isn’t the correct option.  In my previous example, I was able to set the
LANG environment variable to ja_JP.SJIS so that I could run that old
software in an environment where pathnames were encoded in Shift-JIS.
If everything just always assumed a particular character encoding for
pathnames, then I wouldn’t have been able to do that.
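
A minimal sketch of the "use the locale's codeset" approach, assuming a
POSIX system. The helper name decode_pathname is hypothetical and only
for illustration. With no explicit codeset it asks the C library which
codeset the current locale uses, so overriding LANG changes how the same
bytes are interpreted.

```python
import locale
from typing import Optional


def decode_pathname(raw: bytes, codeset: Optional[str] = None) -> str:
    """Decode pathname bytes with the locale's codeset (hypothetical helper)."""
    if codeset is None:
        # Adopt the user's environment (LANG, LC_ALL, ...). This raises
        # locale.Error if the requested locale is not installed.
        locale.setlocale(locale.LC_CTYPE, "")
        # nl_langinfo(CODESET) is only available on POSIX-like systems.
        codeset = locale.nl_langinfo(locale.CODESET)
    # Strict decoding: bytes that are invalid in the codeset raise
    # UnicodeDecodeError instead of being silently altered.
    return raw.decode(codeset)
```

Passing the codeset explicitly, for example decode_pathname(raw,
"shift_jis"), mimics what running under a Shift-JIS locale would select.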

That being said, I still don’t really know if that’s the best option.
A program will either receive the pathname from the command line,
read it from some file, or create one of its own.

When creating a path of its own, it should restrict itself to the
Portable Filename Character Set, so encoding shouldn't be a problem.

When reading pathnames, they'll already be encoded suitably.
Instead, I think a good recommendation would be to behave in one of the
following ways:

-  Accept only the POSIX Portable Filename Character Set.
This one isn’t quite a complete recommendation.  The POSIX Portable
Filename Character Set is just a character set.  It’s not a character
encoding.  If we go with this one, then we would need to say something
along the lines of “Encode and decode paths using ASCII and only accept
characters that are in the POSIX Portable Filename Character Set.”
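
The "decode as ASCII, then accept only the portable set" rule could be
sketched as follows. The function name is_portable_filename is
hypothetical, and the check covers a single filename component, not a
whole path containing slashes. The accepted set is A-Z, a-z, 0-9,
period, underscore, and hyphen-minus.

```python
import string

# The POSIX Portable Filename Character Set: A-Z a-z 0-9 . _ -
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_filename(raw: bytes) -> bool:
    """Return True if raw is a nonempty, portable filename (hypothetical helper)."""
    try:
        # Step 1: fix the encoding. Any byte >= 0x80 is rejected here.
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    # Step 2: restrict the decoded characters to the portable set.
    return bool(text) and all(ch in PORTABLE_CHARS for ch in text)
```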
-  Assume UTF-8, but reject control characters.
-  Assume UTF-8.
-  Accept anything, but reject control characters.
-  Accept anything, just like the kernel.
These last two also aren’t quite complete recommendations.  If a GUI
program wants to display a pathname on the screen, then what character
encoding should it use when decoding the bytes?
Just print them as they came in.  No decoding.  Send the raw bytes to
write(2) or printf(3) or whatever.
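
For reference, the "no decoding" approach might look like this in
Python, where the binary layer of stdout passes bytes through untouched.
The helper name print_pathname_raw and its out parameter are
hypothetical; this is only a sketch of the idea, not a recommendation.

```python
import io
import sys


def print_pathname_raw(raw: bytes, out=None) -> None:
    """Write pathname bytes unchanged, followed by a newline (hypothetical helper)."""
    # sys.stdout.buffer is the underlying binary stream, so no codec is
    # applied and no byte sequence can fail to encode.
    stream = sys.stdout.buffer if out is None else out
    stream.write(raw + b"\n")


# Usage: capture the output in memory instead of a terminal.
captured = io.BytesIO()
print_pathname_raw(b"\x83e\x83X\x83g.txt", captured)
```

Whatever reads those bytes, such as a terminal emulator, still has to
pick an encoding in order to draw them.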
I don’t think that printing is a good way for GUI applications to
display text.  I don’t normally run GUI applications in a terminal, so
I’m not normally able to see a GUI application’s stdout or stderr.  Most
of the GUI applications that I use display pathnames as part of a larger
window.  In order to do that, the GUI application needs to know which
characters the bytes in the pathname represent so that the GUI
application can draw those characters on the screen.
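
A sketch of what that decoding step could look like for display
purposes, assuming the program has already chosen a codeset (UTF-8 is
only a default here). The helper name displayable is hypothetical.
Undecodable bytes and control characters are shown as U+FFFD so that
something can always be drawn. The result is for display only and must
not be used to reopen the file, since the replacement loses information.

```python
def displayable(raw: bytes, codeset: str = "utf-8") -> str:
    """Turn pathname bytes into text that is safe to draw (hypothetical helper)."""
    # errors="replace" substitutes U+FFFD for bytes that are invalid in
    # the chosen codeset instead of raising UnicodeDecodeError.
    text = raw.decode(codeset, errors="replace")
    # Control characters (newline, escape, ...) are not printable, so
    # they are replaced too rather than passed to the text renderer.
    return "".join(ch if ch.isprintable() else "\ufffd" for ch in text)
```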