Thread (38 messages), 4 authors, 2025-01-23

Re: [PATCH] man/man7/path-format.7: Add file documenting format of pathnames

From: Alejandro Colomar <alx@kernel.org>
Date: 2025-01-14 00:20:24

Hi Jason,

On Mon, Jan 13, 2025 at 04:32:46PM -0500, Jason Yundt wrote:
The goal of this new manual page is to help people create programs that
do the right thing even in the face of unusual paths.  The information
that I used to create this new manual page came from this Unix & Linux
Stack Exchange answer [1] and from this Libc-help mailing list post [2].

[1]: <https://unix.stackexchange.com/a/39179/316181>
[2]: <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>

Signed-off-by: Jason Yundt <redacted>
---
 man/man7/path-format.7 | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 man/man7/path-format.7
diff --git a/man/man7/path-format.7 b/man/man7/path-format.7
new file mode 100644
index 000000000..c3c01cbf5
--- /dev/null
+++ b/man/man7/path-format.7
@@ -0,0 +1,41 @@
+.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH PATH-FORMAT 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+path-format \- how pathnames are encoded and interpreted
I would use path_format instead of path-format or PATH-FORMAT.
+.SH DESCRIPTION
+Some system calls allow you to pass a pathname as a parameter.
+When writing code that deals with paths,
+there are kernel space requirements that you must comply with
+and userspace requirements that you should comply with.
+.P
+The kernel stores paths as null-terminated byte sequences.
+As far as the kernel is concerned, there are only three rules for paths:
+.IP \[bu]
+The last byte in the sequence needs to be a null.
+.IP \[bu]
+Any other bytes in the sequence need to not be null bytes.
... need to be non-null bytes.

seems easier to read.
+.IP \[bu]
+A 0x2F byte is always interpreted as a directory separator (/).
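
As an illustration of the three rules above (not part of the patch), here is a minimal sketch in C. The helper names is_kernel_valid_path() and count_separators() are hypothetical, made up for this example. The check covers only the rules listed here; it ignores length limits such as PATH_MAX, and it accepts the empty path, which system calls usually reject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if buf[0..len-1] ends in a null
 * byte and contains no other null byte (rules 1 and 2).
 * The byte values themselves are not checked against any encoding.
 */
static bool
is_kernel_valid_path(const unsigned char *buf, size_t len)
{
	if (len == 0 || buf[len - 1] != '\0')
		return false;

	/* Look for a null byte anywhere before the terminator. */
	return memchr(buf, '\0', len - 1) == NULL;
}

/*
 * Hypothetical helper: counts 0x2F bytes, each of which is treated
 * as a directory separator regardless of the surrounding bytes (rule 3).
 */
static size_t
count_separators(const unsigned char *buf, size_t len)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == 0x2F)
			n++;
	}
	return n;
}
```

Note that a buffer such as the bytes 0xFF 0xFE 0x2F 0x78 0x00 passes this check even though it is not valid UTF-8, which is the point the next paragraph of the patch makes.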
+.P
+This means that programs can technically do weird things
+like create paths using random character encodings
+or create paths without using any character encoding at all.
+Filesystems may impose additional restrictions on paths, though.
+For example, if you want to store a file on an ext4 filesystem,
+then its filename can’t be longer than 255 bytes.
+.P
+Userspace treats paths differently.
+Userspace applications typically expect paths to use
+a consistent character encoding.
+For maximum interoperability, programs should use
+.BR nl_langinfo (3)
+to determine the current locale’s codeset.
I would say that for maximum interoperability one should self-limit to
the POSIX Portable Filename Character Set:
<https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265>
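
A minimal sketch of such a self-imposed check, assuming the set consists of the ASCII letters, the digits, and the period, underscore, and hyphen-minus characters. The function name is_portable_filename() is hypothetical, and the check applies to a single filename component, not to a whole pathname.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if name is non-empty and consists
 * only of characters from the POSIX Portable Filename Character Set.
 */
static bool
is_portable_filename(const char *name)
{
	static const char portable[] =
	    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	    "abcdefghijklmnopqrstuvwxyz"
	    "0123456789._-";

	if (name[0] == '\0')
		return false;

	/* strspn() returns the length of the leading portable run;
	 * the name is portable if that run reaches the terminator. */
	return name[strspn(name, portable)] == '\0';
}
```

For example, "path_format.7" passes, while a name containing a space, a slash, or any non-ASCII byte does not.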


Have a lovely night!
Alex
+Paths should be encoded and decoded using the current locale’s codeset
+in order to help prevent mojibake.
+.SH SEE ALSO
+.BR open (2),
+.BR nl_langinfo (3),
+.BR path_resolution (7)
-- 
2.47.0
-- 
<https://www.alejandro-colomar.es/>
