Re: [PATCH v4] man/man7/pathname.7: Add file documenting format of pathnames
From: Alejandro Colomar <alx@kernel.org>
Date: 2025-01-15 17:20:36
Hi Jason,

On Wed, Jan 15, 2025 at 11:20:51AM -0500, Jason Yundt wrote:
The goal of this new manual page is to help people create programs that do the right thing even in the face of unusual paths. The information that I used to create this new manual page came from these sources:

• <https://unix.stackexchange.com/a/39179/316181>
• <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
• <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
• <https://docs.kernel.org/filesystems/affs.html#mount-options-for-the-affs>
• <man:unix(7)>

Signed-off-by: Jason Yundt <redacted>
---
Here’s what I changed from the previous version:
Thanks! The page is starting to look good. I'll make some minor comments below.
• The title of the page is now “pathname(7)”.
• The list of kernel rules now mentions that paths can’t be longer than 4,096 bytes (Thanks for mentioning this, Florian).
• The list of kernel rules now mentions that filenames can’t be longer than 255 bytes.
• I replaced the ext4 filename limitation example with an Amiga filename limitation example. It no longer made sense to say that ext4 limited filenames to 255 bytes now that we’re saying that all filenames are limited to 255 bytes.
• I added UNIX domain sockets’ sun_path as an example of a situation where the kernel puts additional limitations on paths (Thanks for mentioning this, Florian).
• I added additional sources to the commit message in order to account for the new information added by this version.

 man/man7/pathname.7 | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 man/man7/pathname.7

diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
new file mode 100644
index 000000000..15ff98e15
--- /dev/null
+++ b/man/man7/pathname.7
@@ -0,0 +1,61 @@
+.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH pathname 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+pathname \- how pathnames are encoded and interpreted
Maybe, since this also discusses filenames, we should use both names:

	.SH NAME
	filename, pathname \- ...
+.SH DESCRIPTION
+Some system calls allow you to pass a pathname as a parameter.
+When writing code that deals with paths,
+there are kernel space requirements that you must comply with
s/kernel space/kernel-space/ since it works as an adjective.

Also, I'd put a comma after that: s/$/,/
+and userspace requirements that you should comply with.
s/userspace/user-space/ for similar reasons.
+.P
+The kernel stores paths as null-terminated byte sequences.
+The kernel has a few general rules that apply to all paths:
+.IP \[bu]
See man-pages(7):
Lists
There are different kinds of lists:
[...]
Bullet lists
Elements are preceded by bullet symbols (\[bu]). Anything
that doesn’t fit elsewhere is usually covered by this type
of list.
[...]
There should always be exactly 2 spaces between the list symbol
and the elements. This doesn’t apply to "tagged paragraphs",
which use the default indentation rules.
So, you'll need to use
.IP \[bu] 3
in the first item (and only there; the following ones inherit the
value).
+The last byte in the sequence needs to be a null byte.
+.IP \[bu]
+Any other bytes in the sequence need to be non-null bytes.
+.IP \[bu]
+A 0x2F byte is always interpreted as a directory separator (/).
How about adding this?:

	and cannot be part of a filename.
+.IP \[bu]
+A path can be at most 4,096 bytes long.
For self-consistency, let's use the same term all of the time: either path or pathname. Otherwise, a reader might think they are different things. For consistency with POSIX, let's say pathname, since that's what POSIX uses: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_254>
+A path that’s longer than 4,096 bytes can be split into multiple smaller paths
+and opened piecewise using
+.BR openat (2).
+.IP \[bu]
+Filenames can be at most 255 bytes long.
For consistency with bullet one: s/Filenames/A filename/
+.P
+The kernel also has some rules that only apply in certain situations.
+Here are some examples:
+.IP \[bu]
+If you want to store a file on an Amiga filesystem,
+then its filename can’t be longer than 30 bytes.
I would simplify and make it more consistent with the bullets above: - Filenames on the Amiga filesystem can be at most 30 bytes long.
+.IP \[bu]
+If you want to store a file on a vfat filesystem,
+then its filename can’t contain a 0x3A byte (: in ASCII)
Is that the only one? I expect there are several characters that are not allowed in vfat.
+unless the filesystem was mounted with iocharset set to something unusual.
+.IP \[bu]
+A UNIX domain socket’s sun_path can be at most 108 bytes long (see
+.BR unix (7)
+for details).
+.P
+Userspace treats paths differently.
s/Userspace/User space/
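On the sun_path limit in the hunk above, a program can check the length before copying rather than hard-coding 108. This is only a sketch: fill_sockaddr_un() is a hypothetical name, and it assumes the caller wants a null-terminated pathname socket address (not an abstract one).

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: fill in a sockaddr_un for a pathname socket.
 * Returns 0 on success, or -1 with errno set to ENAMETOOLONG if the
 * pathname plus its terminating null byte does not fit in sun_path.
 */
int
fill_sockaddr_un(struct sockaddr_un *addr, const char *pathname)
{
	size_t len = strlen(pathname);

	/* Use sizeof rather than a literal 108, so the check follows
	   whatever size the system's header declares. */
	if (len >= sizeof(addr->sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, pathname, len + 1);  /* Includes the null byte. */
	return 0;
}
```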
+Userspace applications typically expect paths to use
.
+a consistent character encoding.
+For maximum interoperability, programs should use
+.BR nl_langinfo (3)
+to determine the current locale’s codeset.
+Paths should be encoded and decoded using the current locale’s codeset
+in order to help prevent mojibake.
It might be interesting to add an example program.
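As a starting point, the core of such a program could be as small as the sketch below. The function name current_codeset() is made up for illustration. It assumes the program has not already set a locale, so it calls setlocale(3) itself to pick up the user's environment.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper: return the name of the codeset that the
 * current locale uses (for example "UTF-8").
 * The returned string is owned by the C library and may be
 * overwritten by later calls to nl_langinfo(3) or setlocale(3).
 */
const char *
current_codeset(void)
{
	/* Adopt the locale from the environment (LANG, LC_ALL, ...).
	   If this fails, the program stays in the default "C" locale. */
	setlocale(LC_ALL, "");

	return nl_langinfo(CODESET);
}
```

A caller could then print the result with puts(3), or pass it to a conversion facility such as iconv_open(3) as the encoding to use for pathnames shown to or read from the user.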
+For maximum interoperability,
+programs and users should also limit
+the characters that they use for their own paths to characters in
+.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
+the POSIX Portable Filename Character Set
+.UE .
+.SH SEE ALSO
+.BR open (2),
+.BR nl_langinfo (3),
+.BR path_resolution (7)
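Regarding the Portable Filename Character Set mentioned in the hunk above (the ASCII letters, the digits, and the period, underscore, and hyphen-minus), a check for a single filename might look like the following sketch. The name is_portable_filename() is hypothetical, and rejecting the empty string is a choice made here rather than something the character set itself mandates.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a filename (one component,
 * not a whole pathname) consists only of characters from the POSIX
 * Portable Filename Character Set.
 */
bool
is_portable_filename(const char *name)
{
	static const char portable[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789._-";
	size_t len = strlen(name);

	if (len == 0)
		return false;  /* Assumption: an empty name is not usable. */

	/* strspn(3) counts the leading bytes that are in the set;
	   the name is portable only if that covers every byte. */
	return strspn(name, portable) == len;
}
```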
Also interesting:

	.BR mount (8)

(It talks about iocharset.)

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>