Thread: 38 messages, 4 authors, 2025-01-23

Re: [PATCH v4] man/man7/pathname.7: Add file documenting format of pathnames

From: Alejandro Colomar <alx@kernel.org>
Date: 2025-01-15 17:20:36

Hi Jason,

On Wed, Jan 15, 2025 at 11:20:51AM -0500, Jason Yundt wrote:
The goal of this new manual page is to help people create programs that
do the right thing even in the face of unusual paths.  The information
that I used to create this new manual page came from these sources:

• <https://unix.stackexchange.com/a/39179/316181>
• <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
• <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
• <https://docs.kernel.org/filesystems/affs.html#mount-options-for-the-affs>
• <man:unix(7)>

Signed-off-by: Jason Yundt <redacted>
---
Here’s what I changed from the previous version:
Thanks!  The page is starting to look good.  I have some minor comments
below.
• The title of the page is now “pathname(7)”.
• The list of kernel rules now mentions that paths can’t be longer than
  4,096 bytes (Thanks for mentioning this, Florian).
• The list of kernel rules now mentions that filenames can’t be longer
  than 255 bytes.
• I replaced the ext4 filename limitation example with an Amiga filename
  limitation example.  It no longer made sense to say that ext4 limited
  filenames to 255 bytes now that we’re saying that all filenames are
  limited to 255 bytes.
• I added UNIX domain sockets’ sun_path as an example of a situation
  where the kernel puts additional limitations on paths (Thanks for
  mentioning this, Florian).
• I added additional sources to the commit message in order to account
  for the new information added by this version.

 man/man7/pathname.7 | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 man/man7/pathname.7
diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
new file mode 100644
index 000000000..15ff98e15
--- /dev/null
+++ b/man/man7/pathname.7
@@ -0,0 +1,61 @@
+.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH pathname 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+pathname \- how pathnames are encoded and interpreted
Maybe, since this also discusses filenames, we should use both names:

	.SH NAME
	filename,
	pathname
	\-
	...
+.SH DESCRIPTION
+Some system calls allow you to pass a pathname as a parameter.
+When writing code that deals with paths,
+there are kernel space requirements that you must comply with
s/kernel space/kernel-space/

since it works as an adjective.

Also, I'd put a comma after that: s/$/,/
+and userspace requirements that you should comply with.
s/userspace/user-space/

for similar reasons.
+.P
+The kernel stores paths as null-terminated byte sequences.
+The kernel has a few general rules that apply to all paths:
+.IP \[bu]
See man-pages(7):

   Lists
     There are different kinds of lists:

     [...]

     Bullet lists
            Elements  are preceded by bullet symbols (\[bu]).  Anything
            that doesn’t fit elsewhere is usually covered by this  type
            of list.

     [...]

     There should always be exactly 2 spaces between  the  list  symbol
     and  the  elements.   This  doesn’t  apply to "tagged paragraphs",
     which use the default indentation rules.

So, you'll need to use

	.IP \[bu] 3

in the first item (and only there; the following ones inherit the
value).
+The last byte in the sequence needs to be a null byte.
+.IP \[bu]
+Any other bytes in the sequence need to be non-null bytes.
+.IP \[bu]
+A 0x2F byte is always interpreted as a directory separator (/).
How about adding this?:

	and cannot be part of a filename.
+.IP \[bu]
+A path can be at most 4,096 bytes long.
For self-consistency, let's use the same term all of the time: either
path or pathname.  Otherwise, a reader might think they are different
things.

For consistency with POSIX, let's say pathname, since that's what POSIX
uses:
<https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_254>
+A path that’s longer than 4,096 bytes can be split into multiple smaller paths
+and opened piecewise using
+.BR openat (2).
+.IP \[bu]
+Filenames can be at most 255 bytes long.
For consistency with bullet one:

s/Filenames/A filename/
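
For illustration, the two length rules in this list could be checked in
user space roughly as follows.  This is only a sketch, not part of the
patch: pathname_fits_kernel_limits() is a hypothetical helper name, and
it assumes that <limits.h> exposes PATH_MAX (4096, counting the
terminating null byte) and NAME_MAX (255), as it does on Linux with
glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>   /* PATH_MAX, NAME_MAX */
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the null-terminated string
   respects the general pathname and filename length limits. */
static bool
pathname_fits_kernel_limits(const char *pathname)
{
	const char  *p;

	assert(pathname != NULL);

	/* PATH_MAX includes the terminating null byte. */
	if (strlen(pathname) + 1 > PATH_MAX)
		return false;

	/* Every filename between '/' separators is limited to NAME_MAX. */
	p = pathname;
	while (*p != '\0') {
		size_t  len = strcspn(p, "/");

		if (len > NAME_MAX)
			return false;
		p += len;
		p += strspn(p, "/");   /* skip one or more separators */
	}
	return true;
}
```

A filesystem may still reject a pathname that passes this check, since
the situational rules (Amiga, vfat, sun_path) are not covered here.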
+.P
+The kernel also has some rules that only apply in certain situations.
+Here are some examples:
+.IP \[bu]
+If you want to store a file on an Amiga filesystem,
+then its filename can’t be longer than 30 bytes.
I would simplify and make it more consistent with the bullets above:

	-  Filenames on the Amiga filesystem can be at most 30 bytes long.
+.IP \[bu]
+If you want to store a file on a vfat filesystem,
+then its filename can’t contain a 0x3A byte (: in ASCII)
Is that the only one?  I expect there are several characters that are
not allowed in vfat.
+unless the filesystem was mounted with iocharset set to something unusual.
+.IP \[bu]
+A UNIX domain socket’s sun_path can be at most 108 bytes long (see
+.BR unix (7)
+for details).
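
As a sketch of how a program might respect the sun_path limit, the
following copies a pathname into a struct sockaddr_un only if it fits.
fill_sun_path() is a hypothetical helper name; the code takes the limit
from sizeof(addr->sun_path) rather than hard-coding 108, and it assumes
the caller wants a null-terminated pathname socket address.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills *addr for a pathname socket.
   Returns false if pathname (plus its null byte) does not fit. */
static bool
fill_sun_path(struct sockaddr_un *addr, const char *pathname)
{
	size_t  len;

	assert(addr != NULL && pathname != NULL);

	len = strlen(pathname);
	if (len >= sizeof(addr->sun_path))
		return false;   /* no room for the terminating null byte */

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, pathname, len + 1);
	return true;
}
```

Rejecting the pathname up front avoids silently truncating it, which
would otherwise bind or connect to a different socket file than the
caller intended.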
+.P
+Userspace treats paths differently.
s/Userspace/User space/
+Userspace applications typically expect paths to use
.
+a consistent character encoding.
+For maximum interoperability, programs should use
+.BR nl_langinfo (3)
+to determine the current locale’s codeset.
+Paths should be encoded and decoded using the current locale’s codeset
+in order to help prevent mojibake.
It might be interesting to add an example program.
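
Something along these lines could serve as a starting point.  It is a
minimal sketch, not a finished EXAMPLES section: current_codeset() is a
hypothetical helper name, and the only library calls it relies on are
setlocale(3) and nl_langinfo(3) with the CODESET item.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: returns the name of the codeset that the
   current locale uses, which is what pathnames are assumed to be
   encoded in. */
static const char *
current_codeset(void)
{
	const char  *codeset;

	/* Adopt the locale from the environment (LANG, LC_ALL, ...).
	   If this fails, the program stays in the default "C" locale. */
	if (setlocale(LC_ALL, "") == NULL)
		fprintf(stderr, "setlocale: staying in the \"C\" locale\n");

	codeset = nl_langinfo(CODESET);
	assert(codeset != NULL);   /* defensive check only */
	return codeset;
}
```

A main() that calls printf("%s\n", current_codeset()) would print the
codeset name; the exact string depends on the locale and the C library,
so the page should probably not promise a specific output.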
+For maximum interoperability,
+programs and users should also limit
+the characters that they use for their own paths to characters in
+.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
+the POSIX Portable Filename Character Set
+.UE .
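
A check against that character set is short enough to sketch here.
is_portable_filename() is a hypothetical helper name.  The accepted
characters are the letters A-Z and a-z, the digits 0-9, period,
underscore, and hyphen-minus.  Rejecting empty names and names with a
leading hyphen-minus is a choice made in this sketch (POSIX advises
against a leading hyphen-minus in portable filenames).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if name consists only of
   characters from the POSIX Portable Filename Character Set. */
static bool
is_portable_filename(const char *name)
{
	static const char  portable[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789._-";

	assert(name != NULL);

	/* Sketch-specific choice: no empty names, no leading '-'. */
	if (name[0] == '\0' || name[0] == '-')
		return false;

	/* strspn() stops at the first byte outside the set. */
	return name[strspn(name, portable)] == '\0';
}
```

Because '/' is not in the set, this checks a single filename, not a
whole pathname; a caller would apply it to each component separately.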
+.SH SEE ALSO
+.BR open (2),
+.BR nl_langinfo (3),
+.BR path_resolution (7)
Also interesting:

	.BR mount (8)

(It talks about iocharset.)


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>
