Thread: 24 messages, 5 authors, 2025-11-22

Re: [PATCH v2] man/man3/readdir.3, man/man3type/stat.3type: Improve documentation about .d_ino and .st_ino

From: Pali Rohár <pali@kernel.org>
Date: 2025-11-22 00:53:37

On Friday 21 November 2025 17:39:57 G. Branden Robinson wrote:
Hi Pali,

At 2025-11-21T22:10:28+0100, Pali Rohár wrote:
On Wednesday 29 October 2025 20:34:13 Pali Rohár wrote:
On Wednesday 29 October 2025 02:00:39 G. Branden Robinson wrote:
At 2025-10-29T00:53:06+0100, Pali Rohár wrote:
If you are referring to the "bug", it is described in the
informative RATIONALE section of readdir in POSIX.1-2024.
I mentioned it in my first email in the thread which Alejandro
linked above.

Here is a direct link to the POSIX spec, with the relevant part quoted below:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/readdir.html

"When returning a directory entry for the root of a mounted file
system, some historical implementations of readdir() returned
the file serial number of the underlying mount point, rather
than of the root of the mounted file system. This behavior is
considered to be a bug, since the underlying file serial number
has no significance to applications."
[...]
That part is in the "informative" section. I have not found
anything in the normative sections which would disallow that
"historical" behavior, so my understanding was that the
"historical" behavior is conforming too.

Please correct me if I'm wrong here, or if it should be
understood in a different way.
I can't speak for the Austin Group, but I don't read the text
quite the same way.  I interpret it as saying that some historical
implementations of readdir() would _not_ "return a pointer to a
structure representing the directory entry at the current position
in the directory stream specified by the argument dirp, and
position the directory stream at the next entry."  But I suspect
that's not what it _intends_ to say.

Instead, these implementations "returned [sic] the file serial
number of the underlying mount point", which I interpret to mean
that they would return a pointer to a _dirent_ struct whose
_d_ino_ member was not the file serial number of the file (of
directory type) named by the _d_name_ member but a pointer to a
_dirent_ struct whose _d_ino_ member was the file serial number of
the underlying mount point.

I think there are two conclusions we can reach here.

1.  POSIX.1-2024 might be a little sloppy in the wording of its
    "RATIONALE" for this interface.  Presumably no historical
    implementation's readdir() returned a _d_ino_ number directly.
    (Though with all the exuberant integer/pointer punning that
    used to go on in Unix, I wouldn't bet a lot of money that *no*
    implementation ever did.)  I'll wager a nickel that readdir()
    has always, on every implementation, returned a pointer to a
    _dirent_ struct, and it is only the value of the _d_ino_
    member of the pointed-to struct that some implementations have
    populated inconsistently when the entry is a directory that is
    a mount point.

    If I'm right, this is an example of the common linguistic
    error of synecdoche: confusing a container with (a subset of)
    its contents.

2.  The behavior POSIX describes as buggy is, in fact,
    nonconforming.
Only two? I can imagine somebody coming up with another conclusion.
(just a joke)
I wouldn't bet against your joke proving out in reality.  ;-)
Anyway, I think that it is important to document the existing Linux
behavior; whether it is POSIX-conforming or not is then a second
step.  We can drop the information about POSIX conformity from the
manpage until we figure out which it is.
Also I have not read all those 4000 pages,
Pity the person who has.  :)  And mastery of all 4000+ pages should not
be necessary for an implementor to make sense of a reference entry for a
single function, command, or data object.
so maybe there is something hidden. It is quite hard to find
information about this topic and that is why I think this should
be documented in Linux manpages.
I reckon someone should open a Mantis ticket with the Austin
Group's issue tracker to get clarity on what I characterized as
"sloppy" wording.  Either it is and we can get the standard
clarified, or I'm wrong and an authority can point out how.
(Maybe both!)

I'm subscribed to the austin-group-l reflector and will take an
action item to file this ticket.  I'll try to do so within a week.
(I have a lot of old Unix books and would like to rummage around
in them for any documented land mines in this area.)
[...]
Thanks for taking that on. It would be really nice if the Austin Group
could clarify the whole situation in a non-confusing way.

Anyway, an inode number is always tied to a specific mounted
filesystem. So when an application does something with inodes,
it always needs the pair (dev_t, ino_t), unless all the inodes
belong to the same filesystem device.
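
As a minimal sketch of why the pair matters: the helper below, whose name
same_file() is hypothetical and not from this thread, treats two paths as
the same file only when both st_dev and st_ino match. Comparing st_ino
alone would wrongly equate the roots of two different filesystems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: two paths name the same file only if both the
 * device number and the inode number agree. */
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);

    /* Treat any stat() failure as "not the same file". */
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For instance, same_file("/", "/.") compares the root directory with itself
and is expected to be true.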

readdir() and getdents() return just ino_t, and without knowledge
of dev_t, applications cannot use the returned ino_t for anything
useful.  On "historical" implementations, the dev_t can be fetched,
for example, with a single fstat(dir_fd, &st) call, since dev_t is
the same for all readdir and getdents entries. On a non-"historical"
implementation, however, it would be necessary to call stat() on
every single entry. For example, a /mnt/ directory, which usually
contains just mount points, will contain entries that each have
inode number 2 (a common inode number for the root of a filesystem).

I looked into the archives and found that this problem was
already discussed in the past. Here are some email threads, mostly
from coreutils:
https://lore.kernel.org/lkml/87y6oyhkz8.fsf@meyering.net/t/#u
https://public-inbox.org/bug-coreutils/8763c5wcgn.fsf@meyering.net/t/#u
https://public-inbox.org/bug-coreutils/87iqvi2j0q.fsf@rho.meyering.net/t/#u
https://public-inbox.org/bug-coreutils/87verkborm.fsf@rho.meyering.net/
https://public-inbox.org/bug-coreutils/022320061637.4398.43FDE4D7000110830000112E22007507440A050E040D0C079D0A@comcast.net/

Maybe they could be a good reference for future discussion by the
Austin Group.

Just my personal idea: if there were some xgetdents syscall
(like there is statx alongside stat), it could return both inode
numbers with dev_t, and the application could take whichever it wants.

For example, NFSv4's readdir can return both inode numbers (depending
on what the client asks for). The NFSv4.1 spec nicely documents this
problem, including the UNIX background of mount point crossing:
https://www.rfc-editor.org/rfc/rfc8881.html#section-5.8.2.23

Pali
Hello Branden, did you have time to file a ticket with the Austin Group?
Not yet--I procrastinated and got preoccupied by exciting new
undefined or ambiguously interpretable behavior of GNU troff.

https://www.mail-archive.com/groff@gnu.org/msg20834.html
If the ticket system is public, could you send a link for reference?
It is public...

https://austingroupbugs.net/view_all_bug_page.php

...but to file a ticket or comment on one, I believe you need to create
an account.  If you file a ticket yourself because you tire of waiting
on me (which I'll understand), please let me know when you do so I can
take this item off my to do list.

Regards,
Branden
You are experienced with the Austin Group, so I will leave this to you.
I'm fine with waiting here.