Re: [RFC PATCH v5 1/2] vfs: add O_CREAT|O_DIRECTORY to open*(2)

From: NeilBrown <hidden>
Date: 2026-06-03 22:56:22
Also in: linux-fsdevel, lkml

On Wed, 03 Jun 2026, Jori Koolstra wrote:

quoted

Op 02-06-2026 17:44 CEST schreef Christian Brauner [off-list ref]:

Yes, I agree. This would change error codes but I don't think it
matters:

* O_WRONLY | O_DIRECTORY on non-directory -> ENOTDIR
* O_WRONLY | O_DIRECTORY on     directory -> EISDIR

I don't think that really matters and we should be able to collapse this
to ENOTDIR.

I will pick this up in the next version of O_CREAT|O_DIRECTORY. I think that
makes most sense.

I have an outstanding patch for changing the EACCES/EPERM to EOPNOTSUPP;
Jeff and Jan were skeptical, but I want to know your opinion as well.
I feel the the scenario where userspace has no fall-through but does
handle every single -E listed in the man-page quite unlikely, so I say
lets change them and we'll hear from them if somehow someone relied on
this weird way of error handling.

Please cc linux-api@vger.kernel.org on code and discussions that involve
API changes.  I have cc:ed them on this reply.

Thanks,
NeilBrown

quoted

There is another point, I maybe should have mentioned in the cover letter: I have not attempted
to handle dangling symlinks for O_MKDIR. Not because I think they are a great idea (as Aleksa
has mentioned, but I am not very familiar with the dragons it entails), but I wanted to discuss
what behavior we want in this case. Do we say that we never do a mkdir after following a lookup
last symlink? I don't think that state is even recorded right now.

I think the state might be recorded in nd->depth.  But you probably
don't want to use that directly.  Maybe forcing LOOKUP_FOLLOW to be
cleared if O_CREAT|O_DIRECTORY is set would be good.  But what would
stop you opening an existing directory through a symlink....

Probably we need a clear statement of intended semantics which we can
review, agree on, then implement.  Have you looked at preparing a patch
for man-pages to document the change in behaviour for openat etc?

Ugh, dangling symlinks. Actually, scratch that: Ugh, symlinks. So
O_CREAT without O_NOFOLLOW allows you to create the target of a dangling
symlink iirc. I always forget that. I think this is a very subtle bug
and maybe - with both eyes closed - a feature at times.

We should straighten the behavior for O_DIRECTORY | O_CREAT and we
agreed on that during LSFMM. It would be nice if we could get away with
simply implying O_NOFOLLOW but I think you're right, Neil, that this
prevents a valid O_CREAT | O_DIRECTORY on an existing directory which we
can't do. Makes this kind of a pointless excercise.

But this shouldn't be all that crazy to do right. Using the O_CREAT as
an _example_ for what we'd need:

    fs: refuse O_CREAT through a dangling symlink

    open(O_CREAT) without O_EXCL follows a trailing symlink and, when the
    symlink target does not exist, creates it.  Refuse to create through a
    dangling symlink instead.

    In lookup_open() a negative target reached with nd->depth > 0 was
    arrived at by following a trailing symlink; since the dentry is negative
    the symlink is dangling. Set create_error to -ELOOP in that case.
    Reusing the existing create_error path strips O_CREAT for both the
    generic and ->atomic_open create paths and only reports the error when
    the target is actually negative, so opening an existing target through a
    symlink, interior symlinks, and O_EXCL (which never follows the trailing
    link) are all unaffected.

    Hastily-Cobbled-Together-by: Christian Brauner (Amutable) [off-list ref]

diff --git a/fs/namei.c b/fs/namei.c
index c7fac83c9a85..d20bbcc7e8d3 100644
--- a/fs/namei.c
+++ b/fs/namei.c

@@ -4468,6 +4468,9 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
                                                    dentry, mode);
                else
                        create_error = -EROFS;
+               /* refuse to create through a dangling (trailing) symlink */
+               if (unlikely(nd->depth) && !create_error)
+                       create_error = -ELOOP;
        }
        if (create_error)
                open_flag &= ~O_CREAT;

It can't be that easy...

This is what I suggested above, correct, in terms of behavior?

In terms of the patch, I think this will work, but struct nameidata could really
use some commentary for its fields. I spent the last two hours verifying that
nd->depth really does what I thought it did, and I am still not 100% positive.
AFAIS, nd->depth indeed tracks the current symlink depth, which outside of
link_path_walk() reduces to the number of trailing links followed.

But if Neil's rework of lookup_open() is merged we lose access here to nd.
@Neil, have you thought about what would be a good way to resolve that?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help