Thread (3 messages) 3 messages, 2 authors, 2021-10-04

Re: [PATCH] kernfs: also call kernfs_set_rev() for positive dentry

From: Ian Kent <raven@themaw.net>
Date: 2021-09-29 10:09:52
Subsystem: filesystems (vfs and infrastructure), kernfs, the rest · Maintainers: Alexander Viro, Christian Brauner, Greg Kroah-Hartman, Tejun Heo, Linus Torvalds

On Tue, 2021-09-28 at 22:07 +0800, Hou Tao wrote:
A KMSAN warning is reported by Alexander Potapenko:
I haven't looked at this thoroughly yet I admit but ...

I'm very sorry but again I don't quite agree with what's done
in the patch.
BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840
fs/kernfs/dir.c:1053
 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053
 d_revalidate fs/namei.c:854
 lookup_dcache fs/namei.c:1522
 __lookup_hash+0x3a6/0x590 fs/namei.c:1543
 filename_create+0x312/0x7c0 fs/namei.c:3657
 do_mkdirat+0x103/0x930 fs/namei.c:3900
 __do_sys_mkdir fs/namei.c:3931
 __se_sys_mkdir fs/namei.c:3929
 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929
 do_syscall_x64 arch/x86/entry/common.c:51

It seems a positive dentry in kernfs becomes a negative dentry directly
through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
when accessing it in kernfs_dop_revalidate(), because it is only
initialized when created as negative dentry in kernfs_iop_lookup().
It looks more like the final dput() that's sending the dentry to
the LRU list than the call to d_delete().

But I could be mistaken and it's just a detail.

I think the real question is do we want kernfs ->rmdir() to utilize
the VFS negative dentry facility.

Clearly, based in the kernfs ->rmdir() op function definition kernfs
doesn't want file systems that use it to be able to decide that.

Unless there is a case for allowing dentries to be re-used I think
they should be dropped in kernfs_iop_rmdir().
The problem can be reproduced by the following command:

  cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi
And immediately recreating the directory is probably not a common
use case so it doesn't provide a case for kernfs ->rmdir() not
dropping the dentry to prevent the dentry making it to the LRU
negative dentry list.

The more likely case is an unnecessary accumulation of negative
dentries from file system activity with occasional negative dentry
re-use which might be undesirable for long running systems.
A simple fixes seems to be initializing d->d_time for positive dentry
in kernfs_iop_lookup() as well. The downside is the negative dentry
will be revalidated again after it becomes negative in d_delete(),
because the revison of its parent must have been increased due to
its removal.
I did that for a long time is the series before deciding to do
it only for negatives ...

The usual case of directory removal is the dentry is dropped by
the ->rmdir() function either directly or via d_invalidate() to
prevent possible incorrect initialization on re-create.

But this is probably unlikely for kernfs attributes of the same
name so I'm not certain dropping the dentry is in fact what really
should be done here.

I admit I did miss this case.
Alternative solution is implement .d_iput for kernfs, and assign d_time
for the newly-generated negative dentry in it. But we may need to
take kernfs_rwsem to protect again the concurrent kernfs_link_sibling()
on the parent directory, it is a little over-killing. Now the simple
fix is chosen.
Please don't even consider this, it's misleading because here we
are concerned with the dentry not the inode so I don't think adding
->d_iput() is the right thing to do.
quoted hunk ↗ jump to hunk
Link: https://marc.info/?l=linux-fsdevel&m=163249838610499
Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching")
Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Hou Tao <redacted>
---
 fs/kernfs/dir.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 0c7f1558f489..f7618ba5b3b2 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1140,8 +1140,13 @@ static struct dentry *kernfs_iop_lookup(struct
inode *dir,
                if (!inode)
                        inode = ERR_PTR(-ENOMEM);
        }
-       /* Needed only for negative dentry validation */
-       if (!inode)
+       /*
+        * Needed for negative dentry validation.
+        * The negative dentry can be created in kernfs_iop_lookup()
+        * or transforms from positive dentry in dentry_unlink_inode()
+        * called from vfs_rmdir().
+        */
+       if (!IS_ERR(inode))
                kernfs_set_rev(parent, dentry);
        up_read(&kernfs_rwsem);
 
This is not a bad idea to do and is what I did for quite a while too
but maybe it would be better to do this:
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index a957c944cf3a..3fd9d8fa4b92 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1165,6 +1165,8 @@ static int kernfs_iop_rmdir(struct inode *dir, struct dentry *dentry)
                return -ENODEV;
 
        ret = scops->rmdir(kn);
+       if (!ret)
+               d_invalidate(dentry);
 
        kernfs_put_active(kn);
        return ret;

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help