Re: [PATCH] kernfs: also call kernfs_set_rev() for positive dentry
From: Ian Kent <raven@themaw.net>
Date: 2021-09-29 10:09:52
Subsystem:
filesystems (vfs and infrastructure), kernfs, the rest · Maintainers:
Alexander Viro, Christian Brauner, Greg Kroah-Hartman, Tejun Heo, Linus Torvalds
On Tue, 2021-09-28 at 22:07 +0800, Hou Tao wrote:
A KMSAN warning is reported by Alexander Potapenko:
I haven't looked at this thoroughly yet I admit but ... I'm very sorry but again I don't quite agree with what's done in the patch.
BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 d_revalidate fs/namei.c:854 lookup_dcache fs/namei.c:1522 __lookup_hash+0x3a6/0x590 fs/namei.c:1543 filename_create+0x312/0x7c0 fs/namei.c:3657 do_mkdirat+0x103/0x930 fs/namei.c:3900 __do_sys_mkdir fs/namei.c:3931 __se_sys_mkdir fs/namei.c:3929 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929 do_syscall_x64 arch/x86/entry/common.c:51 It seems a positive dentry in kernfs becomes a negative dentry directly through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized when accessing it in kernfs_dop_revalidate(), because it is only initialized when created as negative dentry in kernfs_iop_lookup().
It looks more like the final dput() that's sending the dentry to the LRU list than the call to d_delete(). But I could be mistaken and it's just a detail. I think the real question is do we want kernfs ->rmdir() to utilize the VFS negative dentry facility. Clearly, based in the kernfs ->rmdir() op function definition kernfs doesn't want file systems that use it to be able to decide that. Unless there is a case for allowing dentries to be re-used I think they should be dropped in kernfs_iop_rmdir().
The problem can be reproduced by the following command: cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi
And immediately recreating the directory is probably not a common use case so it doesn't provide a case for kernfs ->rmdir() not dropping the dentry to prevent the dentry making it to the LRU negative dentry list. The more likely case is an unnecessary accumulation of negative dentries from file system activity with occasional negative dentry re-use which might be undesirable for long running systems.
A simple fixes seems to be initializing d->d_time for positive dentry in kernfs_iop_lookup() as well. The downside is the negative dentry will be revalidated again after it becomes negative in d_delete(), because the revison of its parent must have been increased due to its removal.
I did that for a long time is the series before deciding to do it only for negatives ... The usual case of directory removal is the dentry is dropped by the ->rmdir() function either directly or via d_invalidate() to prevent possible incorrect initialization on re-create. But this is probably unlikely for kernfs attributes of the same name so I'm not certain dropping the dentry is in fact what really should be done here. I admit I did miss this case.
Alternative solution is implement .d_iput for kernfs, and assign d_time for the newly-generated negative dentry in it. But we may need to take kernfs_rwsem to protect again the concurrent kernfs_link_sibling() on the parent directory, it is a little over-killing. Now the simple fix is chosen.
Please don't even consider this, it's misleading because here we are concerned with the dentry not the inode so I don't think adding ->d_iput() is the right thing to do.
quoted hunk ↗ jump to hunk
Link: https://marc.info/?l=linux-fsdevel&m=163249838610499 Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Hou Tao <redacted> --- fs/kernfs/dir.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 0c7f1558f489..f7618ba5b3b2 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c@@ -1140,8 +1140,13 @@ static struct dentry *kernfs_iop_lookup(structinode *dir, if (!inode) inode = ERR_PTR(-ENOMEM); } - /* Needed only for negative dentry validation */ - if (!inode) + /* + * Needed for negative dentry validation. + * The negative dentry can be created in kernfs_iop_lookup() + * or transforms from positive dentry in dentry_unlink_inode() + * called from vfs_rmdir(). + */ + if (!IS_ERR(inode)) kernfs_set_rev(parent, dentry); up_read(&kernfs_rwsem);
This is not a bad idea to do and is what I did for quite a while too but maybe it would be better to do this:
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index a957c944cf3a..3fd9d8fa4b92 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c@@ -1165,6 +1165,8 @@ static int kernfs_iop_rmdir(struct inode *dir, struct dentry *dentry) return -ENODEV; ret = scops->rmdir(kn); + if (!ret) + d_invalidate(dentry); kernfs_put_active(kn); return ret;