Thread (3 messages) 3 messages, 3 authors, 2021-08-27

Re: [PATCH v6 09/21] fsnotify: Allow events reported with an empty inode

From: Amir Goldstein <amir73il@gmail.com>
Date: 2021-08-26 10:45:04
Also in: linux-api, linux-fsdevel

Possibly related (same subject, not in this thread)

On Thu, Aug 26, 2021 at 12:50 AM Gabriel Krisman Bertazi
[off-list ref] wrote:
Amir Goldstein [off-list ref] writes:
quoted
On Wed, Aug 25, 2021 at 9:40 PM Gabriel Krisman Bertazi
[off-list ref] wrote:
quoted
Amir Goldstein [off-list ref] writes:
quoted
On Fri, Aug 13, 2021 at 12:41 AM Gabriel Krisman Bertazi
[off-list ref] wrote:
quoted
Some file system events (i.e. FS_ERROR) might not be associated with an
inode.  For these, it makes sense to associate them directly with the
super block of the file system they apply to.  This patch allows the
event to be reported with a NULL inode, by recovering the superblock
directly from the data field, if needed.

Signed-off-by: Gabriel Krisman Bertazi <redacted>

--
Changes since v5:
  - add fsnotify_data_sb handle to retrieve sb from the data field. (jan)
---
 fs/notify/fsnotify.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 30d422b8c0fc..536db02cb26e 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -98,6 +98,14 @@ void fsnotify_sb_delete(struct super_block *sb)
        fsnotify_clear_marks_by_sb(sb);
 }

+static struct super_block *fsnotify_data_sb(const void *data, int data_type)
+{
+       struct inode *inode = fsnotify_data_inode(data, data_type);
+       struct super_block *sb = inode ? inode->i_sb : NULL;
+
+       return sb;
+}
+
 /*
  * Given an inode, first check if we care what happens to our children.  Inotify
  * and dnotify both tell their parents about events.  If we care about any event
@@ -455,8 +463,10 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
  *             @file_name is relative to
  * @file_name: optional file name associated with event
  * @inode:     optional inode associated with event -
- *             either @dir or @inode must be non-NULL.
- *             if both are non-NULL event may be reported to both.
+ *             If @dir and @inode are NULL, @data must have a type that
+ *             allows retrieving the file system associated with this
Irrelevant comment. sb must always be available from @data.
quoted
+ *             event.  if both are non-NULL event may be reported to
+ *             both.
  * @cookie:    inotify rename cookie
  */
 int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
@@ -483,7 +493,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
                 */
                parent = dir;
        }
-       sb = inode->i_sb;
+       sb = inode ? inode->i_sb : fsnotify_data_sb(data, data_type);
        const struct path *path = fsnotify_data_path(data, data_type);
+       const struct super_block *sb = fsnotify_data_sb(data, data_type);

All the games with @data @inode and @dir args are irrelevant to this.
sb should always be available from @data and it does not matter
if fsnotify_data_inode() is the same as @inode, @dir or neither.
All those inodes are anyway on the same sb.
Hi Amir,

I think this is actually necessary.  I could identify at least one event
(FS_CREATE | FS_ISDIR) where fsnotify is invoked with a NULL data field.
In that case, fsnotify_dirent is called with a negative dentry from
vfs_mkdir().  I'm not sure why exactly the dentry is negative after the
That doesn't sound right at all.
Are you sure about this?
Which filesystem was this mkdir called on?
You should be able to reproduce it on top of mainline if you pick only this
patch and do the change you suggested:

 -       sb = inode->i_sb;
 +       sb = fsnotify_data_sb(data, data_type);

And then boot a Debian stable with systemd.  The notification happens on
the cgroup pseudo-filesystem (/sys/fs/cgroup), which is being monitored
by systemd itself.  The event that arrives with a NULL data is telling the
directory /sys/fs/cgroup/*/ about the creation of directory
`init.scope`.

The change above triggers the following null dereference of struct
super_block, since data is NULL.

I will keep looking but you might be able to answer it immediately...
Yes, I see what is going on.

cgroupfs is a sort of kernfs and kernfs_iop_mkdir() does not instantiate
the negative dentry. Instead, kernfs_dop_revalidate() always invalidates
negative dentries to force re-lookup to find the inode.

Documentation/filesystems/vfs.rst says on create() and friends:
"...you will probably call d_instantiate() with the dentry and the
  newly created inode..."

So this behavior seems legit.
Meaning that we have made a wrong assumption in fsnotify_create()
and fsnotify_mkdir().
Please note the comment above fsnotify_link() which anticipates
negative dentries.

I've audited the fsnotify backends and it seems that the
WARN_ON(!inode) in kernel/audit_* is the only immediate implication
of negative dentry with FS_CREATE.
I am the one who added these WARN_ON(), so I will remove them.
I think that missing inode in an FS_CREATE event really breaks
audit on kernfs, but not sure if that is a valid use case (Paul?).

Anyway, regarding your patch, I still prefer the solution proposed by Jan,
but not with a different implementation of fsnotify_data_sb().

Please see branch fsnotify_data_sb[1] with the proposed fixes.
The fixes assert the statement that "sb should always be available
from @data", regardless of kernfs anomaly.

If this works for you, please prepend those patches to your next
submission.

Regarding the state of this patch set in general, I must admit that
I wasn't able to follow if a conclusion was reached about the lifetime
management of fsnotify_error_event and associated sb mark.
Jan is going out on vacation and I think there is little point in spinning
another patch set revision before this issue is settled with Jan.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fsnotify_data_sb
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help