Thread (97 messages) 97 messages, 14 authors, 2022-11-03

Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd

From: Fuad Tabba <hidden>
Date: 2022-10-03 08:35:07
Also in: kvm, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

Hi

On Fri, Sep 30, 2022 at 5:23 PM Kirill A . Shutemov
[off-list ref] wrote:
On Fri, Sep 30, 2022 at 05:14:00PM +0100, Fuad Tabba wrote:
quoted
Hi,

<...>
quoted
diff --git a/mm/memfd_inaccessible.c b/mm/memfd_inaccessible.c
new file mode 100644
index 000000000000..2d33cbdd9282
--- /dev/null
+++ b/mm/memfd_inaccessible.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "linux/sbitmap.h"
+#include <linux/memfd.h>
+#include <linux/pagemap.h>
+#include <linux/pseudo_fs.h>
+#include <linux/shmem_fs.h>
+#include <uapi/linux/falloc.h>
+#include <uapi/linux/magic.h>
+
+struct inaccessible_data {
+       struct mutex lock;
+       struct file *memfd;
+       struct list_head notifiers;
+};
+
+static void inaccessible_notifier_invalidate(struct inaccessible_data *data,
+                                pgoff_t start, pgoff_t end)
+{
+       struct inaccessible_notifier *notifier;
+
+       mutex_lock(&data->lock);
+       list_for_each_entry(notifier, &data->notifiers, list) {
+               notifier->ops->invalidate(notifier, start, end);
+       }
+       mutex_unlock(&data->lock);
+}
+
+static int inaccessible_release(struct inode *inode, struct file *file)
+{
+       struct inaccessible_data *data = inode->i_mapping->private_data;
+
+       fput(data->memfd);
+       kfree(data);
+       return 0;
+}
+
+static long inaccessible_fallocate(struct file *file, int mode,
+                                  loff_t offset, loff_t len)
+{
+       struct inaccessible_data *data = file->f_mapping->private_data;
+       struct file *memfd = data->memfd;
+       int ret;
+
+       if (mode & FALLOC_FL_PUNCH_HOLE) {
+               if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
+                       return -EINVAL;
+       }
+
+       ret = memfd->f_op->fallocate(memfd, mode, offset, len);
I think that shmem_file_operations.fallocate is only set if
CONFIG_TMPFS is enabled (shmem.c). Should there be a check at
initialization that fallocate is set, or maybe a config dependency, or
can we count on it always being enabled?
It is already there:

        config MEMFD_CREATE
                def_bool TMPFS || HUGETLBFS

And we reject inaccessible memfd_create() for HUGETLBFS.

But if we go with a separate syscall, yes, we need the dependency.
I missed that, thanks.
quoted
quoted
+       inaccessible_notifier_invalidate(data, offset, offset + len);
+       return ret;
+}
+
<...>
quoted
+void inaccessible_register_notifier(struct file *file,
+                                   struct inaccessible_notifier *notifier)
+{
+       struct inaccessible_data *data = file->f_mapping->private_data;
+
+       mutex_lock(&data->lock);
+       list_add(&notifier->list, &data->notifiers);
+       mutex_unlock(&data->lock);
+}
+EXPORT_SYMBOL_GPL(inaccessible_register_notifier);
If the memfd wasn't marked as inaccessible, or more generally
speaking, if the file isn't a memfd_inaccessible file, this ends up
accessing an uninitialized pointer for the notifier list. Should there
be a check for that here, and have this function return an error if
that's not the case?
I think it is "don't do that" category. inaccessible_register_notifier()
caller has to know what file it operates on, no?
The thing is, you could oops the kernel from userspace. For that, all
you have to do is a memfd_create without the MFD_INACCESSIBLE,
followed by a KVM_SET_USER_MEMORY_REGION using that as the private_fd.
I ran into this using my port of this patch series to arm64.

Cheers,
/fuad

--
  Kiryl Shutsemau / Kirill A. Shutemov
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help