Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory
From: Chao Peng <hidden>
Date: 2022-12-02 06:53:47
Also in:
kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel
On Thu, Dec 01, 2022 at 06:16:46PM -0800, Vishal Annapurve wrote:
On Tue, Oct 25, 2022 at 8:18 AM Chao Peng [off-list ref] wrote:
...
+}
+
+SYSCALL_DEFINE1(memfd_restricted, unsigned int, flags)
+{

Looking at the underlying shmem implementation, there seems to be no way to enable transparent huge pages specifically for restricted memfd files. Michael discussed earlier tweaking the /sys/kernel/mm/transparent_hugepage/shmem_enabled setting to allow huge pages to back restricted memfds. Such a change would affect the rest of the shmem use cases as well. Even setting the shmem_enabled policy to "advise" wouldn't help unless file-based advice for huge page allocation is implemented.
I had a look at fadvise(), and it looks like it does not support a HUGEPAGE hint for any filesystem yet.
Does it make sense to provide a flag here that allows creating restricted memfds possibly backed by huge pages, to give more granular control?
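
For reference, the system-wide policy mentioned above can be inspected from userspace. The following is only a minimal sketch: it assumes the sysfs file uses the usual bracketed-active-value layout (for example "always within_size advise [never] deny force"), and the helper names active_policy() and read_shmem_enabled() are made up for illustration. It only reads the policy; as noted above, changing it would affect every shmem user, not just restricted memfds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (currently active) token of a sysfs policy line
 * into out. Returns 0 on success, -1 if there is no well-formed
 * "[token]" or it does not fit in outlen bytes.
 * Hypothetical helper, not part of the patch.
 */
int active_policy(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line && out);

	open = strchr(line, '[');
	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Read the live shmem THP policy. Returns -1 if the file is missing
 * (e.g. THP not built in) or cannot be parsed.
 */
int read_shmem_enabled(char *out, size_t outlen)
{
	char line[256];
	int ret = -1;
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/shmem_enabled", "r");

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = active_policy(line, out, outlen);
	fclose(f);
	return ret;
}
```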
We do have an unused 'flags' argument that can be extended for such usage, but I would let Kirill have a further look; this perhaps needs more discussion.

Chao
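
To make the idea concrete, here is a rough userspace model of how the 'flags' check could be extended. This is purely an illustration under assumptions: the flag name RMFD_HUGEPAGE_EXAMPLE, its value, and the helper functions are hypothetical and do not exist in the patch, which currently rejects any non-zero flags with -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag: request huge-page backing for this memfd only. */
#define RMFD_HUGEPAGE_EXAMPLE		0x1U
/* Mask of all flags this sketch accepts. */
#define RMFD_VALID_FLAGS_EXAMPLE	(RMFD_HUGEPAGE_EXAMPLE)

/*
 * Model of the syscall's flag validation: unknown bits are still
 * rejected with -EINVAL so they stay available for future use.
 */
int restricted_flags_check(unsigned int flags)
{
	if (flags & ~RMFD_VALID_FLAGS_EXAMPLE)
		return -EINVAL;
	return 0;
}

/* Returns 1 if the caller asked for huge-page backing, 0 otherwise. */
int restricted_wants_hugepage(unsigned int flags)
{
	return !!(flags & RMFD_HUGEPAGE_EXAMPLE);
}

/* Small self-check of the mask logic. */
void restricted_flags_selftest(void)
{
	assert(restricted_flags_check(0) == 0);
	assert(restricted_flags_check(RMFD_HUGEPAGE_EXAMPLE) == 0);
	assert(restricted_flags_check(0x2U) == -EINVAL);
}
```

With a scheme like this, existing callers passing 0 keep today's behaviour, while the huge page choice becomes per-file rather than a global shmem_enabled setting. How the flag would actually reach the shmem allocation path is left open here.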
+	struct file *file, *restricted_file;
+	int fd, err;
+
+	if (flags)
+		return -EINVAL;
+
+	fd = get_unused_fd_flags(0);
+	if (fd < 0)
+		return fd;
+
+	file = shmem_file_setup("memfd:restrictedmem", 0, VM_NORESERVE);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_fd;
+	}
+	file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
+	file->f_flags |= O_LARGEFILE;
+
+	restricted_file = restrictedmem_file_create(file);
+	if (IS_ERR(restricted_file)) {
+		err = PTR_ERR(restricted_file);
+		fput(file);
+		goto err_fd;
+	}
+
+	fd_install(fd, restricted_file);
+	return fd;
+err_fd:
+	put_unused_fd(fd);
+	return err;
+}
+
+void restrictedmem_register_notifier(struct file *file,
+				     struct restrictedmem_notifier *notifier)
+{
+	struct restrictedmem_data *data = file->f_mapping->private_data;
+
+	mutex_lock(&data->lock);
+	list_add(&notifier->list, &data->notifiers);
+	mutex_unlock(&data->lock);
+}
+EXPORT_SYMBOL_GPL(restrictedmem_register_notifier);
+
+void restrictedmem_unregister_notifier(struct file *file,
+				       struct restrictedmem_notifier *notifier)
+{
+	struct restrictedmem_data *data = file->f_mapping->private_data;
+
+	mutex_lock(&data->lock);
+	list_del(&notifier->list);
+	mutex_unlock(&data->lock);
+}
+EXPORT_SYMBOL_GPL(restrictedmem_unregister_notifier);
+
+int restrictedmem_get_page(struct file *file, pgoff_t offset,
+			   struct page **pagep, int *order)
+{
+	struct restrictedmem_data *data = file->f_mapping->private_data;
+	struct file *memfd = data->memfd;
+	struct page *page;
+	int ret;
+
+	ret = shmem_getpage(file_inode(memfd), offset, &page, SGP_WRITE);
+	if (ret)
+		return ret;
+
+	*pagep = page;
+	if (order)
+		*order = thp_order(compound_head(page));
+
+	SetPageUptodate(page);
+	unlock_page(page);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(restrictedmem_get_page);
-- 
2.25.1