Re: [PATCH v3 5/5] fanotify: add pidfd support to the fanotify API
From: Jann Horn <jannh@google.com>
Date: 2021-07-27 00:24:08
Also in:
linux-fsdevel
On Wed, Jul 21, 2021 at 8:21 AM Matthew Bobrowski [off-list ref] wrote:
Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which allows userspace applications to control whether a pidfd info record containing a pidfd is to be returned with each event. If FAN_REPORT_PIDFD is enabled for a notification group, an additional struct fanotify_event_info_pidfd object will be supplied alongside the generic struct fanotify_event_metadata within a single event. This functionality is analogous to that of FAN_REPORT_FID in terms of how the event structure is supplied to the userspace application. Usage of FAN_REPORT_PIDFD with FAN_REPORT_FID/FAN_REPORT_DFID_NAME is permitted, and in this case a struct fanotify_event_info_pidfd object will follow any struct fanotify_event_info_fid object. Currently, the usage of FAN_REPORT_TID is not permitted along with FAN_REPORT_PIDFD as the pidfd API only supports the creation of pidfds for thread-group leaders. Additionally, the FAN_REPORT_PIDFD is limited to privileged processes only i.e. listeners that are running with the CAP_SYS_ADMIN capability. Attempting to supply either of these initialization flags with FAN_REPORT_PIDFD will result with EINVAL being returned to the caller. In the event of a pidfd creation error, there are two types of error values that can be reported back to the listener. There is FAN_NOPIDFD, which will be reported in cases where the process responsible for generating the event has terminated prior to fanotify being able to create pidfd for event->pid via pidfd_create(). The there is FAN_EPIDFD, which will be reported if a more generic pidfd creation error occurred when calling pidfd_create().
[...]
quoted hunk ↗ jump to hunk
@@ -524,6 +562,34 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, } metadata.fd = fd; + if (pidfd_mode) { + /* + * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual + * exclusion is ever lifted. At the time of incoporating pidfd + * support within fanotify, the pidfd API only supported the + * creation of pidfds for thread-group leaders. + */ + WARN_ON_ONCE(FAN_GROUP_FLAG(group, FAN_REPORT_TID)); + + /* + * The PIDTYPE_TGID check for an event->pid is performed + * preemptively in attempt to catch those rare instances where + * the process responsible for generating the event has + * terminated prior to calling into pidfd_create() and acquiring + * a valid pidfd. Report FAN_NOPIDFD to the listener in those + * cases. All other pidfd creation errors are represented as + * FAN_EPIDFD. + */ + if (metadata.pid == 0 || + !pid_has_task(event->pid, PIDTYPE_TGID)) { + pidfd = FAN_NOPIDFD; + } else { + pidfd = pidfd_create(event->pid, 0); + if (pidfd < 0) + pidfd = FAN_EPIDFD; + } + } +
As a general rule, f_op->read callbacks aren't allowed to mess with the file descriptor table of the calling process. A process should be able to receive a file descriptor from an untrusted source and call functions like read() on it without worrying about affecting its own file descriptor table state with that. I realize that existing fanotify code appears to be violating that rule already, and that you're limiting creation of fanotify file descriptors that can hit this codepath to CAP_SYS_ADMIN, but still, I think fanotify_read() probably ought to be an ioctl, or something along those lines, instead of an f_op->read handler if it messes with the caller's fd table?