Thread (25 messages) 25 messages, 7 authors, 2024-07-25

Re: [RFC PATCH] lsm: add the inode_free_security_rcu() LSM implementation hook

From: Matus Jokay <hidden>
Date: 2024-07-23 09:27:29
Also in: linux-fsdevel, selinux

On 22. 7. 2024 21:46, Paul Moore wrote:
On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay [off-list ref] wrote:
quoted
On 10. 7. 2024 12:40, Mickaël Salaün wrote:
quoted
On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
quoted
The LSM framework has an existing inode_free_security() hook which
is used by LSMs that manage state associated with an inode, but
due to the use of RCU to protect the inode, special care must be
taken to ensure that the LSMs do not fully release the inode state
until it is safe from a RCU perspective.

This patch implements a new inode_free_security_rcu() implementation
hook which is called when it is safe to free the LSM's internal inode
state.  Unfortunately, this new hook does not have access to the inode
itself as it may already be released, so the existing
inode_free_security() hook is retained for those LSMs which require
access to the inode.

Signed-off-by: Paul Moore <paul@paul-moore.com>
I like this new hook.  It is definitely safer than the current approach.

To make it more consistent, I think we should also rename
security_inode_free() to security_inode_put() to highlight the fact that
LSM implementations should not free potential pointers in this blob
because they could still be dereferenced in a path walk.
quoted
---
 include/linux/lsm_hook_defs.h     |  1 +
 security/integrity/ima/ima.h      |  2 +-
 security/integrity/ima/ima_iint.c | 20 ++++++++------------
 security/integrity/ima/ima_main.c |  2 +-
 security/landlock/fs.c            |  9 ++++++---
 security/security.c               | 26 +++++++++++++-------------
 6 files changed, 30 insertions(+), 30 deletions(-)
...
quoted
Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:

1) How does this patch close [1]?
   As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
   Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
   i.e. referencing the inode in a VFS path walk while destroying it...
   Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
The VFS folks can likely provide a better, or perhaps a more correct
answer, but my understanding is that during the path walk the inode is
protected by a RCU lock which allows for multiple threads to access
the inode simultaneously; this could result in some cases where one
thread is destroying the inode while another is accessing it.
Changing this would require changes to the VFS code, and I'm not sure
why you would want to change it anyway, the performance win of using
RCU here is likely significant.
quoted
2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
I'm not an RCU expert, but I don't believe there are any guarantees
that the inode_free_by_rcu() and the inode's own free routines are
going to be called within the same RCU grace period (not really
applicable as inode_free_by_rcu() isn't called *during* a grace
period, but *after* the grace period of the associated
security_inode_free() call).  However, this patch does not rely on
synchronization between the inode and inode LSM free routine in
inode_free_by_rcu(); the inode_free_by_rcu() function and the new
inode_free_security_rcu() LSM callback does not have a pointer to the
inode, only the inode's LSM blob.  I agree that it is a bit odd, but
freeing the inode and inode's LSM blob independently of each other
should not cause a problem so long as the inode is no longer in use
(hence the RCU callbacks).
Paul, many thanks for your answer.

I will try to clarify the issue, because fsnotify was a bad example.
Here is the related code taken from v10.

void security_inode_free(struct inode *inode)
{
	call_void_hook(inode_free_security, inode);
	/*
	 * The inode may still be referenced in a path walk and
	 * a call to security_inode_permission() can be made
	 * after inode_free_security() is called. Ideally, the VFS
	 * wouldn't do this, but fixing that is a much harder
	 * job. For now, simply free the i_security via RCU, and
	 * leave the current inode->i_security pointer intact.
	 * The inode will be freed after the RCU grace period too.
	 */
	if (inode->i_security)
		call_rcu((struct rcu_head *)inode->i_security,
			 inode_free_by_rcu);
}

void __destroy_inode(struct inode *inode)
{
	BUG_ON(inode_has_buffers(inode));
	inode_detach_wb(inode);
	security_inode_free(inode);
	fsnotify_inode_delete(inode);
	locks_free_lock_context(inode);
	if (!inode->i_nlink) {
		WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
		atomic_long_dec(&inode->i_sb->s_remove_count);
	}

#ifdef CONFIG_FS_POSIX_ACL
	if (inode->i_acl && !is_uncached_acl(inode->i_acl))
		posix_acl_release(inode->i_acl);
	if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
		posix_acl_release(inode->i_default_acl);
#endif
	this_cpu_dec(nr_inodes);
}

static void destroy_inode(struct inode *inode)
{
	const struct super_operations *ops = inode->i_sb->s_op;

	BUG_ON(!list_empty(&inode->i_lru));
	__destroy_inode(inode);
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;
	}
	inode->free_inode = ops->free_inode;
	call_rcu(&inode->i_rcu, i_callback);
}

Yes, inode_free_by_rcu() is being called after the grace period of the associated
security_inode_free(). i_callback() is also called after the grace period, but is it
always the same grace period as in the case of inode_free_by_rcu()? If not in general,
maybe it could be a problem. Explanation below.

If there is a function call leading to the end of the grace period between
call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
or another mechanism?), there will be a small time window, when the inode security
context is released, but the inode itself not, because call_rcu(i_callback) was not called
yet. So in that case each access to inode security blob leads to UAF.

For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
*before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the
grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous
access to the inode during path lookup may be performed. Note: I use destroy_inode() as
*an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general
and RCU, too.

In the previous message I only mentioned fsnotify, but it was only as an example.
I think that destroy_inode() is a better example of the idea I wanted to express.

I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
unpleasant to get another UAF in a production environment.

And regarding the UAF in [1], it seems very strange to me. The object managed by
Landlock was *not* dereferenced. There was access to the inode security blob itself.

static void hook_inode_free_security(struct inode *const inode)
{
	/*
	 * All inodes must already have been untied from their object by
	 * release_inode() or hook_sb_delete().
	 */
	WARN_ON_ONCE(landlock_inode(inode)->object);
}

But security blob is released at the end of the grace period related to
security_inode_free(): call_rcu(inode_free_by_rcu) is *after* invoking all registered
inode_free_security hooks.

The only place of releasing inode security blob I see in inode_free_by_rcu(). Thus,
I think, there was another call of __destroy_inode(). Or general protection fault was
not caused by UAF. Any ideas? Can someone explain it? I don't understand what and *how*
happened.

If Landlock had dereferenced the object it manages, this RFC could be the right one (if
it were a dereference from a fast path lookup, of course).

[1] https://lore.kernel.org/all/00000000000076ba3b0617f65cc8@google.com/ (local)

quoted
   If not, can the security context be released earlier than the inode itself?
Possibly, but it should only happen after the inode is no longer in
use (the call_rcu () callback should ensure that we are past all of
the currently executing RCU critical sections).
quoted
If yes, can be executed
   inode_permission concurrently, leading to UAF of inode security context in security_inode_permission?
I do not believe so, see the discussion above, but I welcome any corrections.
quoted
   Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.)
If fsnotify is affecting this negatively then I suspect that is a
reason for much larger concern as I believe that would indicate a
problem with fsnotify and the inode locking scheme.
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help