Re: [RFC PATCH] lsm: add the inode_free_security_rcu() LSM implementation hook
From: Matus Jokay <hidden>
Date: 2024-07-23 09:27:29
Also in:
linux-fsdevel, selinux
On 22. 7. 2024 21:46, Paul Moore wrote:
On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay [off-list ref] wrote:quoted
On 10. 7. 2024 12:40, Mickaël Salaün wrote:quoted
On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:quoted
The LSM framework has an existing inode_free_security() hook which is used by LSMs that manage state associated with an inode, but due to the use of RCU to protect the inode, special care must be taken to ensure that the LSMs do not fully release the inode state until it is safe from a RCU perspective. This patch implements a new inode_free_security_rcu() implementation hook which is called when it is safe to free the LSM's internal inode state. Unfortunately, this new hook does not have access to the inode itself as it may already be released, so the existing inode_free_security() hook is retained for those LSMs which require access to the inode. Signed-off-by: Paul Moore <paul@paul-moore.com>I like this new hook. It is definitely safer than the current approach. To make it more consistent, I think we should also rename security_inode_free() to security_inode_put() to highlight the fact that LSM implementations should not free potential pointers in this blob because they could still be dereferenced in a path walk.quoted
--- include/linux/lsm_hook_defs.h | 1 + security/integrity/ima/ima.h | 2 +- security/integrity/ima/ima_iint.c | 20 ++++++++------------ security/integrity/ima/ima_main.c | 2 +- security/landlock/fs.c | 9 ++++++--- security/security.c | 26 +++++++++++++------------- 6 files changed, 30 insertions(+), 30 deletions(-)...quoted
Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: 1) How does this patch close [1]? As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, i.e. referencing the inode in a VFS path walk while destroying it... Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.The VFS folks can likely provide a better, or perhaps a more correct answer, but my understanding is that during the path walk the inode is protected by a RCU lock which allows for multiple threads to access the inode simultaneously; this could result in some cases where one thread is destroying the inode while another is accessing it. Changing this would require changes to the VFS code, and I'm not sure why you would want to change it anyway, the performance win of using RCU here is likely significant.quoted
2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?I'm not an RCU expert, but I don't believe there are any guarantees that the inode_free_by_rcu() and the inode's own free routines are going to be called within the same RCU grace period (not really applicable as inode_free_by_rcu() isn't called *during* a grace period, but *after* the grace period of the associated security_inode_free() call). However, this patch does not rely on synchronization between the inode and inode LSM free routine in inode_free_by_rcu(); the inode_free_by_rcu() function and the new inode_free_security_rcu() LSM callback does not have a pointer to the inode, only the inode's LSM blob. I agree that it is a bit odd, but freeing the inode and inode's LSM blob independently of each other should not cause a problem so long as the inode is no longer in use (hence the RCU callbacks).
Paul, many thanks for your answer.
I will try to clarify the issue, because fsnotify was a bad example.
Here is the related code taken from v10.
void security_inode_free(struct inode *inode)
{
call_void_hook(inode_free_security, inode);
/*
* The inode may still be referenced in a path walk and
* a call to security_inode_permission() can be made
* after inode_free_security() is called. Ideally, the VFS
* wouldn't do this, but fixing that is a much harder
* job. For now, simply free the i_security via RCU, and
* leave the current inode->i_security pointer intact.
* The inode will be freed after the RCU grace period too.
*/
if (inode->i_security)
call_rcu((struct rcu_head *)inode->i_security,
inode_free_by_rcu);
}
void __destroy_inode(struct inode *inode)
{
BUG_ON(inode_has_buffers(inode));
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);
locks_free_lock_context(inode);
if (!inode->i_nlink) {
WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
atomic_long_dec(&inode->i_sb->s_remove_count);
}
#ifdef CONFIG_FS_POSIX_ACL
if (inode->i_acl && !is_uncached_acl(inode->i_acl))
posix_acl_release(inode->i_acl);
if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
posix_acl_release(inode->i_default_acl);
#endif
this_cpu_dec(nr_inodes);
}
static void destroy_inode(struct inode *inode)
{
const struct super_operations *ops = inode->i_sb->s_op;
BUG_ON(!list_empty(&inode->i_lru));
__destroy_inode(inode);
if (ops->destroy_inode) {
ops->destroy_inode(inode);
if (!ops->free_inode)
return;
}
inode->free_inode = ops->free_inode;
call_rcu(&inode->i_rcu, i_callback);
}
Yes, inode_free_by_rcu() is being called after the grace period of the associated
security_inode_free(). i_callback() is also called after the grace period, but is it
always the same grace period as in the case of inode_free_by_rcu()? If not in general,
maybe it could be a problem. Explanation below.
If there is a function call leading to the end of the grace period between
call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
or another mechanism?), there will be a small time window, when the inode security
context is released, but the inode itself not, because call_rcu(i_callback) was not called
yet. So in that case each access to inode security blob leads to UAF.
For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
*before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the
grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous
access to the inode during path lookup may be performed. Note: I use destroy_inode() as
*an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general
and RCU, too.
In the previous message I only mentioned fsnotify, but it was only as an example.
I think that destroy_inode() is a better example of the idea I wanted to express.
I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
unpleasant to get another UAF in a production environment.
And regarding the UAF in [1], it seems very strange to me. The object managed by
Landlock was *not* dereferenced. There was access to the inode security blob itself.
static void hook_inode_free_security(struct inode *const inode)
{
/*
* All inodes must already have been untied from their object by
* release_inode() or hook_sb_delete().
*/
WARN_ON_ONCE(landlock_inode(inode)->object);
}
But security blob is released at the end of the grace period related to
security_inode_free(): call_rcu(inode_free_by_rcu) is *after* invoking all registered
inode_free_security hooks.
The only place of releasing inode security blob I see in inode_free_by_rcu(). Thus,
I think, there was another call of __destroy_inode(). Or general protection fault was
not caused by UAF. Any ideas? Can someone explain it? I don't understand what and *how*
happened.
If Landlock had dereferenced the object it manages, this RFC could be the right one (if
it were a dereference from a fast path lookup, of course).
[1] https://lore.kernel.org/all/00000000000076ba3b0617f65cc8@google.com/ (local)
quoted
If not, can the security context be released earlier than the inode itself?Possibly, but it should only happen after the inode is no longer in use (the call_rcu () callback should ensure that we are past all of the currently executing RCU critical sections).quoted
If yes, can be executed inode_permission concurrently, leading to UAF of inode security context in security_inode_permission?I do not believe so, see the discussion above, but I welcome any corrections.quoted
Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.)If fsnotify is affecting this negatively then I suspect that is a reason for much larger concern as I believe that would indicate a problem with fsnotify and the inode locking scheme.