Re: [GIT PULL] selinux/selinux-pr-20250323
From: Paul Moore <paul@paul-moore.com>
Date: 2025-03-28 15:06:52
Also in:
lkml, selinux
On Fri, Mar 28, 2025 at 9:23 AM Stephen Smalley [off-list ref] wrote:
On Thu, Mar 27, 2025 at 3:40 PM Linus Torvalds [off-list ref] wrote:quoted
On Thu, 27 Mar 2025 at 12:16, Stephen Smalley [off-list ref] wrote:quoted
Where could/would we cache that information so that it was accessible directly by the VFS layer?So the VFS layer already does this for various other things. For this case, the natural thing to do would be to add another IOP_xyzzy flag in inode->i_opflags. That's how we already say things like "this inode has no filesystem-specific i_op->permission function" (IOP_FASTPERM), so that we don't even have to follow the "inode->i_op->permission" pointer chain to see a NULL pointer. Yes, the VFS layer is *heavily* optimized like that. It literally does that IOP_FASTPERM to avoid chasing two pointers - not even the call, just the "don't even bother to follow pointers to see if it's NULL". See do_inode_permission(). And we have 16 bits in that inode->i_opflags, and currently only use 7 of those bits. Adding one bit for a IOP_NO_SECURITY_LOOKUP kind of logic (feel free to rename that - just throwing a random name out as a suggestion) would be a complete no-brainer. NOTE! The rule for the i_opflags accesses is that *reading* them is done with no locking at all, but changing them takes the inode spinlock (and we should technically probably use WRITE_ONCE() and READ_ONCE(), but we don't). And notice that the "no locking at all for reading" means that if you *change* the bit - even though that involves locking - there may be concurrent lookups in process that won't see the change, and would go on as if the lookup still does not need any security layer call. No serialization to readers at all (although you could wait for an RCU period after changing if you really need to, and only use the bit in the RCU lookup). That should be perfectly fine - I really don't think serialization is even needed. If somebody is changing the policy rules, any file lookups *concurrent* to that change might not see the new rules, but that's the same as if it happened before the change. I just wanted to point out that the serialization is unbalanced: the spinlock for changing the flag is literally just to make sure that two bits being changed at the same time don't stomp on each other (because it's a 16-bit non-atomic field, and we didn't want to use a "unsigned long" and atomic bitops because the cache layout of the inode is also a big issue). And you can take the IOP_FASTPERM thing as an example of how to do this: it is left clear initially, and what happens is that during the permission lookup, if it *isn't* set, we'll follow those inode->i_io->permission pointers, and notice that we should set it: if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) { if (likely(inode->i_op->permission)) return inode->i_op->permission(idmap, inode, mask); /* This gets set once for the inode lifetime */ spin_lock(&inode->i_lock); inode->i_opflags |= IOP_FASTPERM; spin_unlock(&inode->i_lock); } and I think the security layer could take a very similar approach: not setting that IOP_NO_SECURITY_LOOKUP initially, but *when* a security_inode_permission() call is made with just MAY_NOT_BLOCK | MAY_LOOKUP, and the security layer notices that "this inode has no reason to care", it could set the bit so that *next* time around the VFS layer won't bother to call into security_inode_permission() unnecessarily. Does that clarify?Yes, thank you. I think it would be easy enough to make that change to selinux_inode_permission() and to clear that inode flag on file relabels (e.g. in selinux_inode_post_setxattr() and inode_invalidate_secctx()). Not as sure about handling policy reloads / boolean changes at runtime without also caching the policy sequence number in the inode too so that can be compared. Further, I'm unclear on how to implement this in a manner that works with the LSM stacking support, since some other module might disagree about whether we can skip these lookups. Normally I'd just add an extra boolean or flag argument to the underlying ->inode_permission() hook so each module can indicate its decision and security/security.c:security_inode_permission() could compose the result, but I guess that would require switching security_inode_permission() from using call_int_hook() to manually doing the lsm_for_each_hook() loop itself which may have an adverse impact on those calls in general.
I'm not sure the VFS flag approach is going to end up being practical due to various constraints, but I do have some other ideas that should allow us to drop the bulk of the SELinux AVC calls while doing path walks that I'm going to try today/soon. I'll obviously report back if it works out. -- paul-moore.com