Thread (30 messages) 30 messages, 7 authors, 2024-11-27

Re: [PATCH 1/3] ima: Remove inode lock

From: Roberto Sassu <hidden>
Date: 2024-10-18 09:24:38
Also in: bpf, linux-fsdevel, linux-integrity, linux-mm, lkml

On Wed, 2024-10-16 at 11:59 +0200, Roberto Sassu wrote:
On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote:
quoted
On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote:
quoted
On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
quoted
On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
quoted
On Wed, Oct 9, 2024 at 11:36 AM Paul Moore [off-list ref] wrote:
quoted
On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
[off-list ref] wrote:
quoted
From: Roberto Sassu <roberto.sassu@huawei.com>

Move out the mutex in the ima_iint_cache structure to a new structure
called ima_iint_cache_lock, so that a lock can be taken regardless of
whether or not inode integrity metadata are stored in the inode.

Introduce ima_inode_security() to simplify accessing the new structure in
the inode security blob.

Move the mutex initialization and annotation in the new function
ima_inode_alloc_security() and introduce ima_iint_lock() and
ima_iint_unlock() to respectively lock and unlock the mutex.

Finally, expand the critical region in process_measurement() guarded by
iint->mutex up to where the inode was locked, use only one iint lock in
__ima_inode_hash(), since the mutex is now in the inode security blob, and
replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 security/integrity/ima/ima.h      | 26 ++++++++---
 security/integrity/ima/ima_api.c  |  4 +-
 security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
 security/integrity/ima/ima_main.c | 39 +++++++---------
 4 files changed, 104 insertions(+), 42 deletions(-)
I'm not an IMA expert, but it looks reasonable to me, although
shouldn't this carry a stable CC in the patch metadata?

Reviewed-by: Paul Moore <paul@paul-moore.com>
Sorry, one more thing ... did you verify this patchset resolves the
syzbot problem?  I saw at least one reproducer.
Uhm, could not reproduce the deadlock with the reproducer. However,
without the patch I have a lockdep warning, and with I don't.

I asked syzbot to try the patches. Let's see.
I actually got a different lockdep warning:

[  904.603365] ======================================================
[  904.604264] WARNING: possible circular locking dependency detected
[  904.605141] 6.12.0-rc2+ #20 Not tainted
[  904.605697] ------------------------------------------------------
I can reproduce by executing the syzbot reproducer and in another
terminal by logging in with SSH (not all the times).

If I understood what the lockdep warning means, this is the scenario.

A task accesses a seq_file which is in the IMA policy, so we enter the
critical region guarded by the iint lock. But before we get the chance
to measure the file, a second task calls remap_file_pages() on the same
seq_file.

remap_file_pages() takes the mmap lock and, if the event matches the
IMA policy too, the second task waits for the iint lock to be released.

Now, the first task starts to measure the seq_file and takes the
seq_file lock. I don't know if 3 processes must be involved, but I was
thinking that reading the seq_file from the first task can trigger a
page fault, which requires to take the mmap lock.

At this point, we reach a deadlock. The first task waits for the mmap
lock to be released, and the second waits for the iint lock to be
released, which both cannot happen.
+ mm, mm folks

(I leave the lockdep warning down for the new people included in the
thread).

I think I understood what the problem is. Any lock that is taken inside
security_file_mmap() would cause lock inversion since:
+ Kirill
 
Ok, to be clear, we are talking about a regression introduced by commit
ea7e2d5e49c05 ("mm: call the security_mmap_file() LSM hook in
remap_file_pages()").

Originally, Mimi asked this patch to be included:

https://lore.kernel.org/lkml/1336963631-3541-1-git-send-email-zohar@us.ibm.com/ (local)

This was due to the commit 196f518 ("IMA: explicit IMA i_flag to remove
global lock on inode_delete"), which added an inode lock to protect the
new flag S_IMA, used to track integrity metadata only for the set of
inodes evaluated by IMA.

The problem was that, for mmapped files, the lock is taken in the
opposite order than the convention (i_mutex and then mmap_sem), causing
a lock inversion and a lockdep warning.

Linus proposed to split security_file_mmap() into security_mmap_file()
and security_mmap_addr(), so that the former can be taken out of the
mmap_sem lock and remove the lock inversion. The final commit was made
by Al Viro, 8b3ec6814c83 ("take security_mmap_file() outside of -
mmap_sem").
Commit ea7e2d5e49c05 put again a security_mmap_file() call inside the
mmap_sem (now mmap_lock) lock, causing the recent syzbot reports.

Paul asked if the inode lock can be removed from IMA, and actually
partially can be done:

https://lore.kernel.org/linux-integrity/20241008165732.2603647-1-roberto.sassu@huaweicloud.com/ (local)

The patch is good in its own, since it merges two critical sections in
one. However, the inode lock cannot be removed completely:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/integrity/ima/ima_main.c?h=v6.12-rc3#n386

This one is required to set the security.ima xattr in
ima_appraise_measurement() -> ima_fix_xattr(), when IMA-Appraise is in
fix mode (ima_appraise=fix in the kernel command line).

I would say that my original suggestion of putting security_mmap_file()
back to the mmap_lock lock probably is not the optimal solution. Maybe
we could get around removing the remaining inode lock in IMA, but we
would also limit future LSMs to not use the inode lock if they need.

Probably it is hard, @Kirill would there be any way to safely move
security_mmap_file() out of the mmap_lock lock?

Thanks

Roberto


PS: I'm aware that Shu Han proposed a solution to the lock inversion:

https://lore.kernel.org/linux-security-module/20240928180044.50-1-ebpqwerty472123@gmail.com/ (local)

but I think anyway it boils down to either moving security_mmap_file()
to the mmap_lock lock again for all calls, or viceversa, moving
security_mmap_file() out of the remap_file_pages() system call.
vm_mmap_pgoff():

	ret = security_mmap_file(file, prot, flag);
	if (!ret) {
		if (mmap_write_lock_killable(mm))



SYSCALL_DEFINE5(remap_file_pages, ...):

	if (mmap_write_lock_killable(mm))
		return -EINTR;

[...]

	file = get_file(vma->vm_file);
	ret = security_mmap_file(vma->vm_file, prot, flags);
	if (ret)
		goto out_fput;


Since I didn't see a good way to change the order of the second, I
changed the order of the first, i.e. I put security_file_mmap() under
the mmap lock.

(I don't know if this is a good idea.)

I cannot reproduce the lockdep warning anymore.

@mm folks, what is your opinion?

Thanks

Roberto
quoted
Roberto
quoted
[  904.606577] systemd/1 is trying to acquire lock:
[  904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0
[  904.608290] 
[  904.608290] but task is already holding lock:
[  904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
[  904.610429] 
[  904.610429] which lock already depends on the new lock.
[  904.610429] 
[  904.611574] 
[  904.611574] the existing dependency chain (in reverse order) is:
[  904.612628] 
[  904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}:
[  904.613681]        __mutex_lock+0xaf/0x760
[  904.614266]        mutex_lock_nested+0x27/0x40
[  904.614897]        ima_iint_lock+0x24/0x40
[  904.615490]        process_measurement+0x176/0xef0
[  904.616168]        ima_file_mmap+0x98/0x120
[  904.616767]        security_mmap_file+0x408/0x560
[  904.617444]        __do_sys_remap_file_pages+0x2fa/0x4c0
[  904.618194]        __x64_sys_remap_file_pages+0x29/0x40
[  904.618937]        x64_sys_call+0x6e8/0x4550
[  904.619546]        do_syscall_64+0x71/0x180
[  904.620155]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.620952] 
[  904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[  904.621813]        __might_fault+0x6f/0xb0
[  904.622400]        _copy_to_iter+0x12e/0xa80
[  904.623009]        seq_read_iter+0x593/0x6b0
[  904.623629]        proc_reg_read_iter+0x31/0xe0
[  904.624276]        vfs_read+0x256/0x3d0
[  904.624822]        ksys_read+0x6d/0x160
[  904.625372]        __x64_sys_read+0x1d/0x30
[  904.625964]        x64_sys_call+0x1068/0x4550
[  904.626594]        do_syscall_64+0x71/0x180
[  904.627188]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.627975] 
[  904.627975] -> #0 (&p->lock){+.+.}-{4:4}:
[  904.628787]        __lock_acquire+0x17f3/0x2320
[  904.629432]        lock_acquire+0xf2/0x420
[  904.630013]        __mutex_lock+0xaf/0x760
[  904.630596]        mutex_lock_nested+0x27/0x40
[  904.631225]        seq_read_iter+0x62/0x6b0
[  904.631831]        kernfs_fop_read_iter+0x1ef/0x2c0
[  904.632599]        __kernel_read+0x113/0x350
[  904.633206]        integrity_kernel_read+0x23/0x40
[  904.633902]        ima_calc_file_hash_tfm+0x14e/0x230
[  904.634621]        ima_calc_file_hash+0x97/0x250
[  904.635281]        ima_collect_measurement+0x4be/0x530
[  904.636008]        process_measurement+0x7c0/0xef0
[  904.636689]        ima_file_check+0x65/0x80
[  904.637295]        security_file_post_open+0xb1/0x1b0
[  904.638008]        path_openat+0x216/0x1280
[  904.638605]        do_filp_open+0xab/0x140
[  904.639185]        do_sys_openat2+0xba/0x120
[  904.639805]        do_sys_open+0x4c/0x80
[  904.640366]        __x64_sys_openat+0x23/0x30
[  904.640992]        x64_sys_call+0x2575/0x4550
[  904.641616]        do_syscall_64+0x71/0x180
[  904.642207]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.643003] 
[  904.643003] other info that might help us debug this:
[  904.643003] 
[  904.644149] Chain exists of:
[  904.644149]   &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth]
[  904.644149] 
[  904.645763]  Possible unsafe locking scenario:
[  904.645763] 
[  904.646614]        CPU0                    CPU1
[  904.647264]        ----                    ----
[  904.647909]   lock(&ima_iint_lock_mutex_key[depth]);
[  904.648617]                                lock(&mm->mmap_lock);
[  904.649479]                                lock(&ima_iint_lock_mutex_key[depth]);
[  904.650543]   lock(&p->lock);
[  904.650974] 
[  904.650974]  *** DEADLOCK ***
[  904.650974] 
[  904.651826] 1 lock held by systemd/1:
[  904.652376]  #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
[  904.653759] 
[  904.653759] stack backtrace:
[  904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20
[  904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[  904.656497] Call Trace:
[  904.656856]  <TASK>
[  904.657166]  dump_stack_lvl+0x134/0x1a0
[  904.657728]  dump_stack+0x14/0x30
[  904.658206]  print_circular_bug+0x38d/0x450
[  904.658812]  check_noncircular+0xed/0x120
[  904.659396]  ? srso_return_thunk+0x5/0x5f
[  904.659972]  ? srso_return_thunk+0x5/0x5f
[  904.660569]  __lock_acquire+0x17f3/0x2320
[  904.661145]  lock_acquire+0xf2/0x420
[  904.661664]  ? seq_read_iter+0x62/0x6b0
[  904.662217]  ? srso_return_thunk+0x5/0x5f
[  904.662886]  __mutex_lock+0xaf/0x760
[  904.663408]  ? seq_read_iter+0x62/0x6b0
[  904.663961]  ? seq_read_iter+0x62/0x6b0
[  904.664525]  ? srso_return_thunk+0x5/0x5f
[  904.665098]  ? mark_lock+0x4e/0x750
[  904.665610]  ? mutex_lock_nested+0x27/0x40
[  904.666194]  ? find_held_lock+0x3a/0x100
[  904.666770]  mutex_lock_nested+0x27/0x40
[  904.667337]  seq_read_iter+0x62/0x6b0
[  904.667869]  kernfs_fop_read_iter+0x1ef/0x2c0
[  904.668536]  __kernel_read+0x113/0x350
[  904.669079]  integrity_kernel_read+0x23/0x40
[  904.669698]  ima_calc_file_hash_tfm+0x14e/0x230
[  904.670349]  ? __lock_acquire+0xc32/0x2320
[  904.670937]  ? srso_return_thunk+0x5/0x5f
[  904.671525]  ? __lock_acquire+0xfbb/0x2320
[  904.672113]  ? srso_return_thunk+0x5/0x5f
[  904.672693]  ? srso_return_thunk+0x5/0x5f
[  904.673280]  ? lock_acquire+0xf2/0x420
[  904.673818]  ? kernfs_iop_getattr+0x4a/0xb0
[  904.674424]  ? srso_return_thunk+0x5/0x5f
[  904.674997]  ? find_held_lock+0x3a/0x100
[  904.675564]  ? srso_return_thunk+0x5/0x5f
[  904.676156]  ? srso_return_thunk+0x5/0x5f
[  904.676740]  ? srso_return_thunk+0x5/0x5f
[  904.677322]  ? local_clock_noinstr+0x9/0xb0
[  904.677923]  ? srso_return_thunk+0x5/0x5f
[  904.678502]  ? srso_return_thunk+0x5/0x5f
[  904.679075]  ? lock_release+0x4e2/0x570
[  904.679639]  ima_calc_file_hash+0x97/0x250
[  904.680227]  ima_collect_measurement+0x4be/0x530
[  904.680901]  ? srso_return_thunk+0x5/0x5f
[  904.681496]  ? srso_return_thunk+0x5/0x5f
[  904.682070]  ? __kernfs_iattrs+0x4a/0x140
[  904.682658]  ? srso_return_thunk+0x5/0x5f
[  904.683242]  ? process_measurement+0x7c0/0xef0
[  904.683876]  ? srso_return_thunk+0x5/0x5f
[  904.684462]  process_measurement+0x7c0/0xef0
[  904.685078]  ? srso_return_thunk+0x5/0x5f
[  904.685654]  ? srso_return_thunk+0x5/0x5f
[  904.686228]  ? _raw_spin_unlock_irqrestore+0x5d/0xd0
[  904.686938]  ? srso_return_thunk+0x5/0x5f
[  904.687523]  ? srso_return_thunk+0x5/0x5f
[  904.688098]  ? srso_return_thunk+0x5/0x5f
[  904.688672]  ? local_clock_noinstr+0x9/0xb0
[  904.689273]  ? srso_return_thunk+0x5/0x5f
[  904.689846]  ? srso_return_thunk+0x5/0x5f
[  904.690430]  ? srso_return_thunk+0x5/0x5f
[  904.691005]  ? srso_return_thunk+0x5/0x5f
[  904.691583]  ? srso_return_thunk+0x5/0x5f
[  904.692180]  ? local_clock_noinstr+0x9/0xb0
[  904.692841]  ? srso_return_thunk+0x5/0x5f
[  904.693419]  ? srso_return_thunk+0x5/0x5f
[  904.693990]  ? lock_release+0x4e2/0x570
[  904.694544]  ? srso_return_thunk+0x5/0x5f
[  904.695115]  ? kernfs_put_active+0x5d/0xc0
[  904.695708]  ? srso_return_thunk+0x5/0x5f
[  904.696286]  ? kernfs_fop_open+0x376/0x6b0
[  904.696872]  ima_file_check+0x65/0x80
[  904.697409]  security_file_post_open+0xb1/0x1b0
[  904.698058]  path_openat+0x216/0x1280
[  904.698589]  do_filp_open+0xab/0x140
[  904.699106]  ? srso_return_thunk+0x5/0x5f
[  904.699693]  ? lock_release+0x554/0x570
[  904.700264]  ? srso_return_thunk+0x5/0x5f
[  904.700836]  ? do_raw_spin_unlock+0x76/0x140
[  904.701450]  ? srso_return_thunk+0x5/0x5f
[  904.702021]  ? _raw_spin_unlock+0x3f/0xa0
[  904.702606]  ? srso_return_thunk+0x5/0x5f
[  904.703178]  ? alloc_fd+0x1ca/0x3b0
[  904.703685]  do_sys_openat2+0xba/0x120
[  904.704223]  ? file_free+0x8d/0x110
[  904.704729]  do_sys_open+0x4c/0x80
[  904.705221]  __x64_sys_openat+0x23/0x30
[  904.705784]  x64_sys_call+0x2575/0x4550
[  904.706337]  do_syscall_64+0x71/0x180
[  904.706863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.707587] RIP: 0033:0x7f3462123037
[  904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac
[  904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[  904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037
[  904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c
[  904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000
[  904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8
[  904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690
[  904.716877]  </TASK>

Roberto
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help