Re: Linux 5.18-rc4
From: John Johansen <john.johansen@canonical.com>
Date: 2022-06-06 19:20:01
Also in:
linux-fsdevel, linux-mm, lkml
Possibly related (same subject, not in this thread)
- 2022-04-27 · Re: Linux 5.18-rc4 · Linus Torvalds <torvalds@linux-foundation.org>
On 6/6/22 11:28, Linus Torvalds wrote:
On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman [off-list ref] wrote:quoted
Has anyone looked into this lock ordering issues?The deadlock isquoted
quoted
quoted
[78140.503821] CPU0 CPU1 [78140.503823] ---- ---- [78140.503824] lock(&newf->file_lock); [78140.503826] lock(&p->alloc_lock); [78140.503828] lock(&newf->file_lock); [78140.503830] lock(&ctx->lock);and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show() in fs/proc/fd.c: task_lock(task); files = task->files; if (files) { unsigned int fd = proc_fd(m->private); spin_lock(&files->file_lock); and that looks all normal. But the other chains look painful. I do see the IPC code doing ugly things, in particular I detest this code: task_lock(current); list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist); task_unlock(current); where it is using the task lock to protect the shm_clist list. Nasty. And it's doing that inside the shm_ids.rwsem lock _and_ inside the shp->shm_perm.lock. So the IPC code has newseg() doing shmget -> ipcget(): down_write(ids->rwsem) -> newseg(): ipc_addid gets perm->lock task_lock(current) so you have ids->rwsem -> perm->lock -> alloc_lock there. So now we have that ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock when you put those sequences together. But I didn't figure out what the security subsystem angle is and how that then apparently mixes things up with execve. Yes, newseg() is doing that error = security_shm_alloc(&shp->shm_perm); while holding rwsem, but I can't see how that matters. From the lockdep output, rwsem doesn't actually seem to be part of the whole sequence. It *looks* like we have apparmour ctx->lock --> radix_tree_preloads.lock --> ipcperm->lock and apparently that's called under the file_lock somewhere, completing the circle. I guess the execve component is that begin_new_exec -> security_bprm_committing_creds -> apparmor_bprm_committing_creds -> aa_inherit_files -> iterate_fd -> *takes file_lock* match_file -> aa_file_perm -> update_file_ctx *takes ctx->lock* so that's how you get file_lock -> ctx->lock.
yes
So you have:
SHMGET:
ipcperm->lock -> alloc_lock
/proc:
alloc_lock -> file_lock
apparmor_bprm_committing_creds:
file_lock -> ctx->lock
and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.yeah that is the part I got stuck on, before being pulled away from this
I suspect that part is that both Apparmor and IPC use the idr local lock.
bingo, apparmor moved its secids allocation from a custom radix tree to idr in 99cc45e48678 apparmor: Use an IDR to allocate apparmor secids and ipc is using the idr for its id allocation as well I can easily lift the secid() allocation out of the ctx->lock but that would still leave it happening under the file_lock and not fix the problem. I think the quick solution would be for apparmor to stop using idr, reverting back at least temporarily to the custom radix tree.