Thread (7 messages) 7 messages, 4 authors, 2021-12-13

Re: [PATCH] bfq: Fix use-after-free with cgroups

From: Jan Kara <jack@suse.cz>
Date: 2021-12-13 14:52:34
Also in: linux-block, stable

On Tue 07-12-21 20:08:43, Michal Koutný wrote:
On Wed, Dec 01, 2021 at 02:34:39PM +0100, Jan Kara [off-list ref] wrote:
quoted
After some analysis we've found out that the culprit of the problem is
that some task is reparented from cgroup G to the root cgroup and G is
offlined.
Just sharing my interpretation for context -- (I saw this was a system
using the unified cgroup hierarchy, io_cgrp_subsys_on_dfl_key was
enabled) and what was observed could also have been disabling the io
controller on given level -- that would also manifest similarly -- the
task is migrated to parent and the former blkcg is offlined.
Yes, that's another possibility.
quoted
+static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg)
[...]
-	bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
[...]
+	hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node)
+		bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
Here I assume root_group is (representing) the global blkcg root and
this reparenting thus skips all ancestors between the removed leaf and
the root. IIUC the associated io_context would then be treated as if it
was running in the root blkcg.
(Admittedly, this isn't a change from this patch but it may cause some
surprises if the given process runs after the operation.)
Yes, this is what happens in bfq_reparent_children() and basically
preserves what BFQ was already doing for a subset of bfq queues.
Reparenting to the immediate ancestors should be safe as cgroup core
should ensure children are offlined before parents. Would it make sense
to you?
I suppose yes, it makes more sense to reparent just to immediate parents
instead of the root of the blkcg hierarchy. Initially when developing the
patch I was not sure whether parent has to be still alive but as you write
it should be safe. I'll modify the patch to:

static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg)
{
        struct bfq_queue *bfqq;
        struct hlist_node *next;
        struct bfq_group *parent;

        parent = bfqg_parent(bfqg);
        if (!parent)
                parent = bfqd->root_group;

        hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node)
                bfq_bfqq_move(bfqd, bfqq, parent);
}

 
quoted
@@ -897,38 +844,17 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
[...]
-		 * It may happen that some queues are still active
-		 * (busy) upon group destruction (if the corresponding
-		 * processes have been forced to terminate). We move
-		 * all the leaf entities corresponding to these queues
-		 * to the root_group.
This comment is removed but it seems to me it assumed that the
reparented entities are only some transitional remainings of terminated
tasks but they may be the processes migrated upwards with a long (IO
active) life ahead.
Yes, this seemed to be a misconception of the old code...

								Honza

-- 
Jan Kara [off-list ref]
SUSE Labs, CR
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help