Thread (39 messages) 39 messages, 7 authors, 2018-06-18

Re: [PATCH] bdi: Fix another oops in wb_workfn()

From: Jan Kara <jack@suse.cz>
Date: 2018-06-11 16:29:24
Also in: linux-fsdevel, lkml

Possibly related (same subject, not in this thread)

On Mon 11-06-18 09:01:31, Tejun Heo wrote:
Hello,

On Mon, Jun 11, 2018 at 11:12:48AM +0200, Jan Kara wrote:
quoted
However this is wrong and so is the patch. The problem is in
cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's
reference to wb structures before going through the list of wbs again and
calling wb_shutdown() on each of them. The writeback structures we are
accessing at this point can be already freed in principle like:

CPU1				CPU2
				cgwb_bdi_unregister()
				  cgwb_kill(*slot);

cgwb_release()
  queue_work(cgwb_release_wq, &wb->release_work);
cgwb_release_workfn()
				  wb = list_first_entry(&bdi->wb_list, ...)
				  spin_unlock_irq(&cgwb_lock);
  wb_shutdown(wb);
  ...				
  kfree_rcu(wb, rcu);
				  wb_shutdown(wb); -> oops use-after-free

I'm not 100% sure how to fix this. wb structures can be at various phases of
shutdown (or there may be other external references still existing) when we
enter cgwb_bdi_unregister() so I think adding a way for cgwb_bdi_unregister()
to wait for standard wb shutdown path to finish is the most robust way.
What do you think about attached patch Tejun? So far only compile tested...

Possible problem with it is that now cgwb_bdi_unregister() will wait for
all wb references to be dropped so it adds some implicit dependencies to
bdi shutdown path. 
Would something like the following work or am I missing the point
entirely?
I was pondering the same solution for a while but I think it won't work.
The problem is that e.g. wb_memcg_offline() could have already removed
wb from the radix tree but it is still pending in bdi->wb_list
(wb_shutdown() has not run yet) and so we'd drop reference we didn't get.

								Honza
quoted hunk ↗ jump to hunk
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 347cc83..359cacd 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -715,14 +715,19 @@ static void cgwb_bdi_unregister(struct backing_dev_info *bdi)
 	WARN_ON(test_bit(WB_registered, &bdi->wb.state));
 
 	spin_lock_irq(&cgwb_lock);
-	radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0)
-		cgwb_kill(*slot);
+	radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) {
+		struct bdi_writeback *wb = *slot;
+
+		wb_get(wb);
+		cgwb_kill(wb);
+	}
 
 	while (!list_empty(&bdi->wb_list)) {
 		wb = list_first_entry(&bdi->wb_list, struct bdi_writeback,
 				      bdi_node);
 		spin_unlock_irq(&cgwb_lock);
 		wb_shutdown(wb);
+		wb_put(wb);
 		spin_lock_irq(&cgwb_lock);
 	}
 	spin_unlock_irq(&cgwb_lock);

-- 
tejun
-- 
Jan Kara [off-list ref]
SUSE Labs, CR
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help