Thread: 17 messages, 4 authors, 2021-06-10

Re: [PATCH v9 3/8] writeback, cgroup: increment isw_nr_in_flight before grabbing an inode

From: Ming Lei <hidden>
Date: 2021-06-10 06:57:25
Also in: linux-fsdevel, linux-mm, lkml

On Wed, Jun 09, 2021 at 05:21:14PM -0700, Roman Gushchin wrote:
On Wed, Jun 09, 2021 at 11:32:44AM +0800, Ming Lei wrote:
On Tue, Jun 08, 2021 at 04:02:20PM -0700, Roman Gushchin wrote:
isw_nr_in_flight is used to determine whether the inode switch queue
should be flushed from the umount path. Currently it's increased
after grabbing an inode and even scheduling the switch work. It means
the umount path can be walked past cleanup_offline_cgwb() with active
inode references, which can result in a "Busy inodes after unmount."
message and use-after-free issues (with inode->i_sb which gets freed).

Fix it by incrementing isw_nr_in_flight before doing anything with
the inode and decrementing in the case when switching wasn't scheduled.

The problem hasn't yet been seen in real life and was discovered
by Jan Kara by looking into the code.

Suggested-by: Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
Signed-off-by: Roman Gushchin <redacted>
Reviewed-by: Jan Kara <redacted>
---
 fs/fs-writeback.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b6fc13a4962d..4413e005c28c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -505,6 +505,8 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	if (!isw)
 		return;
 
+	atomic_inc(&isw_nr_in_flight);

An smp_mb() may be required here to order the WRITE in
'atomic_inc(&isw_nr_in_flight)' against the following READ of
'inode->i_sb->s_flags & SB_ACTIVE'. Otherwise the two operations can be
reordered, cgroup_writeback_umount() may observe 'isw_nr_in_flight' as zero,
and the flush_workqueue() is then skipped.

This barrier should also pair with the one added in cgroup_writeback_umount(),
so maybe this patch should be merged with 2/8.

Hi Ming!

Good point, I agree. How about a patch below?

Thanks!

--

From 282861286074c47907759d80c01419f0d0630dae Mon Sep 17 00:00:00 2001
From: Roman Gushchin <redacted>
Date: Wed, 9 Jun 2021 14:14:26 -0700
Subject: [PATCH] cgroup, writeback: add smp_mb() to inode_prepare_wbs_switch()

Add a memory barrier between incrementing isw_nr_in_flight
and checking the sb's SB_ACTIVE flag and grabbing an inode in
inode_prepare_wbs_switch(). It's required to prevent grabbing
an inode before incrementing isw_nr_in_flight, otherwise
0 can be obtained as isw_nr_in_flight in cgroup_writeback_umount()
and isw_wq will not be flushed, potentially leading to a memory
corruption.

Added smp_mb() will work in pair with smp_mb() in
cgroup_writeback_umount().

Suggested-by: Ming Lei <redacted>
Signed-off-by: Roman Gushchin <redacted>
---
 fs/fs-writeback.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 545fce68e919..6332b86ca4ed 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -513,6 +513,14 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 static bool inode_prepare_wbs_switch(struct inode *inode,
 				     struct bdi_writeback *new_wb)
 {
+	/*
+	 * Paired with smp_mb() in cgroup_writeback_umount().
+	 * isw_nr_in_flight must be increased before checking SB_ACTIVE and
+	 * grabbing an inode, otherwise isw_nr_in_flight can be observed as 0
+	 * in cgroup_writeback_umount() and the isw_wq will be not flushed.
+	 */
+	smp_mb();
+
 	/* while holding I_WB_SWITCH, no one else can update the association */
 	spin_lock(&inode->i_lock);
 	if (!(inode->i_sb->s_flags & SB_ACTIVE) ||

Looks fine. You may have to merge this one with 2/8 & 3/8 so that the memory
barrier usage is complete and correct in one place, avoiding the race between
switching cgwb and generic_shutdown_super().
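
For reference, the barrier pairing discussed in this thread can be sketched in
userspace with C11 atomics. This is only an illustration, not kernel code: the
names nr_in_flight, sb_active, switcher(), umounter() and
count_forbidden_outcomes() are hypothetical stand-ins, a seq_cst fence is
assumed to model smp_mb(), and the umount side is assumed to clear SB_ACTIVE
before its own barrier. With a full fence on both sides, the outcome "switcher
saw the sb active AND umount saw zero in flight" should not occur.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
static atomic_int nr_in_flight;   /* models isw_nr_in_flight */
static atomic_bool sb_active;     /* models SB_ACTIVE in s_flags */
static int switcher_saw_active;   /* result of the switcher's read */
static int umount_saw_in_flight;  /* result of the umount side's read */

/* Models the switch path: bump the counter, full barrier, check the flag. */
static void *switcher(void *arg)
{
	(void)arg;
	atomic_fetch_add_explicit(&nr_in_flight, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
	switcher_saw_active =
		atomic_load_explicit(&sb_active, memory_order_relaxed);
	return NULL;
}

/* Models the umount path: clear the flag, full barrier, read the counter. */
static void *umounter(void *arg)
{
	(void)arg;
	atomic_store_explicit(&sb_active, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* the paired barrier */
	umount_saw_in_flight =
		atomic_load_explicit(&nr_in_flight, memory_order_relaxed);
	return NULL;
}

/*
 * Runs the two paths concurrently 'iterations' times and counts how often
 * the forbidden outcome is observed: the switcher proceeds (flag still
 * set) while the umount side sees no switch in flight and skips the flush.
 */
int count_forbidden_outcomes(int iterations)
{
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t a, b;

		atomic_store(&nr_in_flight, 0);
		atomic_store(&sb_active, true);

		pthread_create(&a, NULL, switcher, NULL);
		pthread_create(&b, NULL, umounter, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (switcher_saw_active && umount_saw_in_flight == 0)
			forbidden++;
	}
	return forbidden;
}
```

Called from a main() and built with -pthread, count_forbidden_outcomes()
should return 0. Dropping either fence makes the bad outcome permitted by the
memory model, although a short run may not happen to hit it.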


Thanks,
Ming