Re: [PATCH v9 3/8] writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
From: Ming Lei <hidden>
Date: 2021-06-10 06:57:25
Also in:
linux-fsdevel, linux-mm, lkml
On Wed, Jun 09, 2021 at 05:21:14PM -0700, Roman Gushchin wrote:
> On Wed, Jun 09, 2021 at 11:32:44AM +0800, Ming Lei wrote:
> > On Tue, Jun 08, 2021 at 04:02:20PM -0700, Roman Gushchin wrote:
> > > isw_nr_in_flight is used to determine whether the inode switch queue
> > > should be flushed from the umount path. Currently it's increased
> > > after grabbing an inode and even scheduling the switch work. It means
> > > the umount path can walk past cleanup_offline_cgwb() with active inode
> > > references, which can result in a "Busy inodes after unmount." message
> > > and use-after-free issues (with inode->i_sb which gets freed).
> > >
> > > Fix it by incrementing isw_nr_in_flight before doing anything with
> > > the inode and decrementing it in the case when switching wasn't
> > > scheduled.
> > >
> > > The problem hasn't yet been seen in real life and was discovered
> > > by Jan Kara by looking into the code.
> > >
> > > Suggested-by: Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
> > > Signed-off-by: Roman Gushchin <redacted>
> > > Reviewed-by: Jan Kara <redacted>
> > > ---
> > >  fs/fs-writeback.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index b6fc13a4962d..4413e005c28c 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -505,6 +505,8 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
> > >  	if (!isw)
> > >  		return;
> > >  
> > > +	atomic_inc(&isw_nr_in_flight);
> >
> > smp_mb() may be required for ordering the WRITE in
> > 'atomic_inc(&isw_nr_in_flight)' and the following READ of
> > 'inode->i_sb->s_flags & SB_ACTIVE'. Otherwise, cgroup_writeback_umount()
> > may observe 'isw_nr_in_flight' as zero because the two OPs can be
> > reordered, and then miss the flush_workqueue().
> >
> > Also, this barrier should pair with the one added in
> > cgroup_writeback_umount(), so maybe this patch should be merged with 2/8.
>
> Hi Ming!
>
> Good point, I agree. How about a patch below?
>
> Thanks!
>
> --
>
> From 282861286074c47907759d80c01419f0d0630dae Mon Sep 17 00:00:00 2001
> From: Roman Gushchin <redacted>
> Date: Wed, 9 Jun 2021 14:14:26 -0700
> Subject: [PATCH] cgroup, writeback: add smp_mb() to inode_prepare_wbs_switch()
>
> Add a memory barrier between incrementing isw_nr_in_flight
> and checking the sb's SB_ACTIVE flag and grabbing an inode in
> inode_prepare_wbs_switch(). It's required to prevent grabbing
> an inode before incrementing isw_nr_in_flight, otherwise
> 0 can be obtained as isw_nr_in_flight in cgroup_writeback_umount()
> and isw_wq will not be flushed, potentially leading to a memory
> corruption.
>
> Added smp_mb() will work in pair with smp_mb() in
> cgroup_writeback_umount().
>
> Suggested-by: Ming Lei <redacted>
> Signed-off-by: Roman Gushchin <redacted>
> ---
>  fs/fs-writeback.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 545fce68e919..6332b86ca4ed 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -513,6 +513,14 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
>  static bool inode_prepare_wbs_switch(struct inode *inode,
>  				     struct bdi_writeback *new_wb)
>  {
> +	/*
> +	 * Paired with smp_mb() in cgroup_writeback_umount().
> +	 * isw_nr_in_flight must be increased before checking SB_ACTIVE and
> +	 * grabbing an inode, otherwise isw_nr_in_flight can be observed as 0
> +	 * in cgroup_writeback_umount() and the isw_wq will be not flushed.
> +	 */
> +	smp_mb();
> +
>  	/* while holding I_WB_SWITCH, no one else can update the association */
>  	spin_lock(&inode->i_lock);
>  	if (!(inode->i_sb->s_flags & SB_ACTIVE) ||

Looks fine. You may have to merge this one with 2/8 and 3/8, so that the
memory barrier usage is correct and complete for avoiding the race between
switching cgwb and generic_shutdown_super().

Thanks,
Ming
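
The race discussed above is the store-buffering pattern: the switch path writes isw_nr_in_flight and then reads SB_ACTIVE, while the umount path clears SB_ACTIVE and then reads isw_nr_in_flight. A non-value-returning atomic such as atomic_inc() implies no ordering in Linux, so each side needs a full barrier between its write and its read; with both barriers in place, the outcome "switch saw SB_ACTIVE set, umount saw a zero count" is forbidden. Below is a minimal userspace sketch under stated assumptions: C11 atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(), and nr_in_flight, sb_active, flushes, prepare_switch(), finish_switch() and sim_umount() are hypothetical models, not kernel APIs. It runs single-threaded, so it shows the protocol and where the paired fences sit rather than exercising the race itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace models of the kernel state in this thread. */
static atomic_int nr_in_flight;      /* models isw_nr_in_flight */
static atomic_bool sb_active = true; /* models SB_ACTIVE in sb->s_flags */
static int flushes;                  /* counts simulated isw_wq flushes */

/* Switch side: hypothetical model of inode_switch_wbs() plus
 * inode_prepare_wbs_switch(). */
static bool prepare_switch(void)
{
	/* Relaxed RMW: like atomic_inc(), it implies no ordering by itself. */
	atomic_fetch_add_explicit(&nr_in_flight, 1, memory_order_relaxed);

	/* Stands in for smp_mb(): the increment must be ordered before the
	 * SB_ACTIVE read. Pairs with the fence in sim_umount(). */
	atomic_thread_fence(memory_order_seq_cst);

	if (!atomic_load_explicit(&sb_active, memory_order_relaxed)) {
		/* Switch not scheduled: undo the increment. */
		atomic_fetch_sub_explicit(&nr_in_flight, 1, memory_order_relaxed);
		return false;
	}
	return true; /* the real code would grab the inode and queue work */
}

/* Models the switch work completing. */
static void finish_switch(void)
{
	int prev = atomic_fetch_sub_explicit(&nr_in_flight, 1,
					     memory_order_relaxed);
	assert(prev > 0); /* each finish must match a successful prepare */
	(void)prev;
}

/* Umount side: hypothetical model of generic_shutdown_super() clearing
 * SB_ACTIVE and then calling cgroup_writeback_umount(). */
static void sim_umount(void)
{
	atomic_store_explicit(&sb_active, false, memory_order_relaxed);

	/* Stands in for smp_mb() in cgroup_writeback_umount(): SB_ACTIVE
	 * must be cleared before the counter is read. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&nr_in_flight, memory_order_relaxed))
		flushes++; /* stands in for flushing isw_wq */
}
```

For example, a prepare_switch() that succeeds before sim_umount() makes the umount side see a non-zero count and "flush", and the count stays at 1 until finish_switch(); a prepare_switch() attempted after sim_umount() returns false and leaves the count at 0. Dropping either fence would, on a weakly ordered CPU, allow both reads to see stale values, which is exactly the missed flush described in the thread.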