Thread (30 messages) 30 messages, 3 authors, 2021-02-04

Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG

From: "Darrick J. Wong" <djwong@kernel.org>
Date: 2021-02-03 18:15:40

On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
quoted
Hi Brian,

On Wed, Feb 03, 2021 at 09:23:37AM -0500, Brian Foster wrote:
quoted
On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
quoted
As the first step of shrinking, this attempts to enable shrinking
unused space in the last allocation group by fixing up freespace
btree, agi, agf and adjusting super block and use a helper
xfs_ag_shrink_space() to fixup the last AG.

This can be all done in one transaction for now, so I think no
additional protection is needed.

Signed-off-by: Gao Xiang <redacted>
---
 fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
 fs/xfs/xfs_trans.c |  1 -
 2 files changed, 42 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 6c4ab5e31054..4bcea22f7b3f 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
 	struct aghdr_init_data	*id,
 	xfs_agnumber_t		oagcount,
 	xfs_agnumber_t		nagcount,
-	xfs_rfsblock_t		*delta)
+	int64_t			*delta)
 {
 	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
 	int			error;
@@ -76,33 +76,41 @@ xfs_growfs_data_private(
 	xfs_agnumber_t		nagcount;
 	xfs_agnumber_t		nagimax = 0;
 	xfs_rfsblock_t		nb, nb_div, nb_mod;
-	xfs_rfsblock_t		delta;
+	int64_t			delta;
 	xfs_agnumber_t		oagcount;
 	struct xfs_trans	*tp;
+	bool			extend;
 	struct aghdr_init_data	id = {};
 
 	nb = in->newblocks;
-	if (nb < mp->m_sb.sb_dblocks)
-		return -EINVAL;
-	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
+	if (nb == mp->m_sb.sb_dblocks)
+		return 0;
+
+	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
+	if (error)
 		return error;
-	error = xfs_buf_read_uncached(mp->m_ddev_targp,
+
+	if (nb > mp->m_sb.sb_dblocks) {
+		error = xfs_buf_read_uncached(mp->m_ddev_targp,
 				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
 				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
-	if (error)
-		return error;
-	xfs_buf_relse(bp);
+		if (error)
+			return error;
+		xfs_buf_relse(bp);
+	}
 
 	nb_div = nb;
 	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
 	nagcount = nb_div + (nb_mod != 0);
 	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
 		nagcount--;
-		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
-		if (nb < mp->m_sb.sb_dblocks)
+		if (nagcount < 2)
 			return -EINVAL;
What's the reason for the nagcount < 2 check? IIRC we warn about this
configuration at mkfs time, but allow it to proceed. Is it just that we
don't want to accidentally put the fs into an agcount == 1 state that
was originally formatted with >1 AGs?
Darrick once asked for avoiding shrinking the filesystem which has
only 1 AG.
It's worth mentioning why in a comment though:

	/*
	 * XFS doesn't really support single-AG filesystems, so do not
	 * permit callers to remove the filesystem's second and last AG.
	 */
	if (shrink && new_agcount < 2)
		return -EHAHANOYOUDONT;

But as Brian points out, we /do/ allow adding a second AG to a single-AG
fs.
quoted
quoted
What about the case where we attempt to grow an agcount == 1 fs but
don't enlarge enough to add the second AG? Does this change error
behavior in that case?
Yeah, thanks for catching this! If growfs allows 1 AG case before,
I think it needs to be refined. Let me update this in the next version!
quoted
quoted
+		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
 	}
+
 	delta = nb - mp->m_sb.sb_dblocks;
+	extend = (delta > 0);
 	oagcount = mp->m_sb.sb_agcount;
 
 	/* allocate the new per-ag structures */
@@ -110,22 +118,34 @@ xfs_growfs_data_private(
 		error = xfs_initialize_perag(mp, nagcount, &nagimax);
 		if (error)
 			return error;
+	} else if (nagcount < oagcount) {
+		/* TODO: shrinking the entire AGs hasn't yet completed */
+		return -EINVAL;
 	}
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
-			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
+			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
+			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
 
-	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
-	if (error)
-		goto out_trans_cancel;
-
+	if (extend) {
+		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
+						  nagcount, &delta);
+		if (error)
+			goto out_trans_cancel;
+	}
 	xfs_trans_agblocks_delta(tp, id.nfree);
It looks like id isn't used until the resize call above. Is this call
relevant for the shrink case?
I think it has nothing to do for the shrink the last AG case as well
(id.nfree == 0 here) but maybe use for the later shrinking the whole
AGs patchset. I can move into if (extend) in the next version.
quoted
quoted
 
-	/* If there are new blocks in the old last AG, extend it. */
+	/* If there are some blocks in the last AG, resize it. */
 	if (delta) {
This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
of the function. Should we ever get to this point with delta == 0? (If
not, maybe convert it to an assert just to be safe.)
delta would be changed after xfs_resizefs_init_new_ags() (the original
growfs design is that, I don't want to touch the original logic). that
is why `delta' reflects the last AG delta now...
I've never liked how the meaning of "delta" changes through the
function, and it clearly trips up reviewers.  This variable isn't the
delta between the old dblocks and the new dblocks, it's really a
resizefs cursor that tells us how much work we still have to do.
quoted
quoted
quoted
-		error = xfs_ag_extend_space(mp, tp, &id, delta);
+		if (extend) {
+			error = xfs_ag_extend_space(mp, tp, &id, delta);
+		} else {
+			id.agno = nagcount - 1;
+			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
pass in agno for now..?
Both way are ok, yet in my incomplete shrink whole empty AGs patchset,
it seems more natural to pass in &id rather than agno (since
id.agno = nagcount - 1 will be stayed in some new helper
e.g. xfs_shrink_ags())
@id is struct aghdr_init_data, but shrinking shouldn't initialize any AG
headers.  Are you planning to make use of it in shrink, either now or
later on?
quoted
quoted
quoted
+		}
+
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -137,15 +157,15 @@ xfs_growfs_data_private(
 	 */
 	if (nagcount > oagcount)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
-	if (nb > mp->m_sb.sb_dblocks)
+	if (nb != mp->m_sb.sb_dblocks)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
 				 nb - mp->m_sb.sb_dblocks);
Maybe use delta here?
The reason is the same as above, `delta' here was changed due to 
xfs_resizefs_init_new_ags(), which is not nb - mp->m_sb.sb_dblocks
anymore. so `extend` boolean is used (rather than just use delta > 0)
Long question:

The reason why we use (nb - dblocks) is because growfs is an all or
nothing operation -- either we succeed in writing new empty AGs and
inflating the (former) last AG of the fs, or we don't do anything at
all.  We don't allow partial growing; if we did, then delta would be
relevant here.  I think we get away with not needing to run transactions
for each AG because those new AGs are inaccessible until we commit the
new agcount/dblocks, right?

In your design for the fs shrinker, do you anticipate being able to
eliminate all the eligible AGs in a single transaction?  Or do you
envision only tackling one AG at a time?  And can we be partially
successful with a shrink?  e.g. we succeed at eliminating the last AG,
but then the one before that isn't empty and so we bail out, but by that
point we did actually make the fs a little bit smaller.

There's this comment at the bottom of xfs_growfs_data() that says that
we can return error codes if the secondary sb update fails, even if the
new size is already live.  This convinces me that it's always been the
case that callers of the growfs ioctl are supposed to re-query the fs
geometry afterwards to find out if the fs size changed, even if the
ioctl itself returns an error... which implies that partial grow/shrink
are a possibility.
And of course I got so buried in building up to my long question that I
forgot to ask it:

If the design of the shrinker requires incremental shrinking, should we
support incremental growfs too?

--D
quoted
quoted
quoted
 	if (id.nfree)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
 
id.nfree tracks newly added free space in the growfs space. Is it not
used in the shrink case because the allocation handles this for us?
Yeah, I'm afraid so. This is some common code, and also used in my
shrinking the whole AGs patchset.
quoted
quoted
 	/*
-	 * update in-core counters now to reflect the real numbers
-	 * (especially sb_fdblocks)
+	 * update in-core counters now to reflect the real numbers (especially
+	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
 	 */
 	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
 		xfs_log_sb(tp);
@@ -165,7 +185,7 @@ xfs_growfs_data_private(
 	 * If we expanded the last AG, free the per-AG reservation
 	 * so we can reinitialize it with the new size.
 	 */
-	if (delta) {
+	if (extend && delta) {
 		struct xfs_perag	*pag;
 
 		pag = xfs_perag_get(mp, id.agno);
We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
this function. xfs_ag_shrink_space() from the previous patch is intended
to deal with perag reservation changes for shrink, but it looks like the
reserve call further down could potentially reset mp->m_finobt_nores to
false if it previously might have been set to true.
Yeah, if my understanding is correct, I might need to call
xfs_fs_reserve_ag_blocks() only for growfs case as well for
mp->m_finobt_nores = true case.
I suppose it's worth trying in the finobt_nores==true case. :)

--D
quoted
Thanks,
Gao Xiang
quoted
Brian
quoted
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index e72730f85af1..fd2cbf414b80 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -419,7 +419,6 @@ xfs_trans_mod_sb(
 		tp->t_res_frextents_delta += delta;
 		break;
 	case XFS_TRANS_SB_DBLOCKS:
-		ASSERT(delta > 0);
 		tp->t_dblocks_delta += delta;
 		break;
 	case XFS_TRANS_SB_AGCOUNT:
-- 
2.27.0
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help