Re: [PATCH] btrfs: add stub argument to transaction API
From: David Sterba <hidden>
Date: 2021-10-19 14:54:43
On Mon, Oct 18, 2021 at 05:28:38PM -0400, Josef Bacik wrote:
On Mon, Oct 18, 2021 at 07:38:03PM +0200, David Sterba wrote:quoted
Why the stub/context argument is needed: the NOFS protection is per call site, so it must be set and reset in the caller thread, so any allocations between btrfs_start_transaction and btrfs_end_transaction are safe. We can't store it in the transaction handle, because it's not passed everywhere, eg. to various helpers in btrfs and potentially in other subsystems.So the plan is to instead pass the tctx around everywhere to carry the flags? I thought the whole point of memalloc_nofs_save() is that we don't have to pass gfp_t's around everywhere, it just knows what we're supposed to be doing?
Nothing needs to be passed around, it will be hidden inside the transaction start/end, the only thing the caller needs to do is to define the variable and pass it to to transaction start call. The NOFS section will then apply to any calls until the transaction end call.
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c@@ -1232,6 +1232,7 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode) struct btrfs_path *path; struct btrfs_block_rsv *block_rsv; int ret; + DEFINE_TCTX(tctx); if (!delayed_node) return 0;
@@ -1244,7 +1245,7 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode) } mutex_unlock(&delayed_node->mutex); - trans = btrfs_join_transaction(delayed_node->root, NULL); + trans = btrfs_join_transaction(delayed_node->root, tctx); if (IS_ERR(trans)) { ret = PTR_ERR(trans); goto out;
@@ -1271,7 +1272,7 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode) btrfs_free_path(path); trans->block_rsv = block_rsv; trans_out: - btrfs_end_transaction(trans, NULL); + btrfs_end_transaction(trans, tctx); btrfs_btree_balance_dirty(fs_info); out: btrfs_release_delayed_node(delayed_node); ---
This is what needs to be done per caller.
So the trans should be able to hold the flags since we only care about starting it and restoring it, correct? Or am I wrong and we do actually need to pass this thing around? In which case can't we still just save it in the trans handle, and pass the u32 around where appropriate? Thanks,
I had to dig in my memory why we can't store it in the transaction handle, because this is naturally less intrusive. But it does not work. There are two things: 1) In a function that starts/joins a transaction, the NOFS scope is from that call until the transaction end. This is caller-specific. Like in the example above, any allocation with GFP_KERNEL happening will be safe and not recurse back to btrfs. 2) Transaction handle is not caller-specific and is allocated when the transaction starts (ie. a new kmem_cache_alloc call is done). Any caller of transaction start will only increase the reference count. So, on each refcount_inc, we'd need to store the nofs_flags, on each refcount_dec to restore it. It is possible that the NOFS context is also needed outside of the start/end region, but it depends and has to be done case by case. For the transcation start/end we're sure the NOFS must be set. The NOFS scope should be also more fine-grained to where we know we really need it, so it's not "GFP_NOFS everywhere" just implemented by other means. There is (and will be more) code adding assertions and additional checks to verify the invariants. For anybody who's interested I can share the WIP branches, but I haven't worked on it since 5.6. Getting the API parameter extension would help to shorten the start-again time, it's possible I'm missing something and it can be done in an easier way but I don't see it right now.