Re: [PATCH] btrfs: add stub argument to transaction API

From: David Sterba <hidden>
Date: 2021-10-19 14:54:43

On Mon, Oct 18, 2021 at 05:28:38PM -0400, Josef Bacik wrote:

On Mon, Oct 18, 2021 at 07:38:03PM +0200, David Sterba wrote:

quoted

Why the stub/context argument is needed: the NOFS protection is per call
site, so it must be set and reset in the caller thread, so any
allocations between btrfs_start_transaction and btrfs_end_transaction
are safe. We can't store it in the transaction handle, because it's not
passed everywhere, eg. to various helpers in btrfs and potentially in
other subsystems.

So the plan is to instead pass the tctx around everywhere to carry the flags?  I
thought the whole point of memalloc_nofs_save() is that we don't have to pass
gfp_t's around everywhere, it just knows what we're supposed to be doing?

Nothing needs to be passed around, it will be hidden inside the
transaction start/end, the only thing the caller needs to do is to
define the variable and pass it to to transaction start call. The NOFS
section will then apply to any calls until the transaction end call.

--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c

@@ -1232,6 +1232,7 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode)
        struct btrfs_path *path;
        struct btrfs_block_rsv *block_rsv;
        int ret;
+       DEFINE_TCTX(tctx);
 
        if (!delayed_node)
                return 0;

@@ -1244,7 +1245,7 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode)
        }
        mutex_unlock(&delayed_node->mutex);
 
-       trans = btrfs_join_transaction(delayed_node->root, NULL);
+       trans = btrfs_join_transaction(delayed_node->root, tctx);
        if (IS_ERR(trans)) {
                ret = PTR_ERR(trans);
                goto out;

@@ -1271,7 +1272,7 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode)
        btrfs_free_path(path);
        trans->block_rsv = block_rsv;
 trans_out:
-       btrfs_end_transaction(trans, NULL);
+       btrfs_end_transaction(trans, tctx);
        btrfs_btree_balance_dirty(fs_info);
 out:
        btrfs_release_delayed_node(delayed_node);
---

This is what needs to be done per caller.

So
the trans should be able to hold the flags since we only care about starting it
and restoring it, correct?  Or am I wrong and we do actually need to pass this
thing around?  In which case can't we still just save it in the trans handle,
and pass the u32 around where appropriate?  Thanks,

I had to dig in my memory why we can't store it in the transaction
handle, because this is naturally less intrusive. But it does not work.

There are two things:

1) In a function that starts/joins a transaction, the NOFS scope is from
   that call until the transaction end. This is caller-specific.
   Like in the example above, any allocation with GFP_KERNEL happening
   will be safe and not recurse back to btrfs.

2) Transaction handle is not caller-specific and is allocated when the
   transaction starts (ie. a new kmem_cache_alloc call is done). Any
   caller of transaction start will only increase the reference count.

So, on each refcount_inc, we'd need to store the nofs_flags, on each
refcount_dec to restore it.

It is possible that the NOFS context is also needed outside of the
start/end region, but it depends and has to be done case by case. For
the transcation start/end we're sure the NOFS must be set.

The NOFS scope should be also more fine-grained to where we know we
really need it, so it's not "GFP_NOFS everywhere" just implemented by
other means.

There is (and will be more) code adding assertions and additional checks
to verify the invariants. For anybody who's interested I can share the
WIP branches, but I haven't worked on it since 5.6.

Getting the API parameter extension would help to shorten the
start-again time, it's possible I'm missing something and it can be done
in an easier way but I don't see it right now.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help