Re: [PATCH v2] btrfs: add goto in btrfs_defrag_file for error handling

From: David Sterba <hidden>
Date: 2021-05-17 13:03:10

On Wed, May 05, 2021 at 03:40:52PM -0700, Boris Burkov wrote:

On Wed, May 05, 2021 at 09:26:28AM +0800, Tian Tao wrote:

ret is assigned -EAGAIN at line 1455 and then reassigned defrag_count
at line 1547 after exiting the while loop. This causes the
btrfs_defrag_file function to not return the correct value in the event
of an error. This patch fixes this issue.

This looks like a correct fix, in that it locally improves what it
claims to improve. However, I have some questions about the style and
consistency of the function as a whole as a result. I think Dave had
a similar comment in his very first reply on v1.

The loop has the following early exit points:
- fs unmounted
- cancellation
- swapfile/error in cluster_pages_for_defrag
- newer_off == (u64)-1
- error (ENOMEM or ENOENT) in find_new_extents

To me, it is confusing that of all these, only cancellation goes to a
label called "error". I would expect at least the swapfile/cluster case
to also jump to error. find_new_extents is interesting, because ENOENT
could be semantically special as an error and warrant a break rather
than a goto error, so we ought to figure that out correctly too.

If there is some good reason that only cancellation should receive this
treatment, and that some early exit cases should break or goto out_ra
then I would at least name the new label "cancel" and write a comment or
a note in the git commit explaining the difference.

The naming convention of the exit labels describes what happens at the
label point and not the reason, as the label can be targeted from
various branches but the same cleanup is done. The naming is not
consistent everywhere, but at least that's the idea.
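
To illustrate the convention, here is a minimal userspace sketch, not
btrfs code. The function demo_process and the labels out_free_pages and
out_free_ra are hypothetical names made up for this example. The point
is that a cancellation (-EAGAIN) and a hard error (-EIO) can jump to the
same label, because the label names the cleanup and not the cause:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical example: fail_step selects which simulated failure
 * happens (0 = none, 1 = cancellation, 2 = hard error).
 */
int demo_process(int fail_step)
{
	char *ra;
	char *pages;
	int ret = 0;

	ra = malloc(16);
	if (!ra)
		return -ENOMEM;

	pages = malloc(64);
	if (!pages) {
		ret = -ENOMEM;
		goto out_free_ra;
	}

	if (fail_step == 1) {
		/* Cancelled: different reason, same cleanup. */
		ret = -EAGAIN;
		goto out_free_pages;
	}
	if (fail_step == 2) {
		/* Hard error: different reason, same cleanup. */
		ret = -EIO;
		goto out_free_pages;
	}

	/* Success falls through the same cleanup with ret == 0. */
out_free_pages:
	free(pages);
out_free_ra:
	free(ra);
	return ret;
}
```

A label such as "cancel" would only fit if the cleanup at that point
were specific to cancellation.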

Thinking out loud, I suspect a way to really fix this messy function is
to do something like lift the contents of the while loop into a helper
function which returns the next incremental defrag_count, an error, or 0
for done.

Reading it again with the above in mind, there are two types of errors
to end the defrag:

- some defrag work has been done but the entire file was not processed
- the rest, e.g. some hard errors

In the first case the optional flushing should still happen. In both
cases the incompat bits should be set -- this is now missing.

I'm not sure if the whole while loop could be factored out; there's a
lot of shared state with the function. The different kinds of errors
would have to be reflected too but that's doable.
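
A rough userspace sketch of what such a split might look like, purely
to make the idea concrete. Everything here is hypothetical: the struct
demo_defrag_state, demo_defrag_step and demo_defrag_file are invented
names, the shared state is reduced to a few fields, and the flush and
incompat handling are modeled as plain flags. It assumes the helper
returns a positive count per step, 0 when done, or a negative errno,
and that the caller returns the error when one occurred:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state shared with the loop body. */
struct demo_defrag_state {
	int steps_left;     /* simulated clusters still to process */
	int fail_at;        /* fail when steps_left equals this, -1 = never */
	int fail_err;       /* negative errno to return on failure */
	bool flushed;       /* set when the optional flush ran */
	bool incompat_set;  /* set when the incompat bits were handled */
};

/* Returns > 0: extents defragged this step, 0: done, < 0: error. */
int demo_defrag_step(struct demo_defrag_state *st)
{
	if (st->steps_left == st->fail_at)
		return st->fail_err;
	if (st->steps_left == 0)
		return 0;
	st->steps_left--;
	return 2; /* pretend each step defragments two extents */
}

int demo_defrag_file(struct demo_defrag_state *st, bool want_flush)
{
	int defrag_count = 0;
	int ret;

	while ((ret = demo_defrag_step(st)) > 0)
		defrag_count += ret;

	/* Partial progress still gets the optional flush. */
	if (defrag_count > 0 && want_flush)
		st->flushed = true;

	/* Done on every exit path, error or not (assumption of this sketch). */
	st->incompat_set = true;

	return ret < 0 ? ret : defrag_count;
}
```

With this shape the two kinds of errors differ only in whether
defrag_count is nonzero when the loop ends, so the caller does not need
a separate label per exit reason.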

As this patch fixes the return value of canceled defrag, I'd take it as
is and address the other issues separately.
