Re: [PATCH 2/2] btrfs: send: fix invalid commands for inodes with changed... | linux-btrfs

Re: [PATCH 2/2] btrfs: send: fix invalid commands for inodes with changed type but same gen

From: Filipe Manana <hidden>
Date: 2021-01-12 13:55:53

On Tue, Jan 12, 2021 at 1:10 PM Roman Anasal | BDSU
[off-list ref] wrote:

On Tue, 2021-01-12 at 11:27 +0000, Filipe Manana wrote:

quoted

You get all these issues because you are using an incremental send in
a way it's not supposed to.
You are using two subvolumes that are completely unrelated.

Yes, I am aware of that and know that this is not a designed for use
case.

quoted

My surprise here is that we actually allow a user to try that,
instead
of giving an error complaining that subvol1 and subvol2 aren't
snapshots of the same subvolume.

This could be one way of solving it. But this is already harder than it
sounds, since the same issue may also happen *even when* the subvolumes
share a parent/are related, here an example reproducer:

  btrfs subvolume create subvol1
  btrfs subvolume snapshot subvol1 subvol2
  mkdir subvol1/a
  echo foo > subvol2/a
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump

This will produce a stream that tries to write data into the cloned
directory inode.

Indeed, it will.

That use case is a bit different however, the snapshot starts RW, is
changed and then turned RO and used for an incremental send.

But the idea of an incremental send was always to use snapshots from
the same subvolume and the snapshots were always RO - once created you
didn't change them.
Otherwise you can never ensure full consistency.

So in the end it's not that much different from the case of using
unrelated subvolumes or snapshots as in the patch's changelog.

So we not only had to check the parent relationships but also ensure
that the descendant snapshot was not modified, i.e. was never set to
ro=false.
As far as I know there is no flag tracking this on-disk? So this would
need additional work, too.

There isn't.

Maybe all we need for now is to document how the incremental send is
supposed to be used.
Adding a flag to mark if a snapshot was ever RW shouldn't be too hard.

quoted

An incremental is supposed to work on snapshots of the same subvolume
(hence, have the same parent uuid).
That's what the entire code relies on to work correctly, and that's
what makes sense - to compute and send the differences between two
points in time of a subvolume.
That's why the code base assumes that inodes with the same number and
same generation must refer to the same inode in both the parent and
the send root.

What I think that needs to be answered is:

1) Are there actually people using incremental sends in that way?
(It's the first time I see such use case)

Well, I did (:
But admittedly in a kind of experimental setup testing out the limits
of btrfs.

quoted

2) If so, why? That is completely unreliable, not only it can lead to
failure to apply the streams, but can result in all sorts of
weirdness
(logical inconsistencies, etc) if applying such streams doesn't cause
an error.

In my case I was moving around subvolumes between multiple disks and
deduplicating as much as possible. Trying to preserve already done
deduplication and purging some intermediate subvolumes I ended up using
"unrelated" subvolumes as parents.

Thinking of it, maybe just using them as clone sources would have just
worked? That would then of course produce much larger streams since
*all* meta data had to be transfered.

quoted

3) Making sure such use cases work reliably would require many, many
changes to the send implementation, as it goes against what it
currently expects.
    Snapshot a subvolume, change the subvolume, snapshot it again,
then use both snapshots for the incremental send, that's the expected
scenario.

I actually don't think that it is really that much work since besides
from some edge cases it already *does* work - I tried ;)

quoted

In other words, what I think we should have is a check that forbids
using two roots for an incremental send that are not snapshots of the
same subvolume (have different parent uuids).

I'd like to argue against that:
   1. I don't think allowing this requires that much work

For the cases you tried it surely didn't.

   2. explicitly forbiding it requires work, too (and maybe even changes
      to the on-disk format?)
   3. fixing bugs for this unexpected use case will probably also fix bugs
      for the expected scenario which may only happen in very rare and
      extremly unlikely - though still possible - cases and thus make the
      code overall more resilient:

I'd have to disagree with that.

Having pretty much being the only one, apart from Robbie Ko who did
solve some hard similar problems,
solving this kind of problems since 2013, it's far from trivial work
and there's always one more case to solve.

Going through the change logs and send specific fstests should give an
idea of why I don't think it's that trivial to add support for these
use cases.

My concern is that this will require a lot more work for a use case
that is not standard, it was not designed for,
and this always adds the risk of introducing regressions for the
expected and typical use cases.

So I really don't find it compelling to add support for cases that
send was designed for - unless there are indeed users for them and
there is a good reason why they can't use the standard way (use
snapshots of the same subvolume and never modify the snapshots).

For example the assumption that inodes with the same number must refer
to the same inode doesn't even hold for direct snapshots - almost
always it does but in some rare conditions it doesn't, which is why
there already is an additional check for that.

The assumptions everywhere take into account inode number and
generation, not just the number.
Any place that uses only the number to check if it's the same inode,
then it's almost certainly a bug (a notable exception would be the
pending directory renames iirc).

This already caused bugs before. And the bug I hit with my unexpected
use of btrfs-send and originally set out to find you had just fixed a
few weeks before [1], i.e. if you hadn't fixed it I would - because of
the unexpected use.
The bug my patch fixes I only discovered by reading the code and having
my scenario in mind.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?id=0b3f407e6728d990ae1630a02c7b952c21c288d3

That's why I think the work fixing arising bugs for that case is better
invested than in trying to just block it completely.
But given its very rare occurence I would agree to not put any priority
on it.

Well, I'm sure there were plenty of bug fixes for the intended use
cases that also fixed failures with the non-intended use cases.

Thanks.

quoted
Thanks.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help