Thread (11 messages), 2 authors, 2021-02-03

Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen

From: Roman Anasal | BDSU <hidden>
Date: 2021-02-03 16:22:02

On Tue, 2021-02-02 at 11:56 +0000, Filipe Manana wrote:
On Sun, Jan 31, 2021 at 3:52 PM Roman Anasal | BDSU
[off-list ref] wrote:
On Mon, Jan 25, 2021 at 20:51 +0000 Filipe Manana wrote:
On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <
roman.anasal@bdsu.de>
wrote:
Second example:
  # case 2: same ino at different path
  btrfs subvolume create subvol1
  btrfs subvolume create subvol2
  mknod subvol1/a c 1 3
  mknod subvol2/b c 1 5
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump
As I've told you before for the v1 patchset from a week or two
ago,
this is not a supported scenario for incremental sends.
Incremental sends are meant to be used on RO snapshots of the
same
subvolume, and those snapshots must never be changed after they
were
created.

Incremental sends were simply not designed for these cases, and
can
never be guaranteed to work with such cases.

The bug is not having incremental sends fail right away, with an
explicit error message, when the send and parent roots aren't RO
snapshots of the same subvolume.
Since this should be fixed then I'd like to propose to add the
following check:

The inodes of the subvolumes' root directories (ino
BTRFS_FIRST_FREE_OBJECTID = 256) must have the same generation.

Since create_subvol() will always commit the transaction, i.e.
increment the generation, no two _independently_ created subvolumes
can
be created within the same generation (are there race conditions
possible here?).
That is currently true, but the ability to skip the transaction commit
when creating a subvolume has been discussed and proposed.
Boris sent a proposal patch for that a few months ago.
Ah, okay then, if this may change in the future then this idea isn't
safe and should be dismissed.

I don't think that should be assumed. Avoiding the transaction
commit,
either by default or optionally, is something that makes sense.
Plus for a case like snapshots, we can actually batch the creation of
several ones in a single transaction.
Taking a snapshot of a subvolume does not modify the generation of the
root dir inode. Also it is not possible to change or delete/re-create
the root directory of a subvolume since this would delete the subvolume
itself.


So having two subvolumes with root directories created with
different
generations means they were created independently and can not share
a
common ancestor. Doing an incremental send with them is unsafe and
thus
must return an error.
With the root directories at the same generation though the
subvolumes
are based on a common ancestor which is a requirement for a safe
incremental send.

Are my assumptions and my understanding here correct? Then this
check
would catch most of the unsafe parents.
If so I could have a shot at a patch for this if you'd like me to?
That is too complex and makes too many assumptions.

To check if two roots are snapshots of the same subvolume (the send
and parent roots), you can simply check if they have non-null uuids
in
the "parent_uuid" field of their root items and that they match.
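A rough user-space sketch of that check, for illustration only. It assumes the `Parent UUID:` line printed by `btrfs subvolume show` and treats `-` (or an empty value) as "no parent". The helper names `parent_uuid_of` and `same_origin` are made up here and are not part of btrfs-progs.

```shell
#!/bin/sh
# Extract the "Parent UUID:" value from `btrfs subvolume show` output on stdin.
# Assumption: the value is the last whitespace-separated field on that line.
parent_uuid_of() {
    awk '/Parent UUID:/ { print $NF; exit }'
}

# Succeed only if both parent UUIDs are set ("-" or empty means none) and equal.
same_origin() {
    a=$1 b=$2
    [ -n "$a" ] && [ "$a" != "-" ] && [ "$a" = "$b" ]
}
```

It could be used along the lines of `same_origin "$(btrfs subvolume show snap1 | parent_uuid_of)" "$(btrfs subvolume show snap2 | parent_uuid_of)"`, refusing the incremental send when it fails.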
I thought of this, too, but see it break in some scenarios I'd expect
it to work, mostly with "chains" of snapshots as they happen on a
receiving side.

Consider this scenario:

   btrfs subvolume create /subvol/
   # modify /subvol
   btrfs subvolume snapshot -r /subvol/ /snapshots/snap1
   # modify /subvol
   btrfs subvolume snapshot -r /subvol/ /snapshots/snap2
   # modify /subvol
   btrfs subvolume snapshot -r /subvol/ /snapshots/snap3

I.e. have a single RW subvolume and take incremental snapshots of it.

   cd /snapshots/
   btrfs send snap1 | btrfs receive /mnt/backups/
   btrfs send -p snap1 snap2 | btrfs receive /mnt/backups/
   btrfs send -p snap2 snap3 | btrfs receive /mnt/backups/

I.e. incrementally send the snapshots to another btrfs volume.

   cd /mnt/backups
   btrfs subvolume delete snap2
   btrfs send snap1 | btrfs receive /mnt/backups2/
   btrfs send -p snap1 snap3 | btrfs receive /mnt/backups2/

I.e. delete the intermediate snapshot snap2 and incrementally send
snap1 and snap3 from the receiving filesystem to yet another btrfs
filesystem.

The last command would fail since snap3 was based on snap2 which was
based on snap1; so neither is snap1 the (direct) parent of snap3 nor do
they share a common (direct) parent nor would it be possible to
reconstruct their relation by walking the chain since snap2 no longer
exists.
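To make the broken chain concrete, here is a small simulation in plain shell (no btrfs calls). It assumes a hypothetical table of `uuid parent_uuid` pairs and a made-up helper `has_ancestor` that walks the `parent_uuid` links; the UUIDs `u1`..`u3` are placeholders for snap1..snap3.

```shell
#!/bin/sh
# has_ancestor TABLE START TARGET
# TABLE holds one "uuid parent_uuid" pair per line ("-" means no parent).
# Walk parent links from START; succeed if TARGET is reached.
has_ancestor() {
    table=$1 cur=$2 target=$3
    while [ -n "$cur" ] && [ "$cur" != "-" ]; do
        [ "$cur" = "$target" ] && return 0
        # Look up the parent of $cur; empty if $cur is not in the table.
        cur=$(printf '%s\n' "$table" | awk -v u="$cur" '$1 == u { print $2; exit }')
    done
    return 1
}

# Receiving volume: snap3 based on snap2, snap2 based on snap1.
full='u1 -
u2 u1
u3 u2'
# Same volume after "btrfs subvolume delete snap2".
pruned='u1 -
u3 u2'

has_ancestor "$full" u3 u1 && echo "full: snap1 is an ancestor of snap3"
has_ancestor "$pruned" u3 u1 || echo "pruned: chain broken at snap2"
```

With the intermediate entry gone, the walk from snap3 stops at a UUID that no longer resolves, so the relation to snap1 cannot be established.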

While on the original filesystem all snapshots have the same parent, on
the receiving volume it is a chain:

original volume:

        subvolume
        ^   ^   ^
       /    |    \
   snap1  snap2  snap3

receiving volume:

   snap1 <- snap2 <- snap3


So for this to work it would probably require another attribute
"original subvol UUID" for the root of the ancestry tree...

While this is more straightforward to do in the kernel, I would
prefer
to have it in btrfs-progs, because:

1) In btrfs-progs we can explicitly print an informative error
message
to the user, while in the kernel you can only return an errno value
and log something to dmesg/syslog, which is much less user friendly;
I was thinking about implementing it in the kernel as an (additional)
check to block unsafe sends regardless of the user space tool (are
there any besides btrfs-progs?); but proper handling and an explaining
error message must be implemented in btrfs-progs, totally.

2) The check would be on by default but could be skipped with some
new
flag - this is just being conservative to avoid breaking any existing
workflows we might not be aware of.
    In particular I'm thinking about people using "btrfs send" with
-c
and omitting -p, in which case btrfs-progs selects one of the -c
roots
to be used as the parent root,
    but the selected root might not be a snapshot of the same
subvolume as the send root.
    Then maybe one day that option to skip the check would be
removed,
after we are more sure no one is using or really needs such
workflows.
The way I read find_good_parent() it will only select a clone source as
parent if it is the parent subvolume of the send subvolume [1] or if
both have the same parent [2]?
Which makes sense since selecting a snapshot of an unrelated subvolume
would be unsafe.

[1] https://github.com/kdave/btrfs-progs/blob/273380d98f4412ae8b0f35ad69debf682e48c6bd/cmds/send.c#L118
[2] https://github.com/kdave/btrfs-progs/blob/273380d98f4412ae8b0f35ad69debf682e48c6bd/cmds/send.c#L131
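
The selection rule as read above can be sketched as follows. This is a simplified illustration, not the btrfs-progs code: the helper `pick_parent` is hypothetical, it takes the send subvolume's parent UUID as its argument and clone sources as `uuid parent_uuid` lines on stdin, and it just takes the first match without any tie-breaking the real find_good_parent() may apply.

```shell
#!/bin/sh
# pick_parent SEND_PARENT_UUID < "uuid parent_uuid" lines of clone sources
# Prints the uuid of the first clone source that either is the send
# subvolume's parent or shares the send subvolume's parent.
# Fails (non-zero exit) if no clone source qualifies.
pick_parent() {
    awk -v p="$1" '
        p == "" || p == "-" { exit 1 }   # send root has no parent at all
        $1 == p || $2 == p  { print $1; found = 1; exit }
        END { exit !found }
    '
}
```

A clone source that is neither the parent nor a sibling of the send subvolume is never printed, which matches the observation that an unrelated snapshot should not be picked.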
This check still does not solve the second edge case though, when
snapshots are modified afterwards and diverge independently from one
another. For this I still see no good solution besides a new on-disk
flag whether a snapshot was *ever* set to ro=false. But with that I'm
not sure how to (not) inherit that flag in a safe way ...
I'm afraid there's nothing, codewise, to do about that case.

Setting some flag on the root to make it unusable for send in case it
was ever RW would break send in at least one way:

During a receive we create the root as RW, apply the send stream and
then change the root to RO.
After such change, it would mean we could not send the received
snapshot anymore. There's no way to make sure that only btrfs-receive
can do that, since anyone can use the ioctl.
Another case where allowing to switch to RW before send would be
desirable: make snapshot RW, delete files you don't need anymore, make
RO again, send to backup disk.
Only deleting files/inodes should even be safe now.

Perhaps all that needs to be done is to document this well in the man
pages and wiki in case it's not already there.
Yes. Since these are all very unlikely edge cases and reliably detecting them without false positives is hard, just explicitly documenting them is probably the best solution.
Thanks.
  