Thread (26 messages) 26 messages, 9 authors, 2023-01-16

Re: [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata

From: Zygo Blaxell <hidden>
Date: 2021-01-20 16:23:20

On Wed, Jan 20, 2021 at 11:02:41AM -0500, Josef Bacik wrote:
On 1/17/21 1:54 PM, Goffredo Baroncelli wrote:
quoted
Hi all,

This is an RFC; I wrote this patch because I find the idea interesting
even though it adds more complication to the chunk allocator.

The basic idea is to store the metadata chunk in the fasters disks.
The fasters disk are marked by the "preferred_metadata" flag.

BTRFS when allocate a new metadata/system chunk, selects the
"preferred_metadata" disks, otherwise it selectes the non
"preferred_metadata" disks. The intial patch allowed to use the other
kind of disk in case a set is full.

This patches set is based on v5.11-rc2.

For now, the only user of this patch that I am aware is Zygo.
However he asked to further constraint the allocation: i.e. avoid to
allocated metadata on a not "preferred_metadata"
disk. So I extended the patch adding 4 modes to operate.

This is enabled passing the option "preferred_metadata=<mode>" at
mount time.
I'll echo Zygo's hatred for mount options.  The more complicated policy
decisions belong in properties and sysfs knobs, not mount options.
Well, I hated it for two distinct reasons.  Only one of those was the
mount option.

The feature is really a per-_device_ policy:  whether we put data or
metadata or both (*) on the device, and in what order we fill devices
with each.

There's nothing filesystem-wide about the feature, other than we might
want to ensure there are enough devices to do allocations with our
raid profiles (e.g. if there are 2 data-only disks and raid1 data,
don't allow one of those disks to become metadata-only).

I had considered a couple of other ideas but dropped them:

	- keep a filesystem-wide "off" feature.  If that makes sense at
	all, it would be as a mount option, e.g. if you had painted your
	filesystem into a corner with data-only devices and didn't have
	enough metadata space to do a device property set to fix it.
	I dropped the idea because we don't have an existing feature
	for any other scenario where that happens (e.g. adding one disk
	to a full raid1 array) and it should be possible to recover
	immediately after the next mount using reserved metadata space.

	- for each device, specify a priority for metadata and data
	separately, with the lowest priority meaning "never use this
	device for that kind of chunk".  That is a more general form
	of the 4-device-type form (metadata only, metadata preferred,
	data preferred, data only), but all of the additional device
	orders that arbitrary sorting order permits will tend to make
	data and metadata fight with each other, to the point that the
	ordering isn't useful.  On the other hand, maybe this idea does
	have a future as a kind of advanced "move the data where I tell
	you to" balancing tool for complicated RAID array reshapes.
And then for the properties themselves, presumably we'll want to add other
FS wide properties in the future.  I'm not against adding new actual keys
and items to the tree itself, but is there a way we could use our existing
property infrastructure that we use for compression, and simply store the
xattrs in the tree root?  It looks like we're just toggling a policy
decision, and we don't actually need the other properties in the item you've
created, so why not just a btrfs.preferred_metadata property with the value
stored in it, dropped into the tree_root so it can be read on mount?
Thanks,

Josef
(*) I suppose logically there could be a "neither" option, but I'm not
sure what it would be for.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help