Re: RAID creation resync behaviors
From: David Brown <hidden>
Date: 2017-05-05 06:46:51
On 04/05/17 23:57, NeilBrown wrote:
> On Thu, May 04 2017, David Brown wrote:
>> I have another couple of questions that might be relevant, but I am really not sure about the correct answers. First, if you have a stripe that you know is unused - it has not been written to since the array was created - could the raid layer safely return all zeros if an attempt was made to read the stripe?
> "know is unused" and "it has not been written to since the array was created" are not necessarily the same thing. If I have some devices which used to have a RAID5 array but for which the metadata got destroyed, I might carefully "create" a RAID5 over the devices and then have access to my data. This has been done more than once - it is not just theoretical.
That is true, of course - anything like this would have to be optional (command line switches in mdadm, for example). There is also the opposite situation - when you /have/ had something written to the array, but now you know it is unused (due to a trim). Knowing the stripe is unused might make a later partial write a little faster, and it would certainly speed up a scrub or other consistency check since unused stripes can be skipped.
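To make the idea concrete, here is a minimal sketch of per-stripe "in use" tracking. It is a toy model only, not how md works: the class `StripeMap`, its methods, and the restriction to full-stripe writes are all assumptions made for illustration.

```python
class StripeMap:
    """Toy model of per-stripe 'in use' tracking (hypothetical, not md code)."""

    def __init__(self, n_stripes, stripe_size):
        self.stripe_size = stripe_size
        self.used = [False] * n_stripes  # one flag per stripe
        self.data = {}                   # stripe index -> bytes; stands in for the disks

    def write(self, idx, payload):
        # Assumption: only full-stripe writes, to keep the sketch short.
        if len(payload) != self.stripe_size:
            raise ValueError("this sketch only handles full-stripe writes")
        self.data[idx] = bytes(payload)
        self.used[idx] = True

    def trim(self, idx):
        # A trim tells us the stripe is unused again.
        self.data.pop(idx, None)
        self.used[idx] = False

    def read(self, idx):
        # Known-unused stripes are answered with zeros without reading the disks.
        if not self.used[idx]:
            return bytes(self.stripe_size)
        return self.data[idx]

    def stripes_to_scrub(self):
        # A scrub only needs to visit stripes that actually hold data.
        return [i for i, in_use in enumerate(self.used) if in_use]
```

In this model a stripe becomes "used" on its first write and "unused" again after a trim, so both reads and scrubs can skip it in either state of being unused.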
> But if you really "know" it is unused, then returning zeros should be fine.
>> Second, when syncing an unused stripe (such as during creation), rather than reading the old data and copying it or generating parities, could we simply write all zeros to all the blocks in the stripes? For many SSDs, this is very efficient.
> If you were happy to destroy whatever was there before (see above recovery example for when you wouldn't), then it might be possible to make this work.
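The reason zero-filling can stand in for a sync is that an all-zero stripe is already consistent: mirrored copies are identical, and the XOR parity of all-zero data blocks is itself all zeros. A small sketch of the RAID5 case follows; the helper name `xor_parity` and the block size are assumptions chosen for illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """RAID5-style parity: byte-wise XOR of equally sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


block_size = 16                                      # arbitrary size for the demo
zero_blocks = [bytes(block_size) for _ in range(3)]  # three data blocks, all zeros
parity = xor_parity(zero_blocks)

# The parity block that matches all-zero data is also all zeros,
# so writing zeros to every block leaves the stripe in sync.
print(parity == bytes(block_size))
```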
As above, this would have to be option-controlled. (I have had occasion to pull disks from one dead server to recover them on another machine - it's nerve-racking enough at the best of times, without fearing that you will zero out your remaining good disks!)
> You would need to be careful not to write zeros over a region that the filesystem has already used.
Yes, but that should not be a difficult problem - the array is created before the filesystem.
> That means you either disable all writes until the initialization completes (waste of time), or you add complexity to track which strips have been written and which haven't, and only initialise strips that have not been written. This complexity would only be used once in the entire life of the RAID. That might not be best use of resources.
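The second option, tracking what has been written and initialising only the rest, might look roughly like the sketch below. Everything here is hypothetical: the function `initialise`, the `written` set and the `zero_stripe` callback are illustrative names, and a real implementation would also need locking so that a write cannot race with the zeroing of the same stripe.

```python
def initialise(n_stripes, written, zero_stripe):
    """Zero every stripe that has not been written yet.

    `written` is a set of stripe indices that already hold live data; it may
    grow while this loop runs. `zero_stripe` is a callback that zeros one
    stripe. Both are hypothetical stand-ins for this sketch.
    """
    zeroed = []
    for idx in range(n_stripes):
        if idx in written:
            continue  # holds live data: must not be overwritten
        zero_stripe(idx)
        zeroed.append(idx)
    return zeroed


# Stripes 1 and 3 were written before initialisation reached them.
print(initialise(5, {1, 3}, lambda idx: None))  # prints [0, 2, 4]
```

Because membership in `written` is checked just before each stripe is zeroed, a write that lands ahead of the initialiser is simply skipped when the loop gets there.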
I am not sure I see how this would be a problem. But it is something that would need to be considered carefully when looking at details of implementing these ideas (if anyone thinks they would be worth implementing).
mvh.,
David