Re: [PATCH 4/9] raid5: log reclaim support
From: NeilBrown <hidden>
Date: 2015-08-12 03:50:08
On Wed, 5 Aug 2015 14:34:21 -0700 Shaohua Li [off-list ref] wrote:
> On Wed, Aug 05, 2015 at 01:43:30PM +1000, NeilBrown wrote:
>> On Wed, 29 Jul 2015 17:38:44 -0700 Shaohua Li [off-list ref] wrote:
>>> This is the reclaim support for raid5 log. A stripe write will have
>>> the following steps:
>>>
>>> 1. reconstruct the stripe, read data/calculate parity. ops_run_io
>>>    prepares to write data/parity to raid disks
>>> 2. hijack ops_run_io. stripe data/parity is appended to log disk
>>> 3. flush log disk cache
>>> 4. ops_run_io runs again and does normal operation. stripe
>>>    data/parity is written to raid array disks. raid core can return
>>>    io to upper layer.
>>> 5. flush cache of all raid array disks
>>> 6. update super block
>>> 7. log disk space used by the stripe can be reused
>>>
>>> In practice, several stripes make up an io_unit and we will batch
>>> several io_units in different steps, but the whole process doesn't
>>> change.
>>>
>>> It's possible to return io just after data/parity hit the log disk,
>>> but then read IO would need to read from the log disk. For
>>> simplicity, IO return happens at step 4, where read IO can directly
>>> read from raid disks.
>>>
>>> Currently reclaim runs every minute or when out of space. Reclaim is
>>> just to free log disk space, it doesn't impact data consistency.
>>
>> Having arbitrary times like "every minute" is a warning sign.
>> "As soon as possible" and "Just in time" can both make sense easily.
>> "every minute" needs more justification.
>>
>> I'll probably say more when I find the code.
>
> The idea is if we reclaim periodically, recovery could scan less log
> space. It's insane for recovery to scan a 1T disk. As I said this is
> just to free disk space. It's not a signal that we will lose data in a
> minute interval. I can change the reclaim to run every 1G of
> reclaimable space, for example.
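
The seven steps quoted above can be sketched as a simple state sequence. This is only an illustration: the enum and helper names are hypothetical and do not come from the patch, and the batching of stripes into io_units is ignored.

```c
#include <stdbool.h>

/* Hypothetical state names, one per step in the quoted description. */
enum stripe_log_state {
    STRIPE_PREPARED,       /* 1: data read, parity calculated */
    STRIPE_IN_LOG,         /* 2: data/parity appended to the log disk */
    STRIPE_LOG_FLUSHED,    /* 3: log disk cache flushed */
    STRIPE_IN_ARRAY,       /* 4: data/parity written to the array disks */
    STRIPE_ARRAY_FLUSHED,  /* 5: caches of all array disks flushed */
    STRIPE_SB_UPDATED,     /* 6: superblock updated */
    STRIPE_LOG_SPACE_FREE, /* 7: log space may be reused */
};

/* Advance one step; the final state is terminal. */
static enum stripe_log_state next_state(enum stripe_log_state s)
{
    return s == STRIPE_LOG_SPACE_FREE ? s : (enum stripe_log_state)(s + 1);
}

/* IO is returned to the upper layer at step 4, so reads can go to the array. */
static bool can_return_io(enum stripe_log_state s)
{
    return s >= STRIPE_IN_ARRAY;
}

/* The log copy must be kept until the superblock update (step 6) is done. */
static bool log_space_needed(enum stripe_log_state s)
{
    return s < STRIPE_SB_UPDATED;
}
```

The gap between can_return_io() becoming true and log_space_needed() becoming false is the window that reclaim has to close.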
There seem to be two issues here and I might be confusing them.

Firstly there is the question of when a stripe gets written back to the array. Once the data is safe in the log this doesn't have to happen in any great hurry, but I suspect it should still happen sooner rather than later. Presumably as soon as data/parity of a stripe is safe in the log, that stripe will be scheduled to be written to the array - is that correct?

As these writes-to-the-array complete, the counter in the io_unit will decrease. When it reaches zero the io_unit can be freed and the recovery_offset in the superblock can, potentially, be updated.

Secondly there is the question of how often the superblock is updated. As you say, delaying the updates indefinitely could lead to a recovery having to examine a very large part of the log - maybe more than necessary (though if that might be a problem, the simple solution is to use a smaller log).

I would probably feel most comfortable scheduling a superblock update whenever the amount of log space that it would reclaim exceeds 1/4 of the log size. That should be often enough without imposing a completely arbitrary number.

Make sense?

Thanks,
NeilBrown
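
A minimal sketch of the two rules discussed above: the per-io_unit counter and the "more than 1/4 of the log" trigger for a superblock update. It assumes a circular log measured in sectors; the struct and function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_unit: counts stripes not yet in the array. */
struct io_unit_model {
    unsigned int pending_stripes;
};

/* Call when one stripe's array write completes; true means the unit is done. */
static bool stripe_written_to_array(struct io_unit_model *io)
{
    return --io->pending_stripes == 0;
}

/* Hypothetical model of a circular log; not the structures from the patch. */
struct log_model {
    uint64_t size;          /* total log size in sectors */
    uint64_t recorded_tail; /* recovery start currently stored in the superblock */
    uint64_t safe_tail;     /* everything before this has reached the array */
};

/* Sectors a superblock update would free, allowing for wrap-around. */
static uint64_t reclaimable_sectors(const struct log_model *log)
{
    if (log->safe_tail >= log->recorded_tail)
        return log->safe_tail - log->recorded_tail;
    return log->size - log->recorded_tail + log->safe_tail;
}

/* True once the reclaimable space exceeds a quarter of the log. */
static bool should_update_superblock(const struct log_model *log)
{
    return reclaimable_sectors(log) > log->size / 4;
}
```

With this shape the trigger scales with the log size, so a smaller log simply leads to more frequent superblock updates and a shorter recovery scan.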