Re: [PATCH v5 4/8] md/r5cache: write part of r5cache
From: Song Liu <hidden>
Date: 2016-10-21 20:42:08
On Oct 18, 2016, at 5:53 PM, NeilBrown [off-list ref] wrote: On Fri, Oct 14 2016, Song Liu wrote:quoted
quoted
On Oct 1st of the work.Or we can force the code to rcw, which does not need extra page. But rcw, does not always work in degraded mode. So, this is a good reason not to do write-back in degraded mode...Prohibiting write-back in degraded mode would not be enough to ensure that you can always use rcw. The array can become degraded after you make the decision to use caching, and before to need to read old data for rmw. I would suggest a small (2 entry?) mempool where each entry in the mempool holds enough pages to complete an rmw. Only use the mempool if an alloc_page() fails.
This is a great idea. I will try it.
quoted
quoted
quoted
+ +void r5c_handle_cached_data_endio(struct r5conf *conf, + struct stripe_head *sh, int disks, struct bio_list *return_bi) +{ + int i; + + for (i = sh->disks; i--; ) { + if (test_bit(R5_InCache, &sh->dev[i].flags) && + sh->dev[i].written) {Is it possible for R5_InCache to be set, but 'written' to be NULL ???Yes, it is possible. A stripe may go through "write data to journal, return IO" multiple times before parity calculation. When it comes here the second time, dev written in the first time will have R5_InCache set, but its written will be NULL.OK, that makes sense. So is it possible for 'written' to be set, but R5_InCache to be clear? i.e. do we really need to test R5_InCache here?
In current implementation, there is only one log_io per sh, so we should be fine just check 'written'. Let me double check.
quoted
quoted
quoted
static void r5l_io_run_stripes(struct r5l_io_unit *io)@@ -483,7 +566,8 @@ static int r5l_log_stripe(struct r5l_log *log, struct stripe_head *sh,io = log->current_io; for (i = 0; i < sh->disks; i++) { - if (!test_bit(R5_Wantwrite, &sh->dev[i].flags)) + if (!test_bit(R5_Wantwrite, &sh->dev[i].flags) && + !test_bit(R5_Wantcache, &sh->dev[i].flags)) continue;If changed R5_Wantcache to R5_Wantjournal, and always set it on blocks that were destined for the journal, then this would just be if (!test_bit(R5_Wantjournal, &sh->dev[i].flags)) which would make lots of sense... Just a thought.We set R5_Wantwrite in multiple places. If we simplify the code here, we will need to make those places aware of journal. I guess that is not ideal either?Maybe... We have so many state flags that I like to be very cautious about adding more, and to make sure they have a very well defined meaning that doesn't overlap with other flags too much. The above code suggests that Wantwrite and Wantcache overlap to some extent. Could we discard Wantcache and just use Wantwrite combined with InCache? Wantwrite means that the block needed to be written to the RAID. If InCache isn't set, it also needs to be written to the cache (if the cache is being used). Then the above code would be if (!test_bit(R5_Wantwrite) || test_bit(R5_InCache)) continue; which means "if we don't want to write this, or if it is already in the cache, then nothing to do here". Maybe.
I am not sure whether we can totally remove Wantcache. Let me try it. Thanks, Song