Re: RAID1 repair GPF crash w/3.10-rc7

From: Joe Lawrence <hidden>
Date: 2013-07-03 21:49:51

On Mon, 1 Jul 2013, Joe Lawrence wrote:

> Hi Kent & Neil,
>
> I've hit a crash in MD during RAID1 repair while running 3.10-rc7:
>
> [ ... snip ... ]

Hi Neil,

Looking through the MD source, I'm trying to understand part of the
RAID1 repair path.  I came up with a few questions:

1 - During user initiated RAID1 repair, is the loop at the bottom of
sync_request(), under the bio_full label, responsible for submitting all
of the initial read bios?

2 - Does process_checks() later find the first uptodate read bio and
copy its data into the other r1_bio->bios[] for write repair to the
other disks?

If both are true, then perhaps the following applies to this crash...

Comments in commit f79ea416 "block: Refactor blk_update_request()" msg
include:

    Note that req_bio_endio() now always calls bio_advance() - which
    means it always loops over the biovec, not just on partial
    completions.  Don't expect it to affect performance, but worth
    noting.

Now that process_checks() has been further modified for immutable bio
prep (commit d3b45c2 "raid1: use bio_copy_data()"), it calls
bio_copy_data() to fill in the write repair bios... which starts
indexing the bi_io_vec[] from wherever bi_idx happens to be.

If this is indeed the case, I'm having trouble coming up with a good
solution:

  - Immutable bios means drivers don't touch bi_idx.  So MD shouldn't
    "re-wind" the source bi_idx before calling bio_copy_data().

  - bio_copy_data() could copy the entire source bi_io_vec[], as MD had
    done in the past, but is that safe?  (ie, can we map bio
    vectors once they have been iterated over?)

Thanks,

-- Joe
