Thread: 6 messages, 3 authors, 2010-10-18

Re: Problem regarding RAID10 on kernel 2.6.31

From: Neil Brown <hidden>
Date: 2010-08-06 10:14:35
Subsystem: software raid (multiple disks) support, the rest · Maintainers: Song Liu, Yu Kuai, Linus Torvalds

On Fri, 06 Aug 2010 15:11:58 +0530
ravichandra [off-list ref] wrote:
> Hi everyone,
>     I used two 1 TB disks, each with 3 partitions (sda[1-3] and
> sdb[1-3]). Using sda[1-2] and sdb[1-2] I created a RAID10 array, say
> md2. Then I was reading from and writing to the array while
> simultaneously removing a disk and adding it back to the same array.
> In the process I hit a hang that halted the recovery process, and the
> array was not operational afterwards. This was all done on kernel 2.6.31.
>
>     I am working with RAID10 for the first time. Can someone help me
> with this so that I can proceed further?
>
> Thanks in advance.

Known problem.  I'll be submitting the fix upstream shortly.  I include it
below.
Thanks for the report
NeilBrown


diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 42e64e4..d1d6891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		 */
 		bp = bio_split(bio,
 			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+
+		/* Each of these 'make_request' calls will call 'wait_barrier'.
+		 * If the first succeeds but the second blocks due to the resync
+		 * thread raising the barrier, we will deadlock because the
+		 * IO to the underlying device will be queued in generic_make_request
+		 * and will never complete, so will never reduce nr_pending.
+		 * So increment nr_waiting here so no new raise_barriers will
+		 * succeed, and so the second wait_barrier cannot block.
+		 */
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting++;
+		spin_unlock_irq(&conf->resync_lock);
+
 		if (make_request(mddev, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (make_request(mddev, &bp->bio2))
 			generic_make_request(&bp->bio2);
 
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting--;
+		wake_up(&conf->wait_barrier);
+		spin_unlock_irq(&conf->resync_lock);
+
 		bio_pair_release(bp);
 		return 0;
 	bad_map: