Thread: 21 messages, 11 authors, 2005-01-18

Re: RAID1 Corruption

From: Markus Gehring <hidden>
Date: 2005-01-17 19:42:10

Paul Clements wrote:
Hi,

Markus Gehring wrote:
I have a reproducible problem with corrupted data read from a 
RAID1 array.

Setup:
 HW:
  2 S-ATA-Disks (160GB each) -> /dev/md4 RAID1
  Promise S150 TX4 - Controller
  AMD Sempron 2200+

 SW:
  Fedora Core 3
  Kernel 2.6.10 unpatched
  Samba (for read/write-accesses)
  SW-Raid

Everything works fine with only one drive in the array. Once the second
is synced up, read accesses return corrupted data.

Interesting: if you remove the second disk again, the same files are
read correctly again (no matter whether they were written while only one
disk was in the array or while both were synced)!

This makes it sound like bad data is getting written to the second disk 
during resync. Could you give more details about your test procedure (a 
script or list of steps that reproduces the problem would be great)?
1. Set up the array (mdadm -C /dev/md4 -l 1 -n 2 /dev/sdc1 /dev/sdd1)
2. ... resync running (as I can see with cat /proc/mdstat)
3. mke2fs /dev/md4
4. mount /dev/md4 /home2
5. Copy ~100M of JPGs (~800k each) via Samba to the array (/home2/test1/)
6. See that the JPGs are all okay
7. After the resync has finished: copy the same ~100M of JPGs to the array (/home2/test2)
8. See the JPGs damaged (at least in /home2/test2 ... I didn't check them in 
..test1)
9. Remove one disk again (mdadm /dev/md4 -f /dev/sdd1;
mdadm /dev/md4 -r /dev/sdd1 ... or /dev/sdc1!)
10. See (from the Windows client) the JPGs in /home2/test2 okay again!
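
Steps 2 and 7 depend on knowing whether the resync is still running. As a rough sketch only (not something used in the original test), that check can be scripted instead of read off by eye. The function name resync_active is made up for illustration, and it assumes the usual /proc/mdstat layout in which each array's stanza ends with a blank line:

```shell
#!/bin/sh
# resync_active MDSTAT_FILE ARRAY   (hypothetical helper, not part of mdadm)
# Exit 0 if the stanza for ARRAY in MDSTAT_FILE mentions a resync or
# recovery, 1 otherwise. On a live system pass /proc/mdstat as the file.
resync_active() {
    awk -v dev="$2" '
        $1 == dev && $2 == ":" { in_stanza = 1; next }  # header line of our array
        in_stanza && NF == 0   { in_stanza = 0 }         # blank line ends the stanza
        in_stanza && /resync|recovery/ { found = 1 }     # progress line is present
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

With that in place, something like "while resync_active /proc/mdstat md4; do sleep 30; done" would block until the array reports no resync, so step 7 starts at a well-defined point.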

I don't think samba is the culprit, but just to be sure, is there any 
chance you could reproduce the problem without samba in the equation? 
(From what you say above, I assume all reads and writes are coming from 
a samba client of some sort?)
I did a quick test:
Copied my test JPG dir from /home/test (where I can see the pics okay) 
to /home2/test9 and saw the pics damaged. After I copied them back to 
/home/test9 they stay damaged.
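
A local copy test like this one can be made objective by comparing checksums rather than viewing the pictures. The following is a minimal sketch, assuming GNU coreutils md5sum and find are available; the function name compare_trees is hypothetical and was not part of the original test:

```shell
#!/bin/sh
# compare_trees SRC DST   (hypothetical helper)
# Checksum every regular file under SRC, then verify the files at the
# same relative paths under DST. Mismatched or missing files are
# reported by md5sum; the exit status is 0 only if every file matches.
compare_trees() {
    sums=$(mktemp) || return 2
    # Record MD5 sums with paths relative to SRC.
    (cd "$1" && find . -type f -exec md5sum {} +) > "$sums" \
        || { rm -f "$sums"; return 2; }
    # Re-check the same relative paths under DST; --quiet hides OK lines.
    (cd "$2" && md5sum --quiet -c "$sums")
    rc=$?
    rm -f "$sums"
    return $rc
}
```

For the test above this would be called as "compare_trees /home/test /home2/test9". Since a freshly copied file may be served from the page cache, unmounting and remounting /home2 before the check is probably needed to be sure the data really comes from the disks.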

Remarks:
I also saw here that the pics on the syncing /dev/md4 = /home2 are 
damaged (on read?) while the drive is syncing (this is new compared to 
point 6 above), but this definitely happens less often than after the 
drive has finished syncing (I saw this for the first time now, after 
dealing with the problem for over 2 weeks).
I have all mounts on SW-RAID1 arrays, but I have never seen problems 
with md0 (/boot), md1 (/), md2 (swap), md3 (/var).
I have also seen ext3-fs errors (see also Sven Andras's postings from 
today and 5.1.2005).

Many Thanks,
  Markus