Re: Bitmap did not survive reboot
From: Doug Ledford <hidden>
Date: 2009-11-12 01:34:04
On 11/11/2009 07:23 PM, Leslie Rhorer wrote:
I guess I skimmed over the manual rather quickly back then, and I was dealing with serious RAID issues at the time, so I must have misread the section of the man page which says, "Note that if you add a bitmap stored in a file which is in a filesystem that is on the raid array being affected, the system will deadlock. The bitmap must be on a separate filesystem" as something more like, "Note that if you add a bitmap ... the bitmap must be on a separate filesystem."
Understandable, and now corrected, so no biggie ;-)
the only limitation is that the bitmap must be small enough to fit in the reserved space around the superblock. It's in the case that you want to create some super huge, absolutely insanely fine grained bitmap that it must be done at raid device creation time and that's only so it can reserve sufficient space for the bitmap.
How can I know how much space is available? I tried adding the internal bitmap without specifying anything, and it seems to have worked fine. When I created the bitmap in an external file (without specifying the size), it was around 100K, which seems rather small.
100k is a huge bitmap. For my 2.5TB array, and a bitmap chunk size of 32768KB, I get the entire in-memory bitmap in 24k. (As I recall, the in-memory bitmap is larger than the on-disk bitmap: the on-disk bitmap only stores a dirty/clean bit per chunk, whereas the in-memory bitmap also includes a counter per chunk so it knows when all outstanding writes have completed and the chunk needs to transition to clean. But I could be mis-remembering that.)
Both of these systems
use un-partitioned disks with XFS mounted directly on the RAID array. One
is a 7 drive RAID5 array on 1.5 TB disks and the other is a 10 drive RAID6
array on 1.0TB disks. Both are using a version 1.2 superblock. The only
thing which jumps out at me is --examine, but it doesn't seem to tell me
much:
RAID-Server:/usr/share/pyTivo# mdadm --examine /dev/sda
/dev/sda:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 5ff10d73:a096195f:7a646bba:a68986ca
Name : RAID-Server:0 (local to host RAID-Server)
Creation Time : Sat Apr 25 01:17:12 2009
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 1953524896 (931.51 GiB 1000.20 GB)
Array Size : 15628197888 (7452.11 GiB 8001.64 GB)
Used Dev Size : 1953524736 (931.51 GiB 1000.20 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
The above two items are what you need for both version 1.1 and 1.2
superblocks in order to figure things out. The data, aka the filesystem
itself, starts at the Data Offset which is 272 sectors. The superblock
itself is 8 sectors in from the front of the disk because you have
version 1.2 superblocks. So, 272 - 8 - size of the superblock, which is
only a sector or two, is how much internal space you have, counted in
512-byte sectors. So, in your case, you have about 132k of space for the
bitmap. Version 1.0
superblocks are a little different in that you need to know the actual
size of the device and you need the super offset and possibly the used
dev size. There will be free space between the end of the data and the
superblock (super offset - used dev size) and free space after the
superblock (actual dev size - super offset - size of superblock, where
the actual dev size is what fdisk gives you: the size of the device
itself on whole disk devices, or the size of the partition you are
using). I don't know which is used by the bitmap, but I seem to recall
the bitmap wants to be between the superblock and the end of the data,
so I think the used dev size and super offset are the important numbers
there.
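As a rough check of the arithmetic for the 1.1/1.2 case, here is a small Python sketch. The function name, the 512-byte sector size and the 2-sector superblock allowance are assumptions made for illustration; none of them is something mdadm reports directly.

```python
SECTOR_BYTES = 512  # assumed sector size; --examine reports offsets in sectors


def internal_bitmap_space(data_offset, super_offset, superblock_sectors=2):
    """Bytes left between the superblock and the start of the data.

    Applies to version 1.1/1.2 superblocks only. superblock_sectors is an
    assumed allowance ("a sector or two"), not a value read from the disk.
    """
    free_sectors = data_offset - super_offset - superblock_sectors
    if free_sectors < 0:
        raise ValueError("data offset lies before the end of the superblock")
    return free_sectors * SECTOR_BYTES


if __name__ == "__main__":
    # Values from the --examine output above: Data Offset 272, Super Offset 8.
    space = internal_bitmap_space(data_offset=272, super_offset=8)
    print(f"{space} bytes, about {space // 1024} KiB")
```

Depending on how many sectors you allow for the superblock, this lands within a KiB or so of the 132k figure.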
You mentioned that you used the defaults when creating the bitmap.
That's likely to hurt your performance. The default bitmap chunk is too
small. I would redo it with a larger bitmap chunk. If you look in
/proc/mdstat, it should tell you the current bitmap chunk. Given that
you stream large sequential files, you could go with an insanely large
bitmap chunk and be fine. Something like 65536 or 131072 (in KB) should be good.
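To get a feel for what those chunk sizes mean for the bitmap itself, here is a rough Python sketch. It assumes the chunk values are in KB (as in the 32768KB example earlier), that the Array Size shown by --examine is in 512-byte sectors, and that the on-disk bitmap costs one bit per chunk as recalled above. The function names are made up for illustration and any bitmap header is ignored.

```python
SECTOR_BYTES = 512  # assumed sector size for the Array Size field


def bitmap_chunks(array_sectors, chunk_kb):
    """Number of bitmap chunks needed to cover the array (rounded up)."""
    array_bytes = array_sectors * SECTOR_BYTES
    chunk_bytes = chunk_kb * 1024
    return -(-array_bytes // chunk_bytes)  # ceiling division


def on_disk_bitmap_bytes(array_sectors, chunk_kb):
    """Bytes of bitmap at an assumed one bit per chunk (no header counted)."""
    return -(-bitmap_chunks(array_sectors, chunk_kb) // 8)


if __name__ == "__main__":
    array_sectors = 15628197888  # Array Size from the --examine output
    for chunk_kb in (65536, 131072):
        print(chunk_kb,
              bitmap_chunks(array_sectors, chunk_kb),
              on_disk_bitmap_bytes(array_sectors, chunk_kb))
```

Under those assumptions, either chunk size gives a bitmap of well under 20 KB, which fits comfortably in the roughly 132k worked out earlier.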
--
Doug Ledford [off-list ref]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband