Linux RAID Quickref
From NBSWiki
Reference
Let's start with some Gentoo Docs...
NOTE: if a RAID drive fails to be detected (the md device cannot be mounted, or a "/dev/md??" cannot stat superblock message appears), don't panic: use the mdadm --assemble option.
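The recovery path mentioned in the note can be sketched as follows. This is a minimal sketch, not a definitive procedure: the `run` dry-run helper and the `assemble_array` function are hypothetical names introduced here, and the device names are only examples. By default nothing is executed, the command is only printed; set RUN=1 to really run it as root.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Re-assemble an existing array from its member partitions.
# $1 is the md device, the remaining arguments are the members.
assemble_array() {
    md=$1
    shift
    run mdadm --assemble "$md" "$@"
}

# Example (device names are assumptions, adjust to your system):
assemble_array /dev/md0 /dev/sdb2 /dev/sdc2
```

If the member list is not known, `mdadm --assemble --scan` lets mdadm look for the arrays itself, using its configuration file and the superblocks it detects.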
RAID0
Just to use spare old drives as a huge fast drive
mknod /dev/md1 b 9 1
mdadm --create --verbose /dev/md1 --level=0 --raid-devices=3 /dev/hdb1 /dev/hdd1 /dev/sda1
mkfs -j -O dir_index /dev/md1
RAID1
There is a nice howto here but here are the two basic commands to create and format a RAID 1 partition:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
mkfs -j -O ^sparse_super,dir_index /dev/md0
RAID5 (quick copy-paste)
Quick notes on building a RAID5-based install. It is assumed, and highly recommended, that the 4 drives are of the exact same model (for partition alignment). We purposefully start the MDs at 1 and not 0 so that all other instructions still hold (i.e. sda1 == boot, and md1 is also boot):
lshw -C disk -short #get a list of the disks
Let's set some vars and make this more dynamic:
Da=/dev/sda #considered the master for [s]fdisk purposes
Db=/dev/sdb
Dc=/dev/sdc
Dd=/dev/sdd
- Create the usual 3 partitions: 1: 1M-256M, 2: 512M, 3: rest of the HDD. NOTE that partition 2 is swap and will be RAID5 here (so the resulting swap is larger than 512M); the 1M gap is IMPORTANT for grub2's embedding
fdisk ${Da}
- Unfortunately, the following DOES NOT work, as sfdisk doesn't honor the 1MB gap specified on the first partition:
sfdisk --force -uM -L ${Da} <<-EOF
1,256,fd
,512,fd
,,fd
;
EOF
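One possible workaround is to describe the layout in sectors rather than megabytes. The sketch below only prints an sfdisk script and writes nothing to disk. It assumes 512-byte sectors and a util-linux sfdisk recent enough to accept the named `start=`, `size=` and `type=` fields; `mib_to_sectors` and `raid_layout` are hypothetical helper names.

```shell
# Convert MiB to 512-byte sectors (assumes 512-byte logical sectors).
mib_to_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Print an sfdisk script for the layout described above:
#   1: starts at 1 MiB, ends at 256 MiB (boot)
#   2: 512 MiB (swap)
#   3: rest of the disk (root)
# All three use type fd (Linux raid autodetect).
raid_layout() {
    start=$(mib_to_sectors 1)
    boot=$(mib_to_sectors 255)
    swap=$(mib_to_sectors 512)
    cat <<EOF
label: dos
start=$start, size=$boot, type=fd
size=$swap, type=fd
type=fd
EOF
}

# Review the script first; nothing is written to disk here.
raid_layout
```

Once the output looks right, it could be fed to sfdisk with something like `raid_layout | sfdisk ${Da}`.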
- Propagate the disk partitioning scheme and continue on...
for D in ${Db} ${Dc} ${Dd}; do sfdisk -d ${Da} | sfdisk ${D}; done #Copy to 3 other drives
modprobe raid5 #load software RAID5 module
modprobe raid1 #load software RAID1 module
for I in 1 2 3; do mknod /dev/md$I b 9 $I; done #Create software RAID device nodes
echo y | mdadm --create --verbose /dev/md1 --force --level=1 --raid-devices=4 ${Da}1 ${Db}1 ${Dc}1 ${Dd}1 #RAID1 for BOOT
sleep 1 # copy-paste is actually too fast ;)
echo y | mdadm --create --verbose /dev/md2 --force --level=5 --raid-devices=4 ${Da}2 ${Db}2 ${Dc}2 ${Dd}2 #RAID5 for SWAP
sleep 1 # copy-paste is actually too fast ;)
echo y | mdadm --create --verbose /dev/md3 --force --level=5 --raid-devices=4 ${Da}3 ${Db}3 ${Dc}3 ${Dd}3 #RAID5 for ROOT
mke2fs -L BOOT /dev/md1
mkswap /dev/md2
echo y | mkreiserfs -l ROOT /dev/md3
echo 200000 >/proc/sys/dev/raid/speed_limit_max # speed up the sync process to 200MB/s...you need good hardware!
If you make a mistake, you can reset the arrays with:
for I in 1 2 3; do mdadm --stop /dev/md$I; mdadm --remove /dev/md$I ; done
for D in ${Da} ${Db} ${Dc} ${Dd}; do dd bs=512 count=1 if=/dev/zero of=${D}; done
And start over for md1...for example. Now go onto installing Gentoo quickly.
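The reset above wipes the partition tables, but the md metadata stored inside each member partition may still be found if the same layout is recreated. A possible complement, to be run after stopping the arrays and before the dd step, is `mdadm --zero-superblock`. This is a sketch only: `run` and `zero_superblocks` are hypothetical helpers, the disk names are examples, and the commands are printed rather than executed unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Wipe the md superblock from partitions 1-3 of every disk given as argument.
zero_superblocks() {
    for disk in "$@"; do
        for part in 1 2 3; do
            run mdadm --zero-superblock "${disk}${part}"
        done
    done
}

# Example with the four drives used above (names are assumptions):
zero_superblocks /dev/sda /dev/sdb /dev/sdc /dev/sdd
```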
RAID10
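And now, we create a nested array that mirrors two RAID0 stripes. Strictly speaking this is RAID 0+1; RAID10 proper stripes across mirrors instead: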
mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sd[ab]1
mdadm --create --verbose /dev/md3 --level=0 --raid-devices=2 /dev/sd[cd]1
mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 /dev/md2 /dev/md3
mkreiserfs /dev/md4
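mdadm also offers a native RAID10 level that builds the whole array in one step instead of nesting md devices. The sketch below shows what that could look like for the same four partitions. The `run` and `create_raid10` names are hypothetical, and the command is only printed unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Create a native RAID10 array; $1 is the md device, the rest are the members.
create_raid10() {
    md=$1
    shift
    run mdadm --create --verbose "$md" --level=10 --raid-devices=$# "$@"
}

# Example with four members (device names are assumptions):
create_raid10 /dev/md4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```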
Apply to an installed OS
As per some suggestions on the Gentoo Forum (RAID1 after OS install and Possible to convert to RAID without a complete reinstall?), we'll assume device 1 is hda and device 2 is hdb (no comments please: physical constraints meant I couldn't put the second one as hdc) and that we want an ext3 FS:
Partitioning
The partitions on hda and hdb have to be of identical size. You can safely extract the partition information with fdisk -l /dev/hd[ab]. In our case, we have two disks of different sizes (80GB and 200GB). You can choose to keep extra space at the beginning of the drive to copy over the boot partition. Note that you will need physical access to the machine, since you'll need to boot off a CD to complete the installation. Here is our disk information:
Code: Getting partition information
headless src # fdisk -l /dev/hd[ab]
Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          31      248976   83  Linux
/dev/hda2              32         275     1959930   82  Linux swap / Solaris
/dev/hda3             276        9729    75939255   83  Linux
Disk /dev/hdb: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/hdb doesn't contain a valid partition table
Here we use sfdisk -d /dev/hda to make an exact copy of the partition table from hda to hdb. This is a dangerous part, don't screw up!
Code: Creating the partition on the new drive
headless src # sfdisk -d /dev/hda | sfdisk /dev/hdb
Checking that no-one is using this disk right now ...
OK
Disk /dev/hdb: 24321 cylinders, 255 heads, 63 sectors/track
sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/hdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/hdb1   *        63    498014     497952  83  Linux
/dev/hdb2        498015   4417874    3919860  82  Linux swap / Solaris
/dev/hdb3       4417875 156296384  151878510  83  Linux
/dev/hdb4             0         -          0   0  Empty
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
Now that we have an exact copy, we have to change the partition type of hdb1 and hdb3 to Id fd (Linux raid autodetect). Use whichever tool you want; I found it easiest to use fdisk /dev/hdb. You should end up with the following partition info:
Code: Getting partition information (again)
headless src # fdisk -l /dev/hdb
Disk /dev/hdb: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          31      248976   fd  Linux raid autodetect
/dev/hdb2              32         275     1959930   82  Linux swap / Solaris
/dev/hdb3             276        9729    75939255   fd  Linux raid autodetect
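A quick way to confirm that the type change took effect is to filter the fdisk listing for partitions whose Id is fd. The sketch below assumes the column layout shown on this page (newer fdisk versions print different columns), and `raid_partitions` is a hypothetical helper name.

```shell
# List partitions whose Id column is "fd" in `fdisk -l` output read from stdin.
# The optional "*" boot flag shifts the columns by one, so it is handled explicitly.
raid_partitions() {
    awk '$1 ~ /^\/dev\// {
        id = ($2 == "*") ? $6 : $5
        if (id == "fd") print $1
    }'
}

# Sample rows taken from the listing on this page.
raid_partitions <<'EOF'
   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          31      248976   fd  Linux raid autodetect
/dev/hdb2              32         275     1959930   82  Linux swap / Solaris
/dev/hdb3             276        9729    75939255   fd  Linux raid autodetect
EOF
```

On a live system the same filter could be used as `fdisk -l /dev/hdb | raid_partitions`; for the sample above it lists /dev/hdb1 and /dev/hdb3.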
Creating the degraded RAID1 partitions
Now we create and format the two degraded RAID1 partitions with the "missing" disk. If mdadm complains that /dev/md0 or /dev/md1 doesn't exist:
- Make sure you have the RAID modules loaded in your kernel
- Make the nodes manually with:
mknod /dev/md0 b 9 0
mknod /dev/md1 b 9 1
Now we create the RAID partitions:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 missing /dev/hdb1
mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 missing /dev/hdb3
...And format the new partitions:
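mkfs -O ^sparse_super,dir_index /dev/md0
mkfs -j -O ^sparse_super,dir_index /dev/md1
Notice the missing keyword in place of /dev/hda1 (first mdadm command) and /dev/hda3 (second mdadm command): the arrays are created degraded, and the hda partitions are added later.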
Modifying GRUB and FSTAB
Simply change the root device from /dev/hda3 (in my case) to /dev/md1 (again, my case) in the GRUB configuration. Make the same changes in the fstab on the RAID. Note that you will want to make the GRUB modifications on BOTH disks for the first boot. Now install GRUB on the second drive, as is usually done in the installation procedure (you want GRUB installed on both hd0 and hd1, for example). Make both menus exactly the same and make sure the entries for each disk point at the right place. For example, on the RAID drive I have:
Code: Editing grub.conf
livecd ~ #cat /mnt/raid/boot/grub/grub.con
# Which listing to boot as default. 0 is the first, 1 the second etc.
default 0
# How many seconds to wait before the default listing is booted.
timeout 2
# Nice, fat splash-image to spice things up :)
# Comment out if you don't have a graphics card installed
splashimage=(hd1,0)/grub/splash.xpm.gz
title=Gentoo
root (hd1,0)
kernel /vmlinuz root=/dev/md1
title=Gentoo Previous
root (hd1,0)
kernel /vmlinuz.old root=/dev/md1
title=Gentoo orig
root (hd1,0)
kernel /vmlinuz root=/dev/hda
Code: /etc/fstab
headless ~ # cat /etc/fstab |grep -ve\# -ve^$
/dev/md0 /boot ext2 noauto,noatime 1 2
/dev/md1 / ext3 noatime 0 1
/dev/hda2 none swap sw 0 0
/dev/hdb2 none swap sw 0 0
/dev/cdroms/cdrom0 /mnt/cdrom iso9660 noauto,ro 0 0
/usr/portage/distfiles /tftproot/AthlonXP/usr/portage/distfiles none bind,noatime,rw 0 0
/usr/local/sci /tftproot/AthlonXP/usr/local/sci none bind,noatime,rw 0 0
142.137.135.205:/DATA /mnt/Elvis/DATA nfs noatime 0 0
pythagore:/export/home /mnt/Pythagore nfs noatime 0 0
proc /proc proc defaults 0 0
shm /dev/shm tmpfs nodev,nosuid,noexec 0 0
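Installing GRUB on both disks, as described in this section, can be scripted by generating the commands for the GRUB legacy shell. The sketch below only prints them. It assumes GRUB legacy (not grub2) and that /boot is the first partition of each disk, as in the layout above; `grub_setup_script` is a hypothetical helper name.

```shell
# Print the GRUB legacy shell commands that install the boot loader on one disk.
# $1 is the GRUB disk name, e.g. hd0 or hd1; /boot is assumed to be partition 0.
grub_setup_script() {
    printf 'root (%s,0)\nsetup (%s)\n' "$1" "$1"
}

# Generate the commands for both disks, then quit the GRUB shell.
for d in hd0 hd1; do
    grub_setup_script "$d"
done
echo quit
```

The printed commands could then be typed into the grub shell, or (assuming your GRUB legacy build supports it) piped into `grub --batch`.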
Copying the data to the degraded disk
We then copy the existing data to the "degraded" mirror. This was done by booting from a LiveCD, which guarantees that the data is not modified while the system is copied over to the degraded RAID1. First we must load the md modules on the LiveCD:
modprobe md
modprobe raid1
And then create the md device entries (we make more than you might need):
for I in 0 1 2 3; do mknod /dev/md$I b 9 $I; done
Unfortunately, it would seem that we need to "recreate" the RAID drives with mdadm so...
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 missing /dev/hdb1
mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 missing /dev/hdb3
For each mirrored partition, mount both the original and RAID version, copy the data over to the RAID partition. For example (the /boot partition):
mkdir /mnt/raid && mount /dev/hda1 /mnt/gentoo/ && mount /dev/md0 /mnt/raid && cd /mnt/gentoo/ && cp -a * /mnt/raid/
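Note that `cp -a *` skips hidden files at the top level of the source directory, because the shell glob does not match names starting with a dot. A small variation that copies those as well is sketched below. `copy_tree` is a hypothetical helper, and the demonstration runs on throwaway temporary directories rather than on the real mount points.

```shell
# Copy everything under $1 into $2, preserving ownership, modes and timestamps
# where permitted. "src/." (instead of "src/*") also picks up hidden files.
copy_tree() {
    cp -a "$1"/. "$2"/
}

# Self-contained demonstration on temporary directories.
src=$(mktemp -d)
dst=$(mktemp -d)
echo data > "$src/.hidden"
echo data > "$src/visible"
copy_tree "$src" "$dst"
ls -A "$dst"
rm -rf "$src" "$dst"
```

With the mount points used above, the call would be `copy_tree /mnt/gentoo /mnt/raid`.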
Merging the disks
Once you are able to boot off the md device, it's time to scrap the original disk (hda) and merge it into the new md device. Since we started by creating an exact copy of the original device, there is not much to do to the hda partition table other than changing the type to fd:
headless ~ #fdisk -l /dev/hda
Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          31      248976   fd  Linux raid autodetect
/dev/hda2              32         275     1959930   82  Linux swap / Solaris
/dev/hda3             276        9729    75939255   fd  Linux raid autodetect
We insert hda1 into the md0 device as follows:
mdadm /dev/md0 --add /dev/hda1
Now the same thing for the root partition:
mdadm /dev/md1 --add /dev/hda3
You can see the mirror's recreation process with cat /proc/mdstat
headless ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md2 : active raid0 sdd1[3] sdc1[2] sdb1[1] sda1[0]
624992256 blocks 64k chunks
md1 : active raid1 hda3[2] hdb3[1]
75939136 blocks [2/1] [_U]
[>....................] recovery = 0.2% (216576/75939136) finish=34.9min speed=36096K/sec
md0 : active raid1 hda1[0] hdb1[1]
248896 blocks [2/2] [UU]
unused devices: <none>
As you can see, md0 is now built with hda1[0] and hdb1[1], and md1 is built with hda3[2] and hdb3[1].
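To follow the rebuild without reading the whole listing, the progress figure can be pulled out of /proc/mdstat. The sketch below is one way to do it, assuming the mdstat format shown above; `md_progress` is a hypothetical helper name, and the sample is an excerpt of the listing on this page.

```shell
# Print "<array> <percent>" for every array that is currently resyncing or recovering.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
md_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        /recovery =|resync =/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print array, $i
        }
    ' "${1:-/proc/mdstat}"
}

# Demonstration on a saved excerpt of the output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
md1 : active raid1 hda3[2] hdb3[1]
      75939136 blocks [2/1] [_U]
      [>....................]  recovery =  0.2% (216576/75939136) finish=34.9min speed=36096K/sec
md0 : active raid1 hda1[0] hdb1[1]
      248896 blocks [2/2] [UU]
EOF
md_progress "$sample"   # prints: md1 0.2%
rm -f "$sample"
```

Called without an argument on a machine with md arrays, it reads /proc/mdstat directly.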
mkfs
Good measure for ext3 filesystems (extra superblocks and indexing of dirs):
mkfs -j -O ^sparse_super,dir_index /dev/xxx
And for an FS with HUGE files (man mke2fs):
-T fs-type
    Specify how the filesystem is going to be used, so that mke2fs can choose optimal filesystem parameters for that use. The supported filesystem types are:
    news        one inode per 4kb block
    largefile   one inode per megabyte
    largefile4  one inode per 4 megabytes
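To get a feel for what these ratios mean, the number of inodes can be roughly estimated from the filesystem size. The sketch below does the arithmetic for the 75939136-block md1 array used earlier on this page. `estimate_inodes` is a hypothetical helper, and the real figure chosen by mke2fs will differ somewhat because of rounding per block group.

```shell
# Rough estimate of the inode count for a given bytes-per-inode ratio.
# $1 = filesystem size in 1 KiB blocks, $2 = bytes per inode.
estimate_inodes() {
    echo $(( $1 * 1024 / $2 ))
}

estimate_inodes 75939136 4096      # news:       prints 18984784
estimate_inodes 75939136 1048576   # largefile:  prints 74159
estimate_inodes 75939136 4194304   # largefile4: prints 18539
```

A filesystem for huge files would then be created with something like `mke2fs -j -T largefile /dev/xxx`.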
HDD I/O Benchmarking
Optimize I/Os with hdparm
Example:
hdparm -d 1 -m 16 -a 8128 /dev/i2o/hda
Some benchmarking utilities
Used to benchmark HDD performance:
- tiobench
- iozone for client<->server performance.
- bonnie++ to test server<->storage performance.
Some crazy ass results
Using tiobench
On the following setup:
Software RAID0 with 4 identical SATA drives reported to have the following performance according to hdparm:
headless ScratchPad # hdparm -Tt /dev/sd[abcd]
/dev/sda:
 Timing cached reads: 2552 MB in 2.00 seconds = 1276.00 MB/sec
 Timing buffered disk reads: 180 MB in 3.02 seconds = 59.51 MB/sec
/dev/sdb:
 Timing cached reads: 2548 MB in 2.00 seconds = 1274.52 MB/sec
 Timing buffered disk reads: 178 MB in 3.00 seconds = 59.33 MB/sec
/dev/sdc:
 Timing cached reads: 2548 MB in 2.00 seconds = 1273.86 MB/sec
 Timing buffered disk reads: 178 MB in 3.01 seconds = 59.17 MB/sec
/dev/sdd:
 Timing cached reads: 2564 MB in 2.00 seconds = 1281.16 MB/sec
 Timing buffered disk reads: 188 MB in 3.02 seconds = 62.30 MB/sec
These 4 drives are mounted as a RAID0 and are formatted using XFS from kernel 2.6.16-ck9. tiobench returned the following stats:
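Adding up the per-drive buffered read rates gives a rough upper bound for what a RAID0 stripe over these drives could deliver sequentially. The sketch below does that sum on the hdparm figures quoted above; `sum_buffered_reads` is a hypothetical helper name, and the bound ignores controller and bus limits.

```shell
# Sum the "buffered disk reads" rates (MB/sec) from `hdparm -Tt` output on stdin.
sum_buffered_reads() {
    awk '/buffered disk reads/ { total += $(NF - 1) }
         END { printf "%.2f\n", total }'
}

# The four buffered-read lines reported for sda..sdd.
sum_buffered_reads <<'EOF'
 Timing buffered disk reads: 180 MB in 3.02 seconds = 59.51 MB/sec
 Timing buffered disk reads: 178 MB in 3.00 seconds = 59.33 MB/sec
 Timing buffered disk reads: 178 MB in 3.01 seconds = 59.17 MB/sec
 Timing buffered disk reads: 188 MB in 3.02 seconds = 62.30 MB/sec
EOF
# prints 240.31
```

On a live system the input would come from `hdparm -Tt /dev/sd[abcd] | sum_buffered_reads`.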
headless ScratchPad # tiobench.pl --block 4096 --block 8192 --block 65535
No size specified, using 2000 MB
Run #1: /usr/sbin/tiotest -t 8 -f 250 -r 500 -b 65535 -d . -TTT
Unit information
================
File size = megabytes
Blk Size = bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
Sequential Reads
File Blk Num Avg Maximum Lat% Lat% CPU
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.16-ck9 2000 4096 1 88.92 25.87% 0.042 23.89 0.00000 0.00000 344
2.6.16-ck9 2000 4096 2 117.38 55.05% 0.054 138.95 0.00000 0.00000 213
2.6.16-ck9 2000 4096 4 152.73 130.1% 0.074 116.60 0.00000 0.00000 117
2.6.16-ck9 2000 4096 8 233.11 356.8% 0.075 197.46 0.00000 0.00000 65
2.6.16-ck9 2000 8192 1 90.93 20.77% 0.084 22.07 0.00000 0.00000 438
2.6.16-ck9 2000 8192 2 125.95 43.38% 0.098 172.37 0.00000 0.00000 290
2.6.16-ck9 2000 8192 4 174.31 88.02% 0.107 158.41 0.00000 0.00000 198
2.6.16-ck9 2000 8192 8 227.55 205.0% 0.161 253.12 0.00000 0.00000 111
2.6.16-ck9 2000 65535 1 100.87 21.53% 0.617 16.42 0.00000 0.00000 468
2.6.16-ck9 2000 65535 2 166.68 40.08% 0.549 202.89 0.00000 0.00000 416
2.6.16-ck9 2000 65535 4 218.20 82.91% 0.676 175.64 0.00000 0.00000 263
2.6.16-ck9 2000 65535 8 225.41 141.5% 1.054 363.01 0.00000 0.00000 159
Random Reads
File Blk Num Avg Maximum Lat% Lat% CPU
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.16-ck9 2000 4096 1 13.57 6.947% 0.286 17.42 0.00000 0.00000 195
2.6.16-ck9 2000 4096 2 21.00 5.375% 0.353 29.84 0.00000 0.00000 391
2.6.16-ck9 2000 4096 4 26.37 54.00% 0.517 32.37 0.00000 0.00000 49
2.6.16-ck9 2000 4096 8 32.07 65.67% 0.853 50.78 0.00000 0.00000 49
2.6.16-ck9 2000 8192 1 30.49 2.926% 0.254 16.18 0.00000 0.00000 1042
2.6.16-ck9 2000 8192 2 39.35 12.59% 0.387 23.92 0.00000 0.00000 313
2.6.16-ck9 2000 8192 4 58.93 7.543% 0.413 30.21 0.00000 0.00000 781
2.6.16-ck9 2000 8192 8 60.54 58.11% 0.869 53.98 0.00000 0.00000 104
2.6.16-ck9 2000 65535 1 120.57 13.50% 0.516 15.79 0.00000 0.00000 893
2.6.16-ck9 2000 65535 2 144.87 30.71% 0.795 21.73 0.00000 0.00000 472
2.6.16-ck9 2000 65535 4 225.26 91.00% 0.985 32.01 0.00000 0.00000 248
2.6.16-ck9 2000 65535 8 154.88 118.9% 2.314 68.53 0.00000 0.00000 130
Sequential Writes
File Blk Num Avg Maximum Lat% Lat% CPU
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.16-ck9 2000 4096 1 58.71 18.43% 0.035 2207.10 0.00020 0.00000 318
2.6.16-ck9 2000 4096 2 59.06 41.99% 0.086 8878.79 0.00059 0.00000 141
2.6.16-ck9 2000 4096 4 59.17 85.05% 0.113 8520.31 0.00098 0.00000 70
2.6.16-ck9 2000 4096 8 59.28 170.8% 0.212 7899.14 0.00371 0.00000 35
2.6.16-ck9 2000 8192 1 58.32 18.54% 0.074 118.43 0.00000 0.00000 314
2.6.16-ck9 2000 8192 2 59.11 38.53% 0.136 4603.73 0.00117 0.00000 153
2.6.16-ck9 2000 8192 4 59.00 74.27% 0.272 9009.24 0.00234 0.00000 79
2.6.16-ck9 2000 8192 8 59.36 156.7% 0.460 10244.58 0.00742 0.00039 38
2.6.16-ck9 2000 65535 1 58.73 14.09% 0.594 84.02 0.00000 0.00000 417
2.6.16-ck9 2000 65535 2 59.44 27.96% 1.063 1579.43 0.00000 0.00000 213
2.6.16-ck9 2000 65535 4 59.47 59.26% 1.706 9836.32 0.01562 0.00000 100
2.6.16-ck9 2000 65535 8 59.42 115.3% 3.132 13360.27 0.04688 0.00313 52
Random Writes
File Blk Num Avg Maximum Lat% Lat% CPU
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
2.6.16-ck9 2000 4096 1 7.96 2.547% 0.007 0.07 0.00000 0.00000 312
2.6.16-ck9 2000 4096 2 7.78 9.456% 0.012 6.50 0.00000 0.00000 82
2.6.16-ck9 2000 4096 4 8.28 14.83% 0.018 10.16 0.00000 0.00000 56
2.6.16-ck9 2000 4096 8 7.83 13.03% 0.021 12.44 0.00000 0.00000 60
2.6.16-ck9 2000 8192 1 12.60 12.09% 0.017 0.11 0.00000 0.00000 104
2.6.16-ck9 2000 8192 2 13.42 13.74% 0.017 0.11 0.00000 0.00000 98
2.6.16-ck9 2000 8192 4 13.81 37.12% 0.018 0.23 0.00000 0.00000 37
2.6.16-ck9 2000 8192 8 13.99 76.56% 0.039 40.05 0.00000 0.00000 18
2.6.16-ck9 2000 65535 1 49.07 14.13% 0.097 3.25 0.00000 0.00000 347
2.6.16-ck9 2000 65535 2 51.08 29.22% 0.106 9.02 0.00000 0.00000 175
2.6.16-ck9 2000 65535 4 51.22 43.84% 0.107 6.05 0.00000 0.00000 117
2.6.16-ck9 2000 65535 8 49.55 85.81% 0.255 151.17 0.00000 0.00000 58
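With this many rows, it can help to extract the best rate of a section automatically. The sketch below finds the highest Rate value among tiobench result rows, assuming the 11-column layout shown above; `peak_rate` is a hypothetical helper name, and the sample rows are the 4096-byte sequential read results from this page.

```shell
# Print the highest Rate (5th column) among tiobench result rows read from stdin.
# Result rows are recognised by a numeric file size in column 2 and 11 columns.
peak_rate() {
    awk '$2 ~ /^[0-9]+$/ && NF >= 11 { if ($5 + 0 > max) max = $5 + 0 }
         END { printf "%.2f\n", max }'
}

peak_rate <<'EOF'
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
2.6.16-ck9 2000 4096 1 88.92 25.87% 0.042 23.89 0.00000 0.00000 344
2.6.16-ck9 2000 4096 2 117.38 55.05% 0.054 138.95 0.00000 0.00000 213
2.6.16-ck9 2000 4096 4 152.73 130.1% 0.074 116.60 0.00000 0.00000 117
2.6.16-ck9 2000 4096 8 233.11 356.8% 0.075 197.46 0.00000 0.00000 65
EOF
# prints 233.11
```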
