Linux RAID Quickref

From NBSWiki

Reference

Let's start with some Gentoo Docs...

NOTE: if a RAID device fails to be detected (the md device cannot be mounted, or a "/dev/md??" cannot stat superblock message appears), don't panic: use the mdadm --assemble option.
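
As a rough illustration of the note above, the sketch below inspects the member partitions and then re-assembles the array from their existing superblocks. The device names are placeholders, and the run wrapper with its RUN variable is a convention of this sketch, not part of mdadm: by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: re-assemble an existing array instead of re-creating it.
# /dev/md0, /dev/sdb2 and /dev/sdc2 are placeholders; substitute your own.
# Commands are only printed unless RUN=1 is set (then run them as root).

run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

reassemble() {
    md=$1; shift
    for member in "$@"; do
        run mdadm --examine "$member"   # show the md superblock on each member
    done
    run mdadm --assemble "$md" "$@"     # start the array from the existing superblocks
}

reassemble /dev/md0 /dev/sdb2 /dev/sdc2
```

If the members are listed in /etc/mdadm.conf or can be found by scanning, mdadm --assemble --scan is the shorter form.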

RAID0

Useful for combining spare old drives into one huge, fast drive (with no redundancy):

mknod /dev/md1 b 9 1
mdadm --create --verbose /dev/md1 --level=0 --raid-devices=3 /dev/hdb1 /dev/hdd1 /dev/sda1
mkfs -j -O dir_index /dev/md1

RAID1

There is a nice howto here but here are the two basic commands to create and format a RAID 1 partition:

mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
mkfs -j -O ^sparse_super,dir_index /dev/md0

RAID5 (quick copy-paste)

Quick notes on building a RAID5-based install. These assume, and it is highly recommended, that the 4 drives are of the exact same model (for partition alignment). We purposely number the MDs from 1 rather than 0 so that all other instructions still hold (i.e. sda1 == boot, so md1 is also boot):

lshw -C disk -short                  #get a list of the disks

Let's set some vars and make this more dynamic:

Da=/dev/sda #considered the master for [s]fdisk purposes
Db=/dev/sdb
Dc=/dev/sdc
Dd=/dev/sdd
  • Create the usual 3 partitions: 1:1M-256M, 2:512M, 3:rest_of_hdd. NOTE that 2 is swap and will be RAID5 here (so more than 512M in total); the 1M gap is IMPORTANT for grub2's embedding
fdisk ${Da}
  • Unfortunately, the following DOES NOT work, as sfdisk doesn't honor the 1MB gap specified on the first partition:
sfdisk --force -uM -L ${Da} <<-EOF
1,256,fd
,512,fd
,,fd
;
EOF

  • Propagate the disk partitioning scheme and continue on...
for D in ${Db} ${Dc} ${Dd}; do sfdisk -d ${Da} | sfdisk ${D}; done #Copy to 3 other drives
modprobe raid5                       #load software RAID5 module
modprobe raid1                       #load software RAID1 module
for I in 1 2 3; do mknod /dev/md$I b 9 $I; done #Create software RAID device nodes
echo y | mdadm --create --verbose /dev/md1 --force --level=1 --raid-devices=4 ${Da}1 ${Db}1 ${Dc}1 ${Dd}1 #RAID1 for BOOT
sleep 1 # copy-paste is actually too fast ;)
echo y | mdadm --create --verbose /dev/md2 --force --level=5 --raid-devices=4 ${Da}2 ${Db}2 ${Dc}2 ${Dd}2 #RAID5 for SWAP
sleep 1 # copy-paste is actually too fast ;)
echo y | mdadm --create --verbose /dev/md3 --force --level=5 --raid-devices=4 ${Da}3 ${Db}3 ${Dc}3 ${Dd}3 #RAID5 for ROOT
mke2fs     -L BOOT /dev/md1
mkswap             /dev/md2
echo y | mkreiserfs -l ROOT /dev/md3
echo 200000 >/proc/sys/dev/raid/speed_limit_max # raise the sync speed cap to 200000 KB/s (about 200MB/s); you need fast hardware to reach it!
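
Regarding the sfdisk limitation noted in the steps above: newer sfdisk releases (util-linux 2.26 and later, as far as I know) accept a script with explicit start and size values in sectors, which makes the 1M gap unambiguous. The sketch below only prints such a script for the same layout. It assumes 512-byte sectors, and mk_layout is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: emit an sfdisk script for the 3-partition layout described above.
# Assumes 512-byte sectors and a util-linux 2.26+ sfdisk; mk_layout is a
# hypothetical helper. Nothing is written to disk by this snippet.

SECTORS_PER_MIB=2048   # 1 MiB / 512 bytes

mk_layout() {
    boot_end_mib=$1    # end of partition 1 in MiB (256 in the text above)
    swap_mib=$2        # size of partition 2 in MiB (512 in the text above)
    echo "label: dos"
    echo "unit: sectors"
    # Partition 1 starts at 1 MiB, leaving the gap grub2 needs for embedding.
    echo "start=$((1 * SECTORS_PER_MIB)), size=$(( (boot_end_mib - 1) * SECTORS_PER_MIB )), type=fd"
    echo "size=$((swap_mib * SECTORS_PER_MIB)), type=fd"
    echo "type=fd"     # partition 3: rest of the disk
}

mk_layout 256 512
```

To apply it, pipe the output into sfdisk as root, for example mk_layout 256 512 | sfdisk ${Da}, and check the result with sfdisk -d ${Da} before propagating it to the other drives.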

If you make a mistake, you can reset the arrays with:

for I in 1 2 3; do mdadm --stop /dev/md$I; mdadm --remove /dev/md$I ; done
for D in ${Da} ${Db} ${Dc} ${Dd}; do dd bs=512 count=1 if=/dev/zero of=${D}; done
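
Note that the dd above only clears the first sector of each disk (the partition table); the md superblocks live inside the member partitions and survive it. mdadm has a --zero-superblock option for that. A minimal sketch, assuming the same four disks and the three-partition layout from this section; the run wrapper and its RUN variable are conventions of this sketch, so by default the commands are only printed:

```shell
#!/bin/sh
# Sketch: clear md metadata from former member partitions.
# Disk names mirror Da..Dd above; partitions 1-3 match the layout above.
# Commands are only printed unless RUN=1 is set (then run them as root).

run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

zero_members() {
    for disk in "$@"; do
        for part in 1 2 3; do
            run mdadm --zero-superblock "${disk}${part}"
        done
    done
}

zero_members /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

This would go after the mdadm --stop loop and before the dd, while the partition devices still exist.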

And start over, from md1 for example. Now go on to installing Gentoo quickly.

RAID10

And now, we create a RAID10 by mirroring two RAID0 arrays (strictly speaking this layout is RAID 0+1; mdadm also offers a native --level=10):

mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sd[ab]1
mdadm --create --verbose /dev/md3 --level=0 --raid-devices=2 /dev/sd[cd]1
mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 /dev/md2 /dev/md3
mkreiserfs /dev/md4
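
The levels used so far trade capacity for redundancy in different ways. The sketch below works out the usable size of an array built from n equal members; raid_capacity is a hypothetical helper, and the figures ignore md metadata overhead. It also shows why the RAID5 swap array above ends up larger than a single 512M partition.

```shell
#!/bin/sh
# Sketch: usable capacity of an md array built from n equal-sized members.
# raid_capacity is a hypothetical helper; the result is in the unit you pass in.

raid_capacity() {
    level=$1; n=$2; size=$3
    case "$level" in
        0)  echo $((n * size)) ;;          # striping: all members hold data
        1)  echo "$size" ;;                # mirroring: every member holds the same data
        5)  echo $(((n - 1) * size)) ;;    # one member's worth of space goes to parity
        10) echo $((n * size / 2)) ;;      # two copies of everything: half the raw space
        *)  echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_capacity 1 4 255    # md1, /boot: 4 x 255M mirrored
raid_capacity 5 4 512    # md2, swap:  4 x 512M in RAID5
```

The mirrored-RAID0 setup above has the same usable capacity as level 10: half of the raw space.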

Apply to an installed OS

Following suggestions from the Gentoo Forum threads "RAID1 after OS install" and "Possible to convert to RAID without a complete reinstall?", we'll assume device 1 is hda and device 2 is hdb (no comments please: physical constraints meant I couldn't attach the second one as hdc) and that we want an ext3 FS:

Partitioning

The partitions on hda and hdb have to be of identical size. You can safely extract the partition information with fdisk -l /dev/hd[ab]. In our case, we have two disks of different sizes (80GB and 200GB). You can choose to keep extra space at the beginning of the drive to copy over the boot partition. Note that you will need physical access to the machine, since you'll have to boot off a CD to complete the installation. Here is our disk information:

Code: Getting partition information
headless src # fdisk -l /dev/hd[ab]

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          31      248976   83  Linux
/dev/hda2              32         275     1959930   82  Linux swap / Solaris
/dev/hda3             276        9729    75939255   83  Linux

Disk /dev/hdb: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/hdb doesn't contain a valid partition table

Here we use sfdisk -d /dev/hda to make an exact copy of the partition table from hda to hdb. This is a dangerous part, don't screw up!

Code: Creating the partition on the new drive
headless src # sfdisk -d /dev/hda | sfdisk /dev/hdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/hdb: 24321 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/hdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/hdb1   *        63    498014     497952  83  Linux
/dev/hdb2        498015   4417874    3919860  82  Linux swap / Solaris
/dev/hdb3       4417875 156296384  151878510  83  Linux
/dev/hdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

Now that we have an exact copy, we have to change the partition type of hdb1 and hdb3 to Id fd (Linux raid autodetect). Use whichever tool you want; I found it easiest to use fdisk /dev/hdb. You should end up with the following partition info:

Code: Getting partition information (again)
headless src # fdisk -l /dev/hdb

Disk /dev/hdb: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          31      248976   fd  Linux raid autodetect
/dev/hdb2              32         275     1959930   82  Linux swap / Solaris
/dev/hdb3             276        9729    75939255   fd  Linux raid autodetect

Creating the degraded RAID1 partitions

Now we create and format the two degraded RAID1 partitions with the "missing" disk. If mdadm complains that /dev/md0 or /dev/md1 doesn't exist:

  1. Make sure you have the RAID modules loaded in your kernel
  2. Make the nodes manually with:
mknod /dev/md0 b 9 0
mknod /dev/md1 b 9 1

Now we create the RAID partitions:

mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 missing /dev/hdb1
mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 missing /dev/hdb3

...And format the new partitions:

mkfs    -O ^sparse_super,dir_index /dev/md0
mkfs -j -O ^sparse_super,dir_index /dev/md1

Notice the missing keyword in place of /dev/hda1 and /dev/hda3 in the mdadm commands above.

Modifying GRUB and FSTAB

Simply edit the root device from /dev/hda3 (in my case) to /dev/md1 (again, my case). Make the same change in the fstab on the RAID. Note that you will want to apply the GRUB modifications on BOTH disks for the first boot. Now install GRUB on the second drive as is usually done in the installation procedure (you want GRUB installed on both hd0 and hd1, for example). Make both menus exactly the same and make sure the entries for each disk point at the right place. For example, on the RAID drive I have:

Code: Editing grub.conf
livecd ~ #cat /mnt/raid/boot/grub/grub.con
# Which listing to boot as default. 0 is the first, 1 the second etc.
default 0
# How many seconds to wait before the default listing is booted.
timeout 2
# Nice, fat splash-image to spice things up :)
# Comment out if you don't have a graphics card installed
splashimage=(hd1,0)/grub/splash.xpm.gz

title=Gentoo
root (hd1,0)
kernel /vmlinuz root=/dev/md1

title=Gentoo Previous
root (hd1,0)
kernel /vmlinuz.old root=/dev/md1 

title=Gentoo orig
root (hd1,0)
kernel /vmlinuz root=/dev/hda 

Code: /etc/fstab
headless ~ # cat /etc/fstab |grep -ve\# -ve^$
/dev/md0                /boot           ext2            noauto,noatime  1 2
/dev/md1                /               ext3            noatime         0 1
/dev/hda2               none            swap            sw              0 0
/dev/hdb2               none            swap            sw              0 0
/dev/cdroms/cdrom0      /mnt/cdrom      iso9660         noauto,ro       0 0
/usr/portage/distfiles  /tftproot/AthlonXP/usr/portage/distfiles none bind,noatime,rw 0 0
/usr/local/sci          /tftproot/AthlonXP/usr/local/sci none bind,noatime,rw 0 0
142.137.135.205:/DATA   /mnt/Elvis/DATA   nfs           noatime         0 0
pythagore:/export/home  /mnt/Pythagore    nfs           noatime         0 0
proc                    /proc           proc            defaults        0 0
shm                     /dev/shm        tmpfs           nodev,nosuid,noexec     0 0

Copying the data to the degraded disk

We then copy the existing data to the "degraded" mirror. This was done by booting from a LiveCD, which guarantees that the data is not modified while the system is copied over to the degraded RAID1. First we must load the md modules on the LiveCD:

modprobe md
modprobe raid1

And then create the md device entries (we make more than you might need):

for I in 0 1 2 3; do mknod /dev/md$I b 9 $I; done

Unfortunately, it would seem that we need to "recreate" the RAID drives with mdadm so...

mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 missing /dev/hdb1
mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 missing /dev/hdb3

For each mirrored partition, mount both the original and the RAID version, then copy the data over to the RAID partition. For example (the /boot partition):

mkdir /mnt/raid && mount /dev/hda1 /mnt/gentoo/ && mount /dev/md0 /mnt/raid && cd /mnt/gentoo/ && cp -a * /mnt/raid/
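
One caveat with cp -a * is that the shell glob skips top-level dotfiles. A minimal sketch of a variant that copies everything under a mount point, hidden entries included; copy_tree is a hypothetical helper, and the demonstration runs on throwaway directories rather than on /mnt/gentoo and /mnt/raid:

```shell
#!/bin/sh
# Sketch: copy a mounted filesystem tree, including top-level dotfiles.
# copy_tree is a hypothetical helper; its arguments are two mount points.

copy_tree() {
    src=$1; dst=$2
    # "src/." copies the directory contents, hidden entries included,
    # which a shell glob like "*" would skip.
    cp -a "$src"/. "$dst"/
}

# Self-contained demonstration on temporary directories:
demo_src=$(mktemp -d); demo_dst=$(mktemp -d)
echo kernel > "$demo_src/vmlinuz"
echo hidden > "$demo_src/.keep"
copy_tree "$demo_src" "$demo_dst"
ls -A "$demo_dst"
rm -rf "$demo_src" "$demo_dst"
```

With the mounts from the command above, the call would be copy_tree /mnt/gentoo /mnt/raid.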

Merging the disks

Once you are able to boot off the md device, it's time to scrap the original disk (hda) and merge it into the new md device. Since we started by creating an exact copy of the original device, there is not much to do to the hda partition table other than changing the type to fd:

headless ~ #fdisk -l /dev/hda

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          31      248976   fd  Linux raid autodetect
/dev/hda2              32         275     1959930   82  Linux swap / Solaris
/dev/hda3             276        9729    75939255   fd  Linux raid autodetect 

We insert hda1 into the md0 device as follows:

mdadm /dev/md0 --add /dev/hda1

Now the same thing for the root partition:

mdadm /dev/md1 --add /dev/hda3

You can watch the mirror's rebuild progress with cat /proc/mdstat:

headless ~ # cat /proc/mdstat 
Personalities : [raid0] [raid1] 
md2 : active raid0 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      624992256 blocks 64k chunks
      
md1 : active raid1 hda3[2] hdb3[1]
      75939136 blocks [2/1] [_U]
      [>....................]  recovery =  0.2% (216576/75939136) finish=34.9min speed=36096K/sec
      
md0 : active raid1 hda1[0] hdb1[1]
      248896 blocks [2/2] [UU]
      
unused devices: <none> 

As you can see, md0 is now built with hda1[0] and hdb1[1], and md1 is built with hda3[2] and hdb3[1].
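
To spot arrays that are still degraded without reading the whole status by eye, the [total/active] field can be parsed. A small sketch, assuming the mdstat layout shown above; degraded_arrays is a hypothetical helper that reads mdstat-formatted text on stdin:

```shell
#!/bin/sh
# Sketch: list md arrays running with fewer members than configured.
# degraded_arrays is a hypothetical helper; it reads mdstat text on stdin.

degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array name
        /blocks/ {
            for (i = 1; i <= NF; i++) {
                # The "[total/active]" field, e.g. [2/1]
                if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) {
                    split(substr($i, 2, length($i) - 2), c, "/")
                    if (c[2] + 0 < c[1] + 0)
                        print name, c[2] "/" c[1]
                }
            }
        }
    '
}

# On a live system: degraded_arrays < /proc/mdstat
# Demonstration with the status shown above:
degraded_arrays <<'EOF'
md1 : active raid1 hda3[2] hdb3[1]
      75939136 blocks [2/1] [_U]
md0 : active raid1 hda1[0] hdb1[1]
      248896 blocks [2/2] [UU]
EOF
```

Each output line gives the array name followed by active/total members; an empty result means every listed mirror is complete.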

mkfs

Good measure for ext3 filesystems (extra superblocks and indexing of dirs):

 mkfs -j -O ^sparse_super,dir_index /dev/xxx

And for an FS with HUGE files (man mke2fs):

-T fs-type
Specify how the filesystem is going to be used, so that mke2fs can choose optimal filesystem parameters for that use. The supported filesystem types are:

 news        one inode per 4kb block
 largefile   one inode per megabyte
 largefile4  one inode per 4 megabytes
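
To get a feel for what these ratios mean, the sketch below estimates the inode count for a given filesystem size under each type. estimate_inodes is a hypothetical helper, and the result is only approximate, since mke2fs rounds its own figures.

```shell
#!/bin/sh
# Sketch: rough inode count for a filesystem of SIZE_KIB kibibytes,
# using the bytes-per-inode ratios listed above. estimate_inodes is hypothetical.

estimate_inodes() {
    size_kib=$1; fstype=$2
    case "$fstype" in
        news)       ratio_kib=4 ;;       # one inode per 4 KiB
        largefile)  ratio_kib=1024 ;;    # one inode per MiB
        largefile4) ratio_kib=4096 ;;    # one inode per 4 MiB
        *) echo "unknown type: $fstype" >&2; return 1 ;;
    esac
    echo $((size_kib / ratio_kib))
}

# md1 from the mdstat example holds 75939136 blocks of 1 KiB:
for t in news largefile largefile4; do
    printf '%-10s %s\n' "$t" "$(estimate_inodes 75939136 "$t")"
done
```

The type itself is passed at creation time, for example mke2fs -j -T largefile4 /dev/xxx.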

HDD I/O Benchmarking

Optimize I/Os with hdparm

Example:

hdparm -d 1 -m 16 -a 8128 /dev/i2o/hda

Some benchmarking utilities

Used to benchmark HDD performance.

tiobench 
iozone    for client<->server performance.
bonnie++  to test server<->storage performance.

Some crazy ass results

Using tiobench

On the following setup:

Software RAID0 with 4 identical SATA drives reported to have the following performance according to hdparm:

headless ScratchPad # hdparm -Tt /dev/sd[abcd]

/dev/sda:
 Timing cached reads:   2552 MB in  2.00 seconds = 1276.00 MB/sec
 Timing buffered disk reads:  180 MB in  3.02 seconds =  59.51 MB/sec

/dev/sdb:
 Timing cached reads:   2548 MB in  2.00 seconds = 1274.52 MB/sec
 Timing buffered disk reads:  178 MB in  3.00 seconds =  59.33 MB/sec

/dev/sdc:
 Timing cached reads:   2548 MB in  2.00 seconds = 1273.86 MB/sec
 Timing buffered disk reads:  178 MB in  3.01 seconds =  59.17 MB/sec

/dev/sdd:
 Timing cached reads:   2564 MB in  2.00 seconds = 1281.16 MB/sec
 Timing buffered disk reads:  188 MB in  3.02 seconds =  62.30 MB/sec 

These 4 drives are mounted as a RAID0 and are formatted using XFS from kernel 2.6.16-ck9. tiobench returned the following stats:

headless ScratchPad # tiobench.pl --block 4096 --block 8192 --block 65535
No size specified, using 2000 MB
Run #1: /usr/sbin/tiotest -t 8 -f 250 -r 500 -b 65535 -d . -TTT

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.16-ck9                    2000  4096    1   88.92 25.87%     0.042       23.89   0.00000  0.00000   344
2.6.16-ck9                    2000  4096    2  117.38 55.05%     0.054      138.95   0.00000  0.00000   213
2.6.16-ck9                    2000  4096    4  152.73 130.1%     0.074      116.60   0.00000  0.00000   117
2.6.16-ck9                    2000  4096    8  233.11 356.8%     0.075      197.46   0.00000  0.00000    65
2.6.16-ck9                    2000  8192    1   90.93 20.77%     0.084       22.07   0.00000  0.00000   438
2.6.16-ck9                    2000  8192    2  125.95 43.38%     0.098      172.37   0.00000  0.00000   290
2.6.16-ck9                    2000  8192    4  174.31 88.02%     0.107      158.41   0.00000  0.00000   198
2.6.16-ck9                    2000  8192    8  227.55 205.0%     0.161      253.12   0.00000  0.00000   111
2.6.16-ck9                    2000  65535   1  100.87 21.53%     0.617       16.42   0.00000  0.00000   468
2.6.16-ck9                    2000  65535   2  166.68 40.08%     0.549      202.89   0.00000  0.00000   416
2.6.16-ck9                    2000  65535   4  218.20 82.91%     0.676      175.64   0.00000  0.00000   263
2.6.16-ck9                    2000  65535   8  225.41 141.5%     1.054      363.01   0.00000  0.00000   159

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.16-ck9                    2000  4096    1   13.57 6.947%     0.286       17.42   0.00000  0.00000   195
2.6.16-ck9                    2000  4096    2   21.00 5.375%     0.353       29.84   0.00000  0.00000   391
2.6.16-ck9                    2000  4096    4   26.37 54.00%     0.517       32.37   0.00000  0.00000    49
2.6.16-ck9                    2000  4096    8   32.07 65.67%     0.853       50.78   0.00000  0.00000    49
2.6.16-ck9                    2000  8192    1   30.49 2.926%     0.254       16.18   0.00000  0.00000  1042
2.6.16-ck9                    2000  8192    2   39.35 12.59%     0.387       23.92   0.00000  0.00000   313
2.6.16-ck9                    2000  8192    4   58.93 7.543%     0.413       30.21   0.00000  0.00000   781
2.6.16-ck9                    2000  8192    8   60.54 58.11%     0.869       53.98   0.00000  0.00000   104
2.6.16-ck9                    2000  65535   1  120.57 13.50%     0.516       15.79   0.00000  0.00000   893
2.6.16-ck9                    2000  65535   2  144.87 30.71%     0.795       21.73   0.00000  0.00000   472
2.6.16-ck9                    2000  65535   4  225.26 91.00%     0.985       32.01   0.00000  0.00000   248
2.6.16-ck9                    2000  65535   8  154.88 118.9%     2.314       68.53   0.00000  0.00000   130

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.16-ck9                    2000  4096    1   58.71 18.43%     0.035     2207.10   0.00020  0.00000   318
2.6.16-ck9                    2000  4096    2   59.06 41.99%     0.086     8878.79   0.00059  0.00000   141
2.6.16-ck9                    2000  4096    4   59.17 85.05%     0.113     8520.31   0.00098  0.00000    70
2.6.16-ck9                    2000  4096    8   59.28 170.8%     0.212     7899.14   0.00371  0.00000    35
2.6.16-ck9                    2000  8192    1   58.32 18.54%     0.074      118.43   0.00000  0.00000   314
2.6.16-ck9                    2000  8192    2   59.11 38.53%     0.136     4603.73   0.00117  0.00000   153
2.6.16-ck9                    2000  8192    4   59.00 74.27%     0.272     9009.24   0.00234  0.00000    79
2.6.16-ck9                    2000  8192    8   59.36 156.7%     0.460    10244.58   0.00742  0.00039    38
2.6.16-ck9                    2000  65535   1   58.73 14.09%     0.594       84.02   0.00000  0.00000   417
2.6.16-ck9                    2000  65535   2   59.44 27.96%     1.063     1579.43   0.00000  0.00000   213
2.6.16-ck9                    2000  65535   4   59.47 59.26%     1.706     9836.32   0.01562  0.00000   100
2.6.16-ck9                    2000  65535   8   59.42 115.3%     3.132    13360.27   0.04688  0.00313    52

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.16-ck9                    2000  4096    1    7.96 2.547%     0.007        0.07   0.00000  0.00000   312
2.6.16-ck9                    2000  4096    2    7.78 9.456%     0.012        6.50   0.00000  0.00000    82
2.6.16-ck9                    2000  4096    4    8.28 14.83%     0.018       10.16   0.00000  0.00000    56
2.6.16-ck9                    2000  4096    8    7.83 13.03%     0.021       12.44   0.00000  0.00000    60
2.6.16-ck9                    2000  8192    1   12.60 12.09%     0.017        0.11   0.00000  0.00000   104
2.6.16-ck9                    2000  8192    2   13.42 13.74%     0.017        0.11   0.00000  0.00000    98
2.6.16-ck9                    2000  8192    4   13.81 37.12%     0.018        0.23   0.00000  0.00000    37
2.6.16-ck9                    2000  8192    8   13.99 76.56%     0.039       40.05   0.00000  0.00000    18
2.6.16-ck9                    2000  65535   1   49.07 14.13%     0.097        3.25   0.00000  0.00000   347
2.6.16-ck9                    2000  65535   2   51.08 29.22%     0.106        9.02   0.00000  0.00000   175
2.6.16-ck9                    2000  65535   4   51.22 43.84%     0.107        6.05   0.00000  0.00000   117
2.6.16-ck9                    2000  65535   8   49.55 85.81%     0.255      151.17   0.00000  0.00000    58
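
As a sanity check on these numbers, the per-disk hdparm figures can be summed to get a rough upper bound for sequential reads from the RAID0. A small sketch; sum_buffered is a hypothetical helper that reads hdparm -Tt output on stdin, fed here with the four buffered-read lines quoted earlier:

```shell
#!/bin/sh
# Sketch: sum the "buffered disk reads" rates from hdparm -Tt output to get a
# rough upper bound for sequential reads from a RAID0 of those disks.
# sum_buffered is a hypothetical helper reading hdparm output on stdin.

sum_buffered() {
    # The rate is the second-to-last field of each matching line.
    awk '/buffered disk reads/ { total += $(NF - 1) } END { printf "%.2f\n", total }'
}

# On a live system: hdparm -Tt /dev/sd[abcd] | sum_buffered
sum_buffered <<'EOF'
 Timing buffered disk reads:  180 MB in  3.02 seconds =  59.51 MB/sec
 Timing buffered disk reads:  178 MB in  3.00 seconds =  59.33 MB/sec
 Timing buffered disk reads:  178 MB in  3.01 seconds =  59.17 MB/sec
 Timing buffered disk reads:  188 MB in  3.02 seconds =  62.30 MB/sec
EOF
```

The four disks add up to 240.31 MB/sec, and the best sequential read rate in the tiobench table (233.11 MB/sec with 8 threads and 4096-byte blocks) comes close to that figure.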