If you are looking for the epic motorcycle journey blog that I've written, please see the Miles By Motorcycle site I put together. 
  • Adventures in replacing a failed SCSI drive in an old Linux/CentOS 5 RAID1 Array using mdadm and grub.
    01/01/2012 9:37PM

    For the uninitiated

Did you stumble upon this article unawares? Part of what I do is manage internet-connected servers. All the web sites and internet services that I manage run on a physical machine I own.

    In many ways this machine is similar to the desktop, or even laptop, machines you are familiar with. The biggest difference is that this machine is designed to be left running for years at a time, which it has. I have been known to leave it running for close to two years between reboots.

    To make my life easy, I run a version of the Linux server operating system called CentOS 5. It's based on a server operating system offering from RedHat called RedHat Enterprise Linux.

    The two things that are most likely to fail in a computer are the power supplies and the hard drives. If the power supply fails, the machine won't power up. Hopefully as it failed it didn't fry the electronics of the computer, which unfortunately happened to a friend of mine. But if it does fail and happens not to take the computer with it, you can simply get another power supply and you're back in business. In my case, my server machine has two power supplies in it which auto-failover because, like I said above, I like to leave things running for years.

    If a hard drive fails you've got bigger problems. If you lose a hard drive on your PC or laptop you lose your photos, documents, music, contacts. It sucks.

    If I lose a drive on my server, I lose my databases including customer records. All my websites stop working which means no one gets to read my Miles By Motorcycle articles, participate in YML.COM forum discussions, or share apps through AppUpdate. Even all the sales and support of our stock market tracking and trading application, Personal Stock Monitor, cease immediately.

    Then there are the things I've set up as favors including the redirector for Claudia, the singer for a band called AngelRow, not to mention the email forwarding and other services I provide for various friends and ex-girlfriends.

    Suffice it to say if I lose a hard drive on my server it would Ruin My Day which would result in much despair and complaint filled status updates as I scramble to try to rebuild everything while listening to screams of agony from customers and long lost ex-girlfriends.

Enter this thing called RAID1 along with a thing called Hot Swappable SCSI racks. SCSI is just a different type of disk drive. Your machine probably uses cheap SATA drives. SCSI drives are designed for server applications and basically allow you a greater amount of control over the drive When Bad Things Happen. In addition, because SCSI drives and controllers are "smart", the main processor of the computer doesn't have to do as much work. Hot Swappable means that, theoretically, you can pull out a bad drive and plug in a new one without shutting down the machine or even rebooting it. Awesome, if it works.

Now, in my experience, SCSI drives last a hell of a lot longer than your typical SATA drive. But even so, after many years of continuous operation, say for instance six, Bad Things can still happen.

Hence the need for RAID1. Basically RAID1 is a way for you to set up two drives as mirrors of each other. As the system runs and stuff is saved to the primary drive, the RAID1 system automatically mirrors those changes to the secondary drive. It also continuously monitors the drives and, if one fails, marks it "offline" and proceeds without interruption using the other drive. And, assuming you set it up correctly, it'll even send you an email to let you know this happened.

And it works marvelously. So marvelously, in fact, that it's run for nearly 7 years.

But things have been busy and stressful and frankly a detail slipped. Somewhere along the line, with changes and edits over the years, the email setting got changed, and when one of the drives failed, in this case the secondary, the RAID1 system never sent me an email. Since nothing stood out in the logs, I didn't notice.
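For reference, mdadm's monitor daemon takes its notification address from /etc/mdadm.conf, and it can send a test alert on demand so you know mail actually gets out. A minimal sketch, not my actual config; the address is a placeholder:

```shell
# Sketch of the relevant /etc/mdadm.conf line -- MAILADDR is where
# the mdmonitor service sends degraded-array alerts. Placeholder address.
echo "MAILADDR admin@example.com" >> /etc/mdadm.conf

# Fire a one-off test alert for every array found by --scan, so you can
# confirm the mail path works *before* a drive dies:
mdadm --monitor --scan --test --oneshot
```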

    Then the primary drive started to fail. And to my shock and horror I then noticed the secondary was already dead.

It had been many years since I looked at any of this stuff, but I make it a point to always take good notes. The top-end SCSI drives I buy have been inhumanly reliable: in 10 years of running internet-connected servers set up with RAID1, this is the first time a drive has failed on me, so it's the first time I've had to go through the process of:

    1. remove failed drive from my wicked cool hot swap SCSI drive rack. (Literally just press the button, pull lever, and drive slides right out)
    2. install new drive (reverse of above and it's supposed to work while everything is live. 0 down time!)
    3. tell the RAID1 system that there's a new drive and that it should bring it back online as a new mirror so that there will once again be happiness in the land.

    I was supposed to be able to just pull the drive. Slap the new one in. Run a few commands and have it Just Work with 0 downtime. 

    Yea, not so much ...

    So Now For the Initiated

    So I had a drive on my old dual Xeon machine running CentOS 5 fail. It's part of a SCSI RAID1 array in a SCSI hotswap bay. 

    It's been literally years since I've played with this thing. It's been rock solid reliable for at least 6 years. This is one of the downsides of having infrastructure that's too reliable. By the time something fails, you've long since forgotten the details. 

    Luckily I take very careful notes about everything I do both in hardware setups and in software. However, since this is the first time I've had to replace a drive in a live RAID array it took a little research.

    In my case I have an array consisting of two partitions:

/dev/md1 which is RAID1 and mirrors /dev/sda1 and /dev/sdb1

/dev/md0 which is RAID1 and mirrors /dev/sda2 and /dev/sdb2

(Yea, when I set them up initially I didn't number them in the order one would expect.)

So to get the status and type of my RAID array:

    cat /proc/mdstat

    which yielded:


    [root@xeon ~]# cat /proc/mdstat
    Personalities : [raid1] 
    md1 : active raid1 sda1[0]
          104320 blocks [2/1] [U_]
    md0 : active raid1 sda2[0]
          71577536 blocks [2/1] [U_]
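That `[2/1] [U_]` notation is the giveaway: two devices expected, one active, and the underscore marks the missing mirror half. A tiny check along these lines (a hypothetical helper I'd cron, not something from my actual setup) catches a degraded array even when email alerts are broken:

```shell
# check_degraded (hypothetical helper): returns non-zero if the given
# mdstat text shows an array missing a member -- the status field then
# contains an underscore, e.g. [U_] instead of [UU].
check_degraded() {
    # inside the status brackets only U's and underscores appear;
    # at least one underscore means a dropped mirror half
    grep -q '\[[U_]*_[U_]*\]' "$1" && return 1 || return 0
}

# Example against a saved copy of output like the above:
printf 'md1 : active raid1 sda1[0]\n      104320 blocks [2/1] [U_]\n' > /tmp/mdstat.sample
check_degraded /tmp/mdstat.sample || echo "array is degraded"
# prints "array is degraded"
```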

    to get details about a particular raid device and to determine which drive
    has failed:

    mdadm -D /dev/md0

    which yielded:


    [root@xeon ~]# mdadm -D /dev/md0
            Version : 00.90.03
      Creation Time : Mon Nov 26 17:52:35 2007
         Raid Level : raid1
         Array Size : 71577536 (68.26 GiB 73.30 GB)
      Used Dev Size : 71577536 (68.26 GiB 73.30 GB)
       Raid Devices : 2
      Total Devices : 1
    Preferred Minor : 0
        Persistence : Superblock is persistent
         Update Time : Sun Jan  1 21:07:34 2012
              State : clean, degraded
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 0
      Spare Devices : 0
                UUID : 8eac3053:dde1e60c:803f9bee:193a1a56
             Events : 0.21365348
         Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       0        0        1      removed

In my case, pulling out the failed drive was no problem. (To figure out which drive in the physical array was the bad one, I did a quick 'du -s /usr/local' and looked to see which drive lit up on the rack. I removed the other drive.)
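If blinking the activity light feels too crude, a drive's serial number is another way to match a device node to physical hardware. A sketch, assuming smartmontools is installed; the device names are just examples:

```shell
# Print each SCSI disk's model and serial so it can be matched against
# the label on the physical drive sled. Device names are examples.
for d in /dev/sda /dev/sdb; do
    echo "== $d =="
    smartctl -i "$d" | grep -i serial
done
```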

    However, upon inserting the new drive I got SCSI BUS RESET errors.

    I read online that you can get the SCSI bus to rescan by doing a:

cd /sys/class/scsi_host/hostX
echo "- - -" > scan

however this did not work; after the rescan the entire array became unreadable. I'm sure I'm missing something in this case.

    (Apparently, there is a process you must go through to prepare the SCSI bus to accept the new drive. Since I didn't want to experiment with a live machine and I don't have another RAID array to play with, I decided just to power down the machine and reboot.)
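For the record, the 2.6-era kernels also have a per-device interface through /proc/scsi/scsi that is supposed to handle exactly this. A sketch of what I believe the sequence is; the host/channel/id/lun numbers are placeholders you'd read off the device listing first:

```shell
# Look up the failed drive's Host/Channel/Id/Lun in the listing:
cat /proc/scsi/scsi

# BEFORE pulling the drive, detach it from the bus (placeholder numbers
# for host 0, channel 0, id 1, lun 0):
echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi

# After seating the replacement, re-attach it at the same address:
echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi
```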

    A shutdown and reboot resolved my issue. After a longer than usual delay the machine rebooted on the one good drive in the RAID1 array.

When inserting a new drive into a RAID1 array, the partition tables
of the two drives need to be identical, so we copy the partition table from our live drive to our new blank drive, since we are running a mirrored RAID1 setup. For other versions of RAID the process would be different.

    ***** IMPORTANT:
    replace /dev/sd<good drive> with the LIVE drive, /dev/sd<blank drive> with the NEW BLANK

    If this is not done correctly the LIVE drive will be TOAST.
    ***** IMPORTANT

    sfdisk -d /dev/sd<good drive> | sfdisk /dev/sd<blank drive>

    Screwing up the command above will destroy your data. Be careful.
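After the copy, it's cheap to double-check that the two tables really do match before adding anything to the array. A sketch using a hypothetical helper that strips the device names first, so /dev/sda1 vs /dev/sdb1 don't show up as spurious differences:

```shell
# compare_tables FILE_A FILE_B -- hypothetical helper: compares two
# `sfdisk -d` dumps after stripping the device names. An empty diff
# (exit 0) means the partition layouts are identical.
compare_tables() {
    sed 's,/dev/sd[a-z],,' "$1" > /tmp/pt_a.$$
    sed 's,/dev/sd[a-z],,' "$2" > /tmp/pt_b.$$
    diff /tmp/pt_a.$$ /tmp/pt_b.$$
}

# Usage on the live system would look like:
#   sfdisk -d /dev/sda > /tmp/parts.a
#   sfdisk -d /dev/sdb > /tmp/parts.b
#   compare_tables /tmp/parts.a /tmp/parts.b && echo "tables match"
```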

    Then add the partitions back into the array (making sure to verify that
    the correct partitions are being added to the right array)


    mdadm --re-add /dev/md0 /dev/sdb2
    mdadm --re-add /dev/md1 /dev/sdb1

    Then do a

    mdadm -D /dev/md0
    mdadm -D /dev/md1

    to see that the arrays are being rebuilt.
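The rebuild can also be watched live; /proc/mdstat shows a progress bar and an ETA while the mirror syncs:

```shell
# Re-display the array status every 5 seconds while the rebuild runs
# (Ctrl-C to stop watching; the rebuild continues regardless).
watch -n 5 cat /proc/mdstat
```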

    In my case:

[root@xeon ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Nov 26 17:52:35 2007
     Raid Level : raid1
     Array Size : 71577536 (68.26 GiB 73.30 GB)
  Used Dev Size : 71577536 (68.26 GiB 73.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Sun Jan  1 21:33:38 2012
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
 Rebuild Status : 30% complete
           UUID : 8eac3053:dde1e60c:803f9bee:193a1a56
         Events : 0.21365936

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       2       8       18        1      spare rebuilding   /dev/sdb2

    Note the "spare rebuilding".

    Now at this point I ran into a serious problem. The live drive I was mirroring from had a few bad sectors that fsck was unable to deal with. These were unrecoverable media errors.

    What I expected to happen was that the RAID1 reconstruction would jump over these bad tracks but that is not the way it works. As soon as it encountered the bad track the RAID reconstruction would fail miserably and restart. Bummer.

I had considered powering the system down and using a tool like ddrescue to copy over the partitions by hand. Fortunately, my buddy Duncan pointed out that I could instead just use the SCSI BIOS utilities present on the Adaptec SCSI controller to "verify media" and have it map out the bad sectors. This is the option I chose since it's less error prone. Unfortunately, it meant another 30 minutes of downtime while the process ran.

    However the SCSI verify utility was able to map out the bad tracks and upon rebooting the RAID1 array reconstructed itself correctly.

    Now the next problem I ran into was when I tried to remove the drive with the bad sectors.

    The first step is to fail the drive using the mdadm utility.

    mdadm --manage /dev/md0 --fail /dev/sda2

    mdadm --manage /dev/md1 --fail /dev/sda1

    followed by a "remove":

    mdadm --manage /dev/md0 --remove /dev/sda2

    mdadm --manage /dev/md1 --remove /dev/sda1

I powered the server down again, pulled the drive with the bad tracks, and slapped in a new blank drive. The machine would not boot. Bummer. I had forgotten to rebuild the boot record on the new drive.

    So I put the drive back in, tried grub-install /dev/sdb to install the grub boot loader on /dev/sdb.

    No joy. The machine just hung. Much Panic.

    So I put the old drive back in, fired it up, ran grub-install /dev/sdb on the drive that had just been synced. Went through the process again.

    Still no joy. Now Much More Panic.

    Then I swapped the synced new drive into the slot in the SCSI tray where the original /dev/sda drive had been and put the blank new drive in the /dev/sdb slot.

Now it booted. Apparently GRUB is set up to look for the boot record on the /dev/sda device, so if that drive fails completely the machine won't boot. I have grub installed on both drives now, so if that situation does arise I can just swap the good drive into that slot and it should work. Not elegant. Clearly there is something about booting from RAID devices that I do not understand.
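For reference, the way I understand the fix: with GRUB legacy you can point the grub shell at the second drive and install a boot record that refers to that drive's own copy of /boot, so each drive can boot standalone. A sketch, assuming /boot lives on the first partition of each drive (as with my md1 array):

```shell
# Install GRUB legacy onto the second drive's MBR so the box can boot
# even with sda gone. Mapping sdb to (hd0) makes the installed boot
# record reference its own drive. (hd0,0) assumes /boot is partition 1.
grub --batch <<'EOF'
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF
```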

But at this point I am now running my server on two replacement drives.