If you are looking for the epic motorcycle journey blog that I've written, please see the Miles By Motorcycle site I put together. 
  • Converting a Physical Server to a Virtual Server using VirtualBox
    08/26/2014 10:20AM

    We had to move the servers back upstairs. Given that vetsclub is very long in the tooth, we were afraid it wouldn't come back up. So as a fallback, in case it failed to boot, we decided to create a virtual copy of the physical vetsclub.

    The recipe is pretty straightforward. Because the machine is so old, Duncan pulled the drive and inserted it into an old external USB case. We then created an image of the entire drive:

    dd if=/dev/sdc of=vetsclub.dd.img 

    NOTE: Obviously the source input file will likely be different in your situation. 
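
    A slightly more careful version of the imaging step might look like the sketch below. It copies with a larger block size and then compares checksums of the source and the image. It assumes GNU dd and sha256sum are available; the function name image_and_verify is made up for illustration and is not something we actually ran.

```shell
# Sketch only: image a source (device or file) and confirm the copy matches.
# image_and_verify is a hypothetical helper; /dev/sdc is just the device from this post.
image_and_verify() {
    local src="$1" img="$2" a b
    # A block size larger than dd's 512-byte default usually speeds up the copy.
    dd if="$src" of="$img" bs=1M 2>/dev/null || return 1
    # Compare checksums of the source and of the image that was written.
    a=$(sha256sum < "$src" | cut -d' ' -f1)
    b=$(sha256sum < "$img" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Example (reading a real device needs root):
#   image_and_verify /dev/sdc vetsclub.dd.img && echo "image verified"
```

    On a drive that is already throwing read errors, a plain dd will stop at the first error, so a different approach would be needed there.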

    VirtualBox makes it extremely easy to convert a raw image file into a bootable virtual machine. Assuming a current VirtualBox (4.3.10 as of this writing), the command is:

    VBoxManage convertdd vetsclub.dd.img vetsclub.vdi --format VDI

    Now it's just a matter of booting it. Unlike VMware, adding the drive is a separate step.

    Create a New Virtual machine and then on the screen titled Hard drive select "Use an existing virtual hard drive file".

    Click the browse button to the right of the option and select the VDI file you created in the step above. 
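
    The same steps can probably be done without the GUI. The sketch below is an untested outline using VBoxManage subcommands; the VM name, the controller name "IDE" and the choice of an IDE controller are assumptions for illustration, not what we actually clicked through. With DRY_RUN=1 it only prints the commands.

```shell
# Sketch only: create a VM and attach the converted VDI from the command line.
# VM, VDI and the controller name are illustrative assumptions.
VM="vetsclub"
VDI="vetsclub.vdi"

# Print each command; execute it too unless DRY_RUN=1.
run() { echo "+ $*"; [ "${DRY_RUN:-0}" = "1" ] || "$@"; }

create_vm_from_vdi() {
    run VBoxManage createvm --name "$VM" --register
    run VBoxManage storagectl "$VM" --name IDE --add ide
    run VBoxManage storageattach "$VM" --storagectl IDE --port 0 --device 0 --type hdd --medium "$VDI"
    run VBoxManage startvm "$VM"
}

# Example:
#   DRY_RUN=1; create_vm_from_vdi    # review the commands first
```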

  • A Raspberry Pi Media Server
    06/27/2014 5:19PM
    For the longest time I've been showing videos and photos on the big TV by copying everything over to a spare USB drive and then plugging it into the PS3. It works, but it's clunky.

    What I really wanted to do was set up a generic computer that I could plug directly into the TV and use it to show videos (mostly GoPro), photos and play music. 

    I have a Mac Mini with an HDMI output, but the model from the year mine was made doesn't support sound over it.

    Then I realized that the Raspberry Pi hobbyist single-board computers ($35) include an HDMI port along with two USB ports and an SD slot, among other things. In addition, there's a distribution of the Linux operating system available for the Pi that's a pre-configured, ready-to-run media server called "OpenElec".

    The Pi is powered by a simple USB power supply. 

    You can download a project called NOOBS which has a bunch of pre-configured OSes on it. You simply select one from the menu and it installs and boots the selected OS.

    I initially tried RaspBMC but it kept locking up on me. After exploring power supply issues, I decided to try OpenElec, which for the last hour has been very reliable.

    It fires up into a very simple interface for Photos, Videos, Music and Programs. It lets you browse the network for photos if you have a Windows (SMB) server.

    All in all a very nice solution.  

  • Debugging a Website with Telnet
    06/08/2013 2:15AM
    Ian

    Telnet and modern-day websites aren't often considered together, but that's how I recently solved a problem with a server I now work with.  To make matters a little extra complicated, the server runs the Solaris 10 operating system and I do not have 'root' access to it.

     Previously, this server was just acting as a reverse proxy and handling traffic to and from Glassfish Java application servers, and there was no problem with that.  However, when I installed AWstats to provide web stats reporting functionality, I ran into a problem.  It just didn't work.  Attempts to load the AWstats pages in a browser resulted in nothing, as if the browser had not been asked to do anything.  No browser console error messages, nothing.  Firefox, IE, Chrome, even 'wget', all the same.  Finally, using the text-based browser 'elinks' from my home machine I got a "bad headers" message.  Aha!  I tried the "Live HTTP Headers" plugin for Firefox, which showed me... nothing.

     I could see in the logs on the web server that it was generating a 200 ("OK") response and could even see the size of the output in the logs, but got nothing at the browser.  Most peculiar.  Eventually I tried telnet.

    So you use telnet to connect to a server on port 80 and then manually make requests, just the way a browser does, and it will happily show you the otherwise invisible headers that the browser receives, along with all of the "normal" data. After trying this a few times I realized that for any request where there was a "Content-Length" header, the output was fine, but for any request where the content did not have the length specified in the header, but rather used "chunked" transfer encoding, it would fail.

    The Apache web server can only issue a Content-Length header when it knows how long the content is that it is serving up.  If the content is generated by a script (dynamically), then it can't know exactly how much data will be output until after the headers have been sent, so it breaks the output into "chunks", and before each chunk it tells the browser how big the next chunk is, so it will know if it has received the whole thing.  The chunk size is reported using a hexadecimal value and is readily visible in the telnet output. MY server was not printing the chunk size as hexadecimal; it came out as "%lx" and that is what was causing the "bad headers".

    I've got bigger fish to fry, so I haven't researched this in the greatest detail.  Suffice it to say "Apache was not compiled properly on this machine."  I recompiled Apache and the problem was gone.

     It's been a while since I worked on servers where we compiled software like Apache instead of installing pre-compiled packages, but it's interesting in that it takes me back to the old days of internet server management.

    For the curious, here's how to make a query to a web server using telnet. The user input is the telnet command itself and the three request lines (GET, Host and Connection); in the response, the lines to look for are the Transfer-Encoding line and the hexadecimal chunk size (2203).  Note that you must hit Enter one more time after the "Connection:" line before the web server will respond.

     $ telnet yml.com 80
    Trying 108.18.135.196...
    Connected to yml.com.
    Escape character is '^]'.
    GET /homepage.html HTTP/1.1
    Host: yml.com
    Connection: close


    HTTP/1.1 200 OK
    Date: Sat, 08 Jun 2013 06:00:18 GMT
    Server: Apache/2.2.3 (CentOS)
    X-Powered-By: PHP/5.3.5
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
    Set-Cookie: f261cfbdfb9f612=gs758ac6uentfva7233o5p9lc3; path=/; domain=yml.com
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Pragma: no-cache
    Vary: Accept-Encoding,User-Agent
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1

    2203
    html output follows...
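
    To make the chunk-size check concrete, here is a small sketch that takes a chunk-size line as seen in the telnet output, rejects anything that is not hexadecimal (such as the literal "%lx" my broken server sent) and converts a valid one to a byte count. The helper name chunk_size_bytes is made up for illustration, and the sketch ignores the optional chunk extensions that HTTP allows after the size.

```shell
# Sketch only: validate a chunk-size line and convert it from hex to decimal.
# chunk_size_bytes is a hypothetical helper, not part of telnet or Apache.
chunk_size_bytes() {
    local line
    # HTTP lines end in CRLF, so drop the carriage return if it is present.
    line=$(printf '%s' "$1" | tr -d '\r')
    case "$line" in
        ''|*[!0-9A-Fa-f]*)
            echo "bad chunk size: $line" >&2
            return 1 ;;
    esac
    # printf accepts a 0x prefix for hexadecimal input.
    printf '%d\n' "0x$line"
}

# Examples:
#   chunk_size_bytes 2203     # prints 8707
#   chunk_size_bytes '%lx'    # prints an error and returns non-zero
```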

  • Testing an SSL connection from the command line
    03/11/2013 1:06PM
    Ian

    I had a need recently to test an HTTPS connection from one remote location to another.  Not having a graphical browser available with which to test, I had to use a text-based tool. 'lynx' is a text-based browser that was available on the client end, but it does not support SSL. 'elinks' is a lynx-like program which DOES support SSL, but it was not installed on the client side.

    Another option is to use 'openssl'.  Although it's not fancy, it will suffice to test an SSL connection. For example,

    # openssl s_client -connect google.com:443

     which should produce output like:

    CONNECTED(00000003)
    depth=1 C = US, O = Google Inc, CN = Google Internet Authority
    verify error:num=20:unable to get local issuer certificate
    verify return:0
    ---
    Certificate chain
     0 s:/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.google.com
       i:/C=US/O=Google Inc/CN=Google Internet Authority
     1 s:/C=US/O=Google Inc/CN=Google Internet Authority
       i:/C=US/O=Equifax/OU=Equifax Secure Certificate Authority
    ---
    Server certificate
    -----BEGIN CERTIFICATE-----

    Once connected, it's possible to make simple requests by entering a GET command, such as:

    GET /index.html

    Enter Q to quit.
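
    For a non-interactive test, the request can be piped into openssl instead of typed. The sketch below only builds the request text; the helper name http_get_request and the host and path in the usage line are placeholders. It sends an HTTP/1.1 request with a Host header, which many servers expect, rather than the bare GET shown above.

```shell
# Sketch only: build a minimal HTTP/1.1 request with the CRLF line endings HTTP uses.
# http_get_request is a hypothetical helper; host and path are placeholders.
http_get_request() {
    local host="$1" path="${2:-/}"
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Usage (needs network access, so it is left commented out):
#   http_get_request google.com /index.html | openssl s_client -quiet -connect google.com:443
```

    The -quiet option suppresses the certificate dump, so only the HTTP response is shown.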


  • Adventures in replacing a failed SCSI drive in an old Linux/CentOS 5 RAID1 Array using mdadm and grub.
    01/01/2012 9:37PM

    For the uninitiated

    Did you stumble upon this article unawares? Part of what I do is manage internet connected servers. All the web sites and internet services that I manage are running on a physical machine I have.

    In many ways this machine is similar to the desktop, or even laptop, machines you are familiar with. The biggest difference is that this machine is designed to be left running for years at a time, which it has. I have been known to leave it running for close to two years between reboots.

    To make my life easy, I run a version of the Linux server operating system called CentOS 5. It's based on a server operating system offering from RedHat called RedHat Enterprise Linux.

    The two things that are most likely to fail in a computer are the power supplies and the hard drives. If the power supply fails, the machine won't power up. Hopefully as it failed it didn't fry the electronics of the computer, which unfortunately happened to a friend of mine. But if it does fail and happens not to take the computer with it, you can simply get another power supply and you're back in business. In my case, my server machine has two power supplies in it which auto-failover because, like I said above, I like to leave things running for years.

    If a hard drive fails you've got bigger problems. If you lose a hard drive on your PC or laptop you lose your photos, documents, music, contacts. It sucks.

    If I lose a drive on my server, I lose my databases including customer records. All my websites stop working which means no one gets to read my Miles By Motorcycle articles, participate in YML.COM forum discussions, or share apps through AppUpdate. Even all the sales and support of our stock market tracking and trading application, Personal Stock Monitor, cease immediately.

    Then there are the things I've set up as favors including the redirector for Claudia, the singer for a band called AngelRow, not to mention the email forwarding and other services I provide for various friends and ex-girlfriends.

    Suffice it to say if I lose a hard drive on my server it would Ruin My Day which would result in much despair and complaint filled status updates as I scramble to try to rebuild everything while listening to screams of agony from customers and long lost ex-girlfriends.

    Enter this thing called RAID1 along with a thing called Hot Swappable SCSI racks. SCSI is just a different type of disk drive. Your machine probably uses cheap SATA drives. SCSI drives are designed for server applications and basically allow you a greater amount of control over the drive When Bad Things Happen. In addition, because SCSI drives and controllers are "smart" the main processor of the computer doesn't have to do as much work. Hot Swappable means that, theoretically, you can pull out a bad drive and plug in a new drive without shutting down the machine or even rebooting it. Awesome, if it works.

    Now, in my experience, SCSI drives last a hell of a lot longer than your typical SATA drive. But even so, after many years of continuous operation, say, for instance, 6, Bad Things can still happen.

    Hence the need for RAID1. Basically RAID1 is a way for you to set up two drives as mirrors of each other. As the system runs and stuff is saved to the primary drive, the RAID1 system automatically mirrors those changes to the secondary drive. It also continuously monitors the drives and, if one fails, it marks it "offline" and proceeds without interruption to use the other drive. And, assuming you set it up correctly, it'll even send you an email to let you know this happened.

    And it works marvelously. So marvelously in fact that it's run for nearly 7 years.

    But things have been busy and stressful and frankly a detail slipped. Over the years with changes and edits, somewhere along the line the email setting got changed and when one of the drives failed, in this case the secondary, the RAID1 system failed to send me an email. Since I hadn't spotted anything in the logs either, I didn't notice.

    Then the primary drive started to fail. And to my shock and horror I then noticed the secondary was already dead.

    It had been many years since I looked at any of this stuff, but I make it a point to always take good notes. As evidenced by the inhuman reliability I've achieved with the top-end SCSI drives I buy, in 10 years of running internet-connected servers set up with RAID1 this is the first time a drive has failed on me, so this is the first time I had to go through the process of:

    1. remove failed drive from my wicked cool hot swap SCSI drive rack. (Literally just press the button, pull lever, and drive slides right out)
    2. install new drive (reverse of above and it's supposed to work while everything is live. 0 down time!)
    3. tell the RAID1 system that there's a new drive and that it should bring it back online as a new mirror so that there will once again be happiness in the land.

    I was supposed to be able to just pull the drive. Slap the new one in. Run a few commands and have it Just Work with 0 downtime. 

    Yea, not so much ...

    So Now For the Initiated

    So I had a drive on my old dual Xeon machine running CentOS 5 fail. It's part of a SCSI RAID1 array in a SCSI hotswap bay. 

    It's been literally years since I've played with this thing. It's been rock solid reliable for at least 6 years. This is one of the downsides of having infrastructure that's too reliable. By the time something fails, you've long since forgotten the details. 

    Luckily I take very careful notes about everything I do both in hardware setups and in software. However, since this is the first time I've had to replace a drive in a live RAID array it took a little research.

    In my case I have an array consisting of two partitions:

    /dev/md1 which is RAID1 and mirrors /dev/sda1 and /dev/sdb1

    /dev/md0 which is RAID1 and mirrors /dev/sda2 and /dev/sdb2

    (Yeah, when I set them up initially I didn't do them in the order one would expect.)

    So to get the status and type of my RAID array:

    cat /proc/mdstat

    which yielded:

     

    [root@xeon ~]# cat /proc/mdstat
    Personalities : [raid1] 
    md1 : active raid1 sda1[0]
          104320 blocks [2/1] [U_]
          
    md0 : active raid1 sda2[0]
          71577536 blocks [2/1] [U_]
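
    In that output, [2/1] means one of two members is present and the underscore in [U_] marks the missing one. Had I been checking for this automatically, I would have caught the dead secondary much sooner. A minimal sketch of such a check follows; the helper name degraded_arrays is made up, and it assumes the status-line format shown above.

```shell
# Sketch only: list md arrays whose status string shows a missing member.
# degraded_arrays is a hypothetical helper; it reads a file in /proc/mdstat format.
degraded_arrays() {
    awk '
        # Remember the array name from its header line, e.g. "md1 : active raid1 sda1[0]".
        /^md[0-9]+ :/ { name = $1 }
        # A status such as [U_] contains one underscore per missing member.
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# Example (could be run from cron, mailing the output if it is not empty):
#   degraded_arrays
```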



    To get details about a particular RAID device and to determine which drive
    has failed:

    mdadm -D /dev/md0

    which yielded:

     

    [root@xeon ~]# mdadm -D /dev/md0
    /dev/md0:
            Version : 00.90.03
      Creation Time : Mon Nov 26 17:52:35 2007
         Raid Level : raid1
         Array Size : 71577536 (68.26 GiB 73.30 GB)
      Used Dev Size : 71577536 (68.26 GiB 73.30 GB)
       Raid Devices : 2
      Total Devices : 1
    Preferred Minor : 0
        Persistence : Superblock is persistent
        Update Time : Sun Jan  1 21:07:34 2012
              State : clean, degraded
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 0
      Spare Devices : 0
               UUID : 8eac3053:dde1e60c:803f9bee:193a1a56
             Events : 0.21365348
         Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       0        0        1      removed

    In my case, pulling out the failed drive was no problem. (To figure out which drive in the physical array was the bad drive, I did a quick 'du -s /usr/local' and looked to see which drive lit up on the rack. I removed the other drive.)

    However, upon inserting the new drive I got SCSI BUS RESET errors.

    I read online that you can get the SCSI bus to rescan by doing a:

    cd /sys/class/scsi_host/hostX
    echo "- - -" > scan

    However, this did not work since the entire array became unreadable. I'm sure I'm missing something in this case.

    (Apparently, there is a process you must go through to prepare the SCSI bus to accept the new drive. Since I didn't want to experiment with a live machine and I don't have another RAID array to play with, I decided just to power down the machine and reboot.)

    A shutdown and reboot resolved my issue. After a longer than usual delay the machine rebooted on the one good drive in the RAID1 array.

    When inserting a new drive into a RAID1 array, the partition tables
    of the two drives need to be identical. Since we are running a mirrored RAID1 setup, we copy the partition table from our live drive to our new blank drive. For other RAID levels the process would be different.

    ***** IMPORTANT:
    replace /dev/sd<good drive> with the LIVE drive, /dev/sd<blank drive> with the NEW BLANK
    DRIVE.

    If this is not done correctly the LIVE drive will be TOAST.
    ***** IMPORTANT

    sfdisk -d /dev/sd<good drive> | sfdisk /dev/sd<blank drive>

    Screwing up the command above will destroy your data. Be careful.
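
    One way to guard against swapping the two drives is to refuse to write to any drive that still appears as a member in /proc/mdstat, since the live drive always will. The sketch below is an illustration of that idea, not what I ran; the function names are made up, drive names are given without the /dev/ prefix, and the optional third argument exists only so the check can be tried against a saved copy of mdstat.

```shell
# Sketch only: refuse to overwrite a drive that is still an md array member.
# in_use_by_md and safe_copy_partition_table are hypothetical helpers.
in_use_by_md() {
    # $1 = drive name without /dev/ (e.g. sdb), $2 = file in /proc/mdstat format
    grep -Eq "(^| )${1}[0-9]+\[[0-9]+\]" "${2:-/proc/mdstat}"
}

safe_copy_partition_table() {
    local good="$1" blank="$2"
    if in_use_by_md "$blank" "${3:-/proc/mdstat}"; then
        echo "refusing: $blank is still part of an md array" >&2
        return 1
    fi
    # Keep a backup of the good partition table, then write it to the blank drive.
    sfdisk -d "/dev/$good" > "$good.parttable.bak" || return 1
    sfdisk "/dev/$blank" < "$good.parttable.bak"
}

# Example (as root):
#   safe_copy_partition_table sda sdb
```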

    Then add the partitions back into the array (making sure to verify that
    the correct partitions are being added to the right array)

    in my case: (DO NOT JUST COPY THE COMMANDS BELOW! YOU MUST ADJUST THEM FOR YOUR SETUP)

    mdadm --re-add /dev/md0 /dev/sdb2
    mdadm --re-add /dev/md1 /dev/sdb1

    Then do a

    mdadm -D /dev/md0
    mdadm -D /dev/md1

    to see that the arrays are being rebuilt.

    In my case:

    [root@xeon ~]# mdadm -D /dev/md0
    /dev/md0:
            Version : 00.90.03
      Creation Time : Mon Nov 26 17:52:35 2007
         Raid Level : raid1
         Array Size : 71577536 (68.26 GiB 73.30 GB)
      Used Dev Size : 71577536 (68.26 GiB 73.30 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 0
        Persistence : Superblock is persistent
        Update Time : Sun Jan  1 21:33:38 2012
              State : clean, degraded, recovering
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1
     Rebuild Status : 30% complete
               UUID : 8eac3053:dde1e60c:803f9bee:193a1a56
             Events : 0.21365936
         Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           2       8       18        1      spare rebuilding   /dev/sdb2

    Note the "spare rebuilding".
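
    To keep an eye on the rebuild without reading the whole report each time, the percentage can be pulled out of the "Rebuild Status" line. A small sketch, assuming the output format shown above; the helper name rebuild_percent is made up.

```shell
# Sketch only: extract the rebuild percentage from "mdadm -D" output on stdin.
# rebuild_percent is a hypothetical helper; it prints nothing when no rebuild is running.
rebuild_percent() {
    sed -n 's/^ *Rebuild Status : \([0-9]*\)% complete.*$/\1/p'
}

# Usage (as root):
#   mdadm -D /dev/md0 | rebuild_percent
```

    Running "watch cat /proc/mdstat" is another common way to follow the progress.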

    Now at this point I ran into a serious problem. The live drive I was mirroring from had a few bad sectors that fsck was unable to deal with. These were unrecoverable media errors.

    What I expected to happen was that the RAID1 reconstruction would jump over these bad tracks but that is not the way it works. As soon as it encountered the bad track the RAID reconstruction would fail miserably and restart. Bummer.

    I had considered powering the system down and using a tool like ddrescue to copy over the partitions by hand. Fortunately, my buddy Duncan pointed out that I could instead just use the SCSI BIOS utilities present on the Adaptec SCSI controller to "verify media" and have it map the bad sectors. This is the option I chose since it's less error prone. Unfortunately it meant another 30 minutes of downtime while the process ran.

    However the SCSI verify utility was able to map out the bad tracks and upon rebooting the RAID1 array reconstructed itself correctly.

    Now the next problem I ran into was when I tried to remove the drive with the bad sectors.

    The first step is to fail the drive using the mdadm utility.

    mdadm --manage /dev/md0 --fail /dev/sda2

    mdadm --manage /dev/md1 --fail /dev/sda1

    followed by a "remove":

    mdadm --manage /dev/md0 --remove /dev/sda2

    mdadm --manage /dev/md1 --remove /dev/sda1

    I powered the server down again, pulled the drive with the bad tracks, and slapped in a new blank drive. The machine would not boot. Bummer. I had forgotten to rebuild the boot track on the new drive.

    So I put the drive back in, tried grub-install /dev/sdb to install the grub boot loader on /dev/sdb.

    No joy. The machine just hung. Much Panic.

    So I put the old drive back in, fired it up, ran grub-install /dev/sdb on the drive that had just been synced. Went through the process again.

    Still no joy. Now Much More Panic.

    Then I swapped the synced new drive into the slot in the SCSI tray where the original /dev/sda drive had been and put the blank new drive in the /dev/sdb slot.

    Now it booted. Apparently GRUB is set up to look for the boot record on the /dev/sda device, so if that drive fails completely the machine won't boot. I have grub installed on both drives now so if that situation does arise I can just swap the good drive into that slot and it should work. Not elegant. Clearly there is something about booting on RAID devices that I do not understand.
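
    A commonly suggested approach for this with GRUB legacy (the version CentOS 5 ships) is to install the boot loader on the second drive while telling GRUB to treat that drive as the first BIOS disk, so that it can boot on its own if the other drive is gone. I have not verified this on my machine, so treat the sketch below as a starting point only. It assumes /boot is the first partition on the drive (as with /dev/sda1 and /dev/sdb1 here); the helper name grub_setup_commands is made up and it only prints the GRUB shell commands.

```shell
# Sketch only: print GRUB legacy shell commands that install the boot loader on a
# drive while mapping it to (hd0), the first BIOS disk.
# Assumes /boot is the first partition, i.e. (hd0,0). grub_setup_commands is hypothetical.
grub_setup_commands() {
    local drive="$1"
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$drive"
}

# Usage (as root, after reviewing the printed commands):
#   grub_setup_commands /dev/sdb | grub --batch
```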

    But at this moment I now am running my server on two replacement drives.