Hardware Analysis
      
  Server downtime, SCSI controller woes 
  May 27, 2004, 07:30am EDT 
 
By: Sander Sassen

Our current server, a dual Xeon with a SCSI RAID10 array of four 15,000-rpm SCSI disks, has been up for almost a year. Unfortunately, a few weeks ago we started to see random lockups, which killed our uptime because we had to reboot to keep the server running. You might even have noticed that we were offline for a few minutes here and there. Naturally we examined the logs, only to find that the problem couldn’t be easily diagnosed: it could have been software, but a hardware defect was just as likely. I drove up to the ISP last Friday to examine the server up close and couldn’t find what was wrong with it until I powered it off and back on again. Suddenly neither the four SCSI disks nor the Adaptec SCSI RAID controller showed up during boot.


Fig 1. Our web server, featuring dual Intel Xeons, 2GB of memory and a RAID10 array.

At moments like these you just look at the screen and think "they were working just a minute ago, gimme a break here", but as always Murphy’s Law is unforgiving, and neither the controller nor the disks would budge. The server was built from new parts and painstakingly put together, with no part placed or handled without an anti-static wristband, and it had been running for just over a year, so we were probably looking at a manufacturer defect. Getting a replacement for the SCSI controller wasn’t going to be easy, though, as we don’t keep spares, and finding a new one on a Friday wouldn’t be easy either. So what do you do? You try to make it work again. I took the controller out of its slot, cleaned its PCI connector with a cotton cloth and some alcohol, and did the same to the PCI riser card that connects it to the motherboard.

After putting it all back together I powered the server up, fingers crossed and thinking "c’mon baby, you can do it", and fortunately saw the SCSI controller boot its kernel, find the disks and proceed to boot Linux. Naturally you need a backup plan in case something like this happens. We could have switched to another server at another location, but that would have meant many hours of downtime while the DNS change propagated around the world. So, to prepare for the worst, we’ve now built a backup server and will be hooking it up to a fail-over network switch, so we can switch between the two servers remotely without needing to change the DNS. We’ll have more details on the construction of that new server and on switching between the two in an upcoming article, which will be posted after our Computex coverage.

Sander Sassen.

 
