Archive for the ‘Hardware’ Category

The BrainPool Discusses Extended Warranties for Aging Servers

April 10th, 2010 by Paul Sterley | No Comments | Filed in Backup and Restore, Hardware, Virtualization

Recently, a discussion took place on the BrainPool distribution group that contained good comments and perspectives about extending the warranty on an aging server. I felt that there was significant value in this conversation, so I have pasted it below in its entirety. If you’d rather just have the salient points, I have enumerated them here:

 

Pros:

· Fast replacement of hardware when a clear and definable failure occurs.

· Availability of replacement hardware that may not easily be found elsewhere.

· Peace of mind for business owners that in the event of a failure, a mechanism exists for fast recovery.

 

Cons:

· Expense that might be better spent elsewhere, for example on the purchase of a new server.

· False sense of security, which could lead to downtime when the aging server fails.

 

Mitigating Factors:

· In a virtual environment with a Storage Area Network, it is fast and easy to boot a VM on another host when one fails.

· If there is identical hardware available, a parts replacement warranty may not be needed.

 

Extenuating Circumstances:

· In some business environments, redundancy and high availability are absolute requirements. In these cases, the above considerations probably do not apply, as there is an overriding business requirement for warranty/service contracts and failover hardware.

 

The discussion commences here:

 

Lynn says:

Occasionally, I really question whether it is prudent to recommend extending warranties for aging hardware.  The cost is pretty extreme (a recent Dell quote tells me this), to the point you could buy new hardware for the cost of extending the warranty for a couple of older servers.  I understand the concept of covering your own a$$, but with VM’s, disk based backups, relatively cheap SATA storage, etc – you can get a down system back online in pretty quick order, and work on the bad hardware in the background, without sacrificing much in the way of performance.    Does anyone have any thoughts on this?  I mean, it’s one thing if it was an ESXi box and you’ve got multiple guest systems running on there, but a file server, even an email server (aka – pretty much any single purpose server you might have) – I’m much more on the fence about that sort of thing than I used to be.  Thoughts?

 

Joe says:

The big picture is that a company can say “I’m covered” for ~$650 a year.  Now if the server dies, how much can they be out?  I know things don’t hard fail like they did 10 years ago, but still, a new motherboard / processor / power supply and you’ve spent more than the insurance would have cost you, and it’s on someone else to show up with the parts.  If you’re white boxing it, it isn’t as big a deal.  For Dell/HP/IBM/etc there are times when getting a specific part means you have to turn to eBay and hope for a quick ship.

 

Normally I say buy it, but only in 1 year increments.  Remember, Murphy reads these emails.

 

Larry says:

I always recommend a warranty extension if the server falls into the “production server” category and the client does not want to upgrade to a new system or there is not a business reason to upgrade.  I also found that if you call CDW, you can get the warranty for considerably less than you can from HP. In my case, I was able to extend the warranty for 2 years on a system for less cost than HP would have charged for a single year.

 

Ellis says:

I believe these need to be viewed as an insurance policy, and as such they become a business decision, not a technical one.

 

One extenuating circumstance I have run into with both HP and Dell is that you might not be able to buy a replacement part for an old system, yet you can get that part if you have a warranty.  Trying to locate replacement parts for a production system on eBay is not my idea of a responsible way to run a business.

 

Jason says:

+1 for Ellis’ comments. If you have a production server that is mission critical, not having a service contract is insane in my opinion. The only time I would consider otherwise is if you have an identical spare server you can use as an organ donor. I have a client with an old Dell 2650 server that Dell will no longer write maintenance agreements on. They are just about broke, so I suggested they buy an identical server off eBay for $75 and use it as an organ donor. Plus, even though their server is old, the SBS 2003 install on it is really stable.

 

If Dell would write a maintenance contract for it, I would have told them to do this.  When doo doo hits the fan, having hardware support is always nice (just in case), and without a support contract, you don’t get that. 

 

Paul says:

With virtualization as a tool in the belt, I am having some customers keep old servers around instead of tossing them. With a restore and P2V if necessary, an older server could run the newer server’s roles for a short time while the new one is being fixed. This of course has to be looked at individually – can the old server really do the job even for a short time? Is it capable, even if slower? Does it have enough disk space, or can it do the job of serving some stuff, and the rest can be restored to USB disk if needed for the short term? Given the amount of time needed for recovery of the new server, is it worth the time/effort to restore to a recovery server? 

 

Joe considers:

There is one twist, and that’s if you’re running a VM based setup.  If you have 2-3 physical servers, and a SAN, the only service contract you need is on the SAN.  Oversize the RAM in the host servers, no such thing as too much, and migrate the VMs to another physical host if one fails.

 

In a single server instance, yes, hardware contracts are mandatory.  In a multi-server setup, you can run a box until it dies as long as all the data is on the SAN, and you have resources to move the VMs over.

 

Ellis questions:

Aren’t you making some assumptions about customer expectations and service level agreements in your comment?  In some environments, a loss of redundancy would be considered cause for immediate action.

 

Joe answers:

Yep, I’m assuming a lot, and it will vary depending on the client.  The upside is that in most instances, downtime is less than 10 minutes rather than waiting a full 4 hours, if the vendor has the part in your local area.  I’ve had Dell drive up parts from Portland for a server that was down.  Had that server been a VM, I could have moved it and not involved the hardware vendor until the client was working again.  This setup allows the client to downgrade from a 24/7 4 hour contract to a NBD contract at a huge savings.  Granted this involves a SAN that may or may not be in the budget, and the SAN would need that 24/7 2 or 4 hour contract.

 

If a 4 hour outage isn’t acceptable then they should be on a hardware replacement cycle where this discussion is moot.  

 

Ellis clarifies:

I think you’re missing my point.  Your recovery plan, supported by the hardware architecture you describe, is exactly what the customer would hope for.  My point is that once the system is brought back online, there is still an outage to recover from: the system is no longer redundant.  In that case isn’t having a warranty good practice (although sometimes more expensive than we would like) to complete the resolution?

 

Paul says:

If the server in question is fairly recent, it might be a good idea to still have that warranty on it, but how much does the warranty cost, versus the replacement part? If quick recovery and business productivity have been achieved, and now we are looking at repairing the failed system, we don’t necessarily need the quick response offered by the warranty. If the failed host server is so old that parts are not readily available, then perhaps it would be time to replace that host rather than repair it.


Tools needed for memory upgrade: Screwdriver, alcohol, and cotton swabs??

April 9th, 2010 by Paul Sterley | No Comments | Filed in Hardware

One of my Seattle clients acquired a company in Portland recently, and with that acquisition came an HP Proliant ML350 G5 server. The acquired company didn’t need it anymore. All they needed was a firewall, switch, printer, and Remote Desktop Client on their workstations. So the ML350 G5 went to Seattle to be re-tasked. We bought more memory and hard disk capacity for it. The goal: Beef it up, make it a virtual host, and move the Terminal Server to the newer, faster box.

 

I arrived at the client site and began working on the server. I started by rearranging the memory chips for an optimal configuration, but in trying to remove the older chips, one of them simply did not want to budge. After looking for obvious obstructions, I resorted to a bit of Brute Force™, and applied enough pressure on the tabs to eject the memory chip.

 

I put in the new chips, and when I powered on the server to test those chips by themselves, there was a memory error. Long annoying beep, red lights – you know, “Danger, Will Robinson!”. I started pulling and inserting chips in various configurations, until I noticed something on one of the old memory chips. Looking more closely at the chip, I observed a crusty substance on the contacts. Further investigation revealed more of the same stuff in the memory slot. It was clearly causing some of the contacts to be stuck in their grooves, forced back out of contact by the act of ejecting that chip, and stuck there.

 

This was weird stuff. The closest thing I can recall seeing before is the white crusty residue you see on car battery terminals. It even had a reddish tint in some spots, like rust. It was in the middle of the chip. There are no capacitors on the memory chip that might have leaked, and it seems unlikely that anything would have spilled on the memory chip inside the server, unless someone was eating yogurt while adding memory to the server.

 

I used a pencil eraser to get the stuff off of the memory chip, but was momentarily at a loss as to how I was going to get the stuff out of the RAM slot. I couldn’t just skip over that slot - the memory has to be installed in pairs, and sequentially in the slots. I thought I might have to take the server back to my office and work on it with alcohol, swabs, and pipe cleaners – but then Christina, my customer, reminded me where I was. I was at Fiberlay, Inc. They have a retail Fiberglass store under the office space, and they have all kinds of supplies. A word to Christina and I had a bottle of isopropyl alcohol, a bag of swabs on long sticks, and a can of compressed air. She didn’t have pipe cleaners, but I found some cable ties that would work.

 

With the server unplugged, I set to work dabbing alcohol on the memory slots. The crusty stuff seemed to disappear as the alcohol touched it. I wiggled the cable ties around in the slots, making sure all of the crusty stuff came into contact with the alcohol, and testing the spring-loaded contacts that would touch the memory stick. After a surprisingly short amount of time, it looked completely clean. I’m not even sure where the crusty stuff went, but it was clearly out of the way.

 

I used the compressed air to good effect, spreading the remaining alcohol around until it completely evaporated and I could not smell it anymore.

 

The cleaning task complete, I replaced the memory chips, using that slot, and the server booted with a happy memory test!

 

Updated: My friend Eugene just informed me that there is a known corrosion issue when tin and gold contacts are mixed. Neither of us has ever seen this happen before in all of our years of working with computers, but this seems likely to be the culprit:

http://advantagememory.com/Home_Page/support_link/faq/why_do_gold_and_tin_contacts_mak.htm

 


Change the MAC address in an ESXi VM

April 8th, 2010 by Paul Sterley | No Comments | Filed in ESXi, Hardware, Migration, Not in the Windows Box, Virtualization, Windows Server

 

Last year, I messed around with changing the MAC address in an ESXi VM to avoid some problems with a license manager app that binds to the MAC address of the NIC when you install and license it. I was unsuccessful in my attempt to change the MAC address then. The MAC address field in the VM Settings window will not let you change the first six hex digits, so you are stuck with the vendor code of a VMware NIC.

 

The NIC driver inside Windows, however, has an option to set the MAC address. Go to the NIC adapter properties, Advanced tab, and select NetworkAddress.

 

[Image: NIC properties showing the MAC address setting]

The server now thinks it has the MAC address you specified.

It works for IPCONFIG /ALL, it works for workstations that ping the server and then check their ARP cache, and it works for FlexLM, the aforementioned license manager software.
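Before typing a MAC address into that field, it can help to sanity-check it. The snippet below is only a rough sketch in Python: it assumes the driver wants twelve hex digits with no separators (which is how this field usually behaves, but check your own driver), and the list of VMware vendor prefixes is an assumption based on the commonly cited ones, not something taken from the VM Settings window. The function names are made up for illustration.

```python
import re

# Commonly cited VMware vendor prefixes (an assumption, not an exhaustive list).
VMWARE_OUIS = {"005056", "000C29", "000569"}


def normalize_mac(mac: str) -> str:
    """Strip common separators and return 12 uppercase hex digits."""
    digits = re.sub(r"[\s:.\-]", "", mac).upper()
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits


def has_vmware_oui(mac: str) -> bool:
    """True if the first six hex digits match a known VMware vendor prefix."""
    return normalize_mac(mac)[:6] in VMWARE_OUIS


# Hypothetical address copied from an old physical server.
print(normalize_mac("00-1a-2b-3c-4d-5e"))
print(has_vmware_oui("00-1a-2b-3c-4d-5e"))
```

If `has_vmware_oui` returns False, the address is presumably one that the VM Settings window would refuse, which is exactly the case where setting it inside the Windows driver is useful.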

 

NOTE: I tested this in Windows 2003 SP2 on ESXi 3.5 and 4.0, and Windows 2008 SBS Edition on ESXi 4.0.

 

Further note: If you do this, it is critical that you NEVER light up the network card in your old server, which has the same MAC address, on the same physical segment of your network. It will bring down connectivity to your new server.

It is possible that you can avoid this problem and continue to use your old server in some other capacity if you change its MAC address, or if you install a different NIC. If the NIC with the matching MAC address is onboard though, you may still have trouble connecting to your new server from your old one. This bears testing at some future date when I have way too much time on my hands.


nVidia scaling with fixed-aspect ratio: How to make it stick

April 4th, 2010 by Paul Sterley | 8 Comments | Filed in Hardware, Workstation OS

My nVidia driver has an option for “use scaling with fixed-aspect ratio”. The idea is that a lower resolution can be scaled up to fill the screen, but it has to keep its original aspect ratio when that happens.

This sounds ideal, but the setting refuses to stick when I set it.

After much fiddling around, I figured out how to make it stick, and I have to wonder whether this is a known and unavoidable issue, and whether it might have been documented if I had read the help files.

Basically, what I have to do is trick the system a little:

1. Set the display to a resolution that is 4:3 – for example 1280×960 or 1024×768, and apply it. Tell Windows to keep the changes.

Note: This looks like complete crap, but it’s only temporary.

2. Set the “use nVidia scaling with fixed-aspect ratio” setting and apply it. Tell Windows to keep the changes.

3. Set the resolution back to the native resolution for the monitor and apply it. Tell Windows to keep the changes.

4. Go back to the scaling screen and look. The setting should still be at “use nVidia scaling with fixed-aspect ratio”.

5. Start up your game. Whatever 4:3 resolution it goes to, it will NOT be skewed by the widescreen monitor.

I am fortunate enough to have a video card that can play UT2004 in high resolution in windowed mode, so it wasn’t a big deal for that game – but I also still play Starcraft, which can only use the “main display”, only plays in full screen, and uses 640×480 resolution, which is really awful when stretched on a 1920×1080 widescreen monitor.

This makes the game playable again, albeit a little fuzzy.
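For the curious, here is a small Python sketch of the arithmetic behind fixed-aspect scaling. It is not how the nVidia driver is implemented, just the usual "scale by the smaller factor and center" calculation, and the function name is my own.

```python
def fit_fixed_aspect(src_w, src_h, dst_w, dst_h):
    """Scale a source resolution to fit a display without changing its shape.

    Returns (scaled_width, scaled_height, side_bar_width, top_bar_height).
    """
    # Use the smaller of the two scale factors so nothing is cut off.
    scale = min(dst_w / src_w, dst_h / src_h)
    out_w = round(src_w * scale)
    out_h = round(src_h * scale)
    # Whatever is left over becomes black bars, split evenly on both sides.
    bar_x = (dst_w - out_w) // 2
    bar_y = (dst_h - out_h) // 2
    return out_w, out_h, bar_x, bar_y


# Starcraft's 640x480 on a 1920x1080 monitor.
print(fit_fixed_aspect(640, 480, 1920, 1080))  # (1440, 1080, 240, 0)
```

So the 640×480 picture is scaled by 2.25 to 1440×1080, with a 240-pixel black bar on each side, instead of being stretched by 3 horizontally and 2.25 vertically.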


Possible workaround when your ESXi server runs out of space on the datastore

March 10th, 2010 by Paul Sterley | No Comments | Filed in Backup and Restore, ESXi, Hardware, Hyper-V

Scenario:
You have a virtual machine running on ESXi, and either the disk is thin-provisioned, or you have one or more snapshots. The datastore runs out of space, and the VM goes down. You are unable to boot the VM because there is not enough free space on the datastore.

When you allocate memory to a VM and boot it, ESXi creates a “swapfile” on the datastore using an amount of space equivalent to the amount of RAM you allocated. By default, ESXi is configured to place this swapfile in the same folder (on the same datastore) as the VM.

Thus although the datastore might have 3.75 GB free, when you attempt to boot the server that you have allocated 8 GB of RAM to, it will not boot.
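The arithmetic is simple, but here it is as a tiny Python sketch. It assumes, as described above, that the swapfile is the same size as the RAM allocated to the VM; the function name is made up for illustration.

```python
def swapfile_shortfall_gb(free_gb, vm_ram_gb):
    """Return how many GB are missing to create the swapfile (0 if it fits)."""
    return max(vm_ram_gb - free_gb, 0.0)


# The example from above: 3.75 GB free, VM with 8 GB of RAM allocated.
print(swapfile_shortfall_gb(free_gb=3.75, vm_ram_gb=8))  # 4.25
```

In other words, the datastore is 4.25 GB short of what the VM needs just to power on.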

 

Solution:
If you have more than one datastore available, you can go into the vSphere Client, configuration tab, and configure the virtual machine swapfile location. Place the swapfiles on the second datastore.

If you don’t have more than one datastore, perhaps you can add one. If you have a NAS device that supports NFS, you can use that. If the onboard SATA controller on your server is supported by ESXi, you can add a cheap SATA disk to use for your swapfile location (and a good backup location) while you sort this issue out.

Once you have done this, you can boot the server and run a backup from within the OS.

Once you have a full backup, you can delete the VM to free up space. If you ran out of room due to snapshots, you can create a new VM and start restoring your backup right away. If you ran out of room due to a thin provisioned disk that exceeded the datastore size, you will obviously need to make your datastore larger before proceeding with the restore.

Other ways you can recover from this situation:
1. Add disks to the server and extend the datastore to use them, so the datastore gets larger.

2. Move one or more of the VMDK files to the second datastore and edit your VM configuration to use the disk(s) in the new location.

How you can prevent this situation:
1. If you use thin provisioning, make sure each disk will still fit on the datastore when it grows to its full provisioned size. If you want to use some of the available space while your VMDK files are still small, go right ahead – but make sure you can either delete or move the less important machines on short notice – and monitor your disk usage!

2. Leave plenty of extra room. Put more physical space in the server than you’re ever likely to need. Disks are cheap.
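A quick way to check the first point is to add up the worst case on paper. The Python sketch below does that under a few assumptions: every thin disk grows to its full provisioned size, every VM is powered on with a swapfile equal to its RAM, and snapshots are not counted at all. The VM names and sizes are hypothetical.

```python
def worst_case_usage_gb(vms):
    """Full provisioned disk size plus one swapfile (equal to RAM) per VM."""
    return sum(vm["disk_gb"] + vm["ram_gb"] for vm in vms)


def headroom_gb(datastore_gb, vms):
    """Space left on the datastore in the worst case (negative = overcommitted)."""
    return datastore_gb - worst_case_usage_gb(vms)


# Hypothetical VMs on a 500 GB datastore.
vms = [
    {"name": "fileserver", "disk_gb": 200, "ram_gb": 4},
    {"name": "mailserver", "disk_gb": 150, "ram_gb": 8},
]
print(headroom_gb(500, vms))  # 138
```

A negative result means the datastore can fill up even if you never take a snapshot, so either shrink the provisioned sizes or have a plan for moving VMs off in a hurry.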

 

P.S. I am sure that this same concept, or parts of it, can be applied to Hyper-V virtual hosts. However, I am not familiar enough with Hyper-V to give specifics.
