Misc Systems Administration Notes and Things to Look for When Evaluating Systems
2015-05-27 Last Updated
The list below is not in any particular order nor is it exhaustive. I add items as they come to mind.
- Take anything that salespeople say with a grain of salt. They can promise you the world, but until they deliver, don't get your hopes up and do not put down any type of deposit until after delivery. And be thorough in learning about new products, because if you find out only after the sale that a feature you need is missing, you're in a bad position to negotiate since the vendor already has your money.
- Pick a very general naming convention for servers, Web sites, etc., because future vendor mergers and acquisitions could change the product names. And some vendors just change names for marketing reasons. For example, after McAfee acquired Secure Computing, which made the IronMail appliances, those appliances were renamed to McAfee Email Gateway. So if you had those appliances when they were still known as IronMail and named them something like nyc-ironmail-01, nyc-ironmail-02, etc., you'd be calling them "IronMail" until you either changed the name or replaced them. This isn't a huge issue, but it's just better to use a generic name to begin with, such as nyc-egw-01 (egw = e-mail gateway). What about something such as MS SharePoint? Instead of nyc-sp-01, try something like nyc-collab-01 (collab = collaboration). With all the asset management tools out now, it's easier to keep track of which server does what, so naming conventions don't need to be very strict, but at least have some type of logical naming convention.
- Unless it's an absolute requirement, never upgrade all your systems at the same time. For example, if you have two e-mail filtering appliances in a cluster, don't upgrade both at the same time. Upgrade one and let it sit for a week or two to see if there are any issues with the upgrade. If the upgraded appliance has issues, you should be able to easily redirect all traffic to the other, non-upgraded, appliance. This way you can troubleshoot the issue with less user impact (you'd lose redundancy for a short time, so it's a risk). I had an experience where both McAfee Email Gateway appliances in the cluster were upgraded and the upgrade had a bug in it. The upgrade couldn't be removed without basically reformatting the appliance and reinstalling everything. So either we'd go with that option or wait weeks or months for McAfee to issue an update to correct the bug.
- The more people you talk to, the more questions you'll come up with. For example, Vendor B might offer a feature that you weren't even aware of, so you can go back to Vendor A and see if they have a similar feature.
- Does the system support integration with AD/LDAP for user accounts, or does it contain its own user accounts database? Any system that contains its own user accounts database should have an open standard for creation of, and importing and exporting of, user account information. By open standards I mean something even as simple as support for CSV or XML format. For a bad example, the Ricoh Aficio MP 5500 has a proprietary binary format for import/export of user accounts for the scanning feature. Because of that I had to manually add users into the system, which was not fun when there were 50 users to add--one by one.
- When integrating with, or connecting to, an externally hosted service (such as a service/help desk system), can the service accept SSL certificates from your internal certification authority? Some services will allow you to upload your cert chain into their system which allows them to trust your internal CA, thus saving you from having to purchase a cert from a public CA.
- Get clarification when the term “compatible with,” or something similar, is used. For example, a thumb drive might be “compatible with USB 2.0” but actually run at the slower USB 1.1 speed. Since USB 2.0 is backwards compatible with USB 1.1, the “compatible with USB 2.0” claim is valid. So in this case you need to clarify whether the drive can actually transfer data at full USB 2.0 speed.
- Licensing - Is there some type of "starter pack" or "quick start" where components and services are bundled together and sold at a lower cost?
- Licensing - Is there any discount if you're currently using a competitive product or older version?
- Licensing - Is a license of the product included for testing and development? This is important to test business continuity/disaster recovery. You certainly don’t want to have to pay for a license that you won’t use in production. If you need to pay extra, you might be better off going with something similar to Microsoft's TechNet Subscription which gives you licenses to "over 70+ full-version Microsoft software without any time or feature limits for evaluation purposes only." (NOTE: As of 2013 or thereabouts, MS TechNet is no longer available so the next best thing is MSDN, but MSDN is significantly more expensive).
- Licensing - Be aware that some OEM licenses have a different support model. For example, you may purchase VMware vSphere licenses from VMware or via one of the server manufacturers like HP. If you get the VMware OEM license from HP, the cost is lower, but VMware support is provided through HP (I actually am fine with this since most of the VMware support cases I've opened were related to hardware). One big gotcha, though, is that OEM licenses are usually tied to the hardware and are not transferable to new hardware (Microsoft does this with Windows licenses on laptops, for example). Depending on your organization's accounting practices, it may work out better to get the OEM license since hardware is normally depreciated over 5 years, so in 5 years the savings from the OEM license may make up for the fact that the license isn't transferable.
- Licensing - With virtualization, you may gain significant advantages in licensing. For example (as of Dec 2014), a Windows Server Datacenter license (which allows for 2 CPU sockets) may be used on a single VMware ESXi host to allow unlimited instances of Windows Server VMs. There's a certain breakeven point, 13 VMs, where it's more financially advantageous to purchase the Datacenter license for the host rather than Windows Server Standard licenses (one Standard license allows for 2 VMs on the physical host). So if Windows Server Standard is $772.00 and Windows Server Datacenter is $5,239.07 per license, the math is $5,239.07 / ($772.00/2) = 13.57. I've seen some people do the calculation without dividing the latter number by 2, which is incorrect. Note that you wouldn't actually enter a Windows Server Datacenter license key anywhere within vSphere, so the license purchase is just for compliance.
- When evaluating anything involving storage space, network speed, etc, make sure you find out exactly what you’re getting based on the configuration that you’ve chosen. For example, when evaluating storage arrays, you might require 10 TB of “usable storage” and also want to use some special replication feature. Well, the term “usable storage” isn’t always defined the same by everyone. You might get 10 TB of usable storage, but after allocating space for replication, you might only end up with 8 TB of actual usable storage. So in this case, it’s important that you have the vendor write up the quote with enough drives to allow you to have 10 TB of storage AFTER replication and any other special features are accounted for.
- Make sure that all systems implementations include maintenance procedures for backups, performance tuning, patching, etc. I think this is one of the key items that get glossed over and no one worries about it until years later when these tasks become necessary. If consultants are doing the implementation, they should include this information as a project deliverable since it’s a standard component of the systems development lifecycle.
- What can be used to back up the system? Can backups be automated and scheduled? How easy is it to back up AND RESTORE?
- When testing failover, make sure to also time the failback because sometimes a failback can take longer than a failover. I had a case where a vSphere 5.5 management network was set up with two NICs in an active/standby configuration. During testing of failover, we only lost one ping. But during testing of failback, we lost seven pings (we were able to replicate this test, so the numbers are consistent).
- How do you monitor the system? Does it use SNMP or syslog? Does it have automated alerts if a particular threshold is exceeded (disk usage, network usage, etc.)?
- If any client installation is required, are ALL the install files in MSI format (or some other standard software packaging format)? Some applications still use setup.exe to install client software, which can be difficult to push out with client management systems such as Altiris. Some systems might use a combination of VBS, MSI and EXE files which doesn’t make for the cleanest install, so you have to get more detailed info about that. The sales person might say that they use MSI files, which is true, but then you get the product and find that it uses MSI files along with VBS and EXE files to do an automated client install. At that point you’re stuck trying to get it to install cleanly.
- When looking at quotes, it can be very difficult to compare one vendor to another. Product names will be abbreviated and might be bundled together so it’s not easy to do an apples-to-apples comparison. Make sure you have the vendors put each product as a separate line item and include the cost for each line item, not just the total. When I worked with EMC and NetApp to get quotes on storage arrays, I had to go back and forth a few times to clarify what was on their quotes.
- One vendor might implement something differently than another, e.g., NetApp and EMC might both use sixteen 400 GB hard drives in their arrays, but the way that they configure RAID and other settings can leave you with different numbers for actual usable storage. In that case, you’d need to weigh the benefits of one configuration over another instead of only taking total raw disk space into consideration.
- I would be leery of foreign companies that do not have strong US-based post-sales support. They might have US- or Canadian-based pre-sales support to get you to buy the product, but afterwards all support is foreign-based. Foreign companies that have mostly foreign support can be an issue. With the different time zones, the support staff might be limited at certain times. Also, the language barrier is always an issue. Even if tech support can understand English, it can be difficult to understand someone with a strong accent. And there are different types of English, so there could be some issues with interpretation of certain words. (Note: I was born in a non-European country, so this isn't some type of racist comment--it's based on my experience. And the language issue is not just with Asian-based support. I've had issues with Israeli-based support as well. The point is that it doesn't matter how smart the person helping you is if you can't understand the person clearly.)
- This one is difficult to explain, so I'll give an example. When I looked at specs for a Dell PowerEdge T110 server, it listed 4 internal SATA interfaces and 1 external eSATA interface. Seeing that, you'd think that you could use 4 hard drives internally and 1 externally, at the same time, but that might not always be the case. The eSATA interface could actually take up one internal SATA interface so that you could never have more than 4 SATA interfaces in use at one time. You want to clarify whether or not you can use all 5 SATA interfaces at the same time (4 internal + 1 eSATA)--don't just assume. This Dell server actually is able to use all 5 SATA interfaces at the same time.
- When purchasing hardware, make sure to get the physical requirements for the system, such as electrical connectors and rack requirements. A previous employer of mine had purchased an EMC VNX5700 and no one involved even asked the two basic questions I just mentioned. The infrastructure staff had been virtualizing and moving servers around to clear enough rack space for the VNX5700. Later on in the project some questions were brought up and we went to open the box that contained the unit to verify if it had its own rack. Sure enough, the unit was already self-contained in its own cabinet/rack and only needed to be wheeled into place. But since the existing rack that the unit was supposed to go into was bolted down and connected to adjacent racks, it wasn't a simple task to remove the old rack. Also, the unit required four electrical connections of a specific type (NEMA L6-30P), of which we only had two. The company had an existing EMC NS120 which was mounted in a standard server rack, so the infrastructure manager assumed that the VNX5700 could be set up the same way. Looking at EMC's brochure, one can see that the VNX5700 is in a custom cabinet but the smaller VNX units look like they can be rack mounted.
- Make sure you know how to properly shut down any new system that you implement. That seems trivial, but I had an incident where severe weather put our server room on generator power and we weren't sure if we'd last on that, so we needed to be prepared to shut down our systems. Systems such as EMC VNX or Cisco UCS are very complex when compared to a standalone server, so they require specific shutdown procedures.
- If you are preparing to implement a new system and have credits for training, try to take the training BEFORE you implement the system. I worked at a company that implemented an EMC VNX5700 and we were all clueless when it came to features like FAST and FAST Cache. It would have been helpful if we had sent one of our team members to training beforehand.
- More is not always better. For example, an Exchange Server 2010 Client Access server has a recommended maximum for RAM (8 GB), so putting more in could actually cause performance issues.
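
Going back to the Windows Server licensing item above, the breakeven math can be sketched in a few lines of Python. This is only a rough illustration using the Dec 2014 prices quoted in that item. The function names are made up for this example, and it assumes a single two-socket host, one Datacenter license for that host, and Standard licenses bought whole (each one covering 2 VMs). Check current pricing and licensing terms before relying on any of these numbers.

```python
import math

# Prices quoted in the licensing item above (Dec 2014); treat them as examples.
STANDARD_PRICE = 772.00      # one Windows Server Standard license
DATACENTER_PRICE = 5239.07   # one Windows Server Datacenter license
VMS_PER_STANDARD = 2         # VMs covered by each Standard license


def break_even_ratio():
    """Datacenter price divided by the per-VM cost of Standard."""
    return DATACENTER_PRICE / (STANDARD_PRICE / VMS_PER_STANDARD)


def standard_cost(vm_count):
    """Cost of covering vm_count VMs with whole Standard licenses."""
    licenses_needed = math.ceil(vm_count / VMS_PER_STANDARD)
    return licenses_needed * STANDARD_PRICE


def first_vm_count_favoring_datacenter():
    """Smallest VM count where Standard licenses cost more than Datacenter."""
    vm_count = 1
    while standard_cost(vm_count) <= DATACENTER_PRICE:
        vm_count += 1
    return vm_count


if __name__ == "__main__":
    print(f"Breakeven ratio: {break_even_ratio():.2f}")
    print(f"Datacenter is cheaper starting at "
          f"{first_vm_count_favoring_datacenter()} VMs")
```

Forgetting to divide the Standard price by 2 would cut the ratio in half (about 6.79), which is the mistake mentioned in the licensing item.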