According to credible estimates, an hour of outage can cost a medium-sized company $70,000. That is the loss that accumulates while IT systems are offline. What's interesting is that, contrary to the popular belief that natural disasters are the primary cause of IT system failure, a recent study finds hardware failure to be the leading cause, by a wide margin, of IT disasters and of the resulting losses, both financial and reputational, that small and medium-sized businesses incur. If SMEs take the right precautions, however, much of that loss can be remedied quickly even when disaster does strike.
The importance of prompt recovery from IT disasters hardly needs arguing. Even if your business can absorb losses of $70,000 per hour, the loss of customer confidence, especially for consumer-facing enterprises, may never be repaired. A study by HP and Score also reveals that a quarter of medium-sized businesses go under after a major disaster. That alone shows the return on investing your time and money in contingency planning and in executing dry runs to make sure your plan actually works.
Of the four major types of disaster – hardware failure, natural disasters, human error and software failure – only natural disasters lie outside human control; everything else, including human error, can be tamed if not eliminated. The key, however, is to prepare for extreme situations and to base your plans on the disaster-prediction studies already available.
Unless your organization is unusual, you most likely have a single SAN (Storage Area Network) or NAS (Network Attached Storage) serving the whole organization. In the name of keeping storage simple and scalable, organizations tend to neglect the doomsday scenario that even a minor failure of that SAN can trigger. On top of that, all data, including virtualized storage, depends on this one big SAN. Now imagine the SAN failing, for any of the many possible reasons. Because the entire IT environment is connected to it, the whole infrastructure grinds to a halt, all because of one SAN failure. This is not a hypothetical scenario invented to drive the point home; it is one of the major causes of the hardware failures that turn into IT disasters.

Let's look at some measures organizations can take to mitigate the risk. First comes redundancy, but even with layers of redundancy, if your SAN is not diversified (separate systems rather than one big unit), there is a good chance those added layers of redundant storage will fall like a house of cards when disaster strikes. Next comes ensuring that a standard data backup policy is written and followed to the letter and in spirit. However, surveys suggest it normally takes tens of hours to recover from a SAN failure using tape and disk backups, and some studies paint an even starker picture, finding that tape backups often fail outright.
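Backups that fail when you need them are exactly why a backup policy should include routine verification, not just scheduled copies. As a minimal sketch (the function names here are illustrative, not taken from any specific backup product), a script can checksum each source file against its backup copy and flag anything missing or corrupt:

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source_dir: Path, backup_dir: Path) -> list:
    """Return relative paths that are missing or corrupt in the backup."""
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        # A file counts as a problem if it is absent or its checksum differs.
        if not dst.is_file() or sha256(src) != sha256(dst):
            problems.append(str(rel))
    return problems
```

A check like this run after every backup window, with a non-empty result treated as an alert, catches silent corruption long before a disaster forces a restore.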
Cloud backup is an emerging trend, driven primarily by the idea of physically diversifying your storage network. Some organizations embrace the Cloud so completely that they let go of any internal SAN and rely on the Cloud alone. That may not be a wise move, since the Cloud can also fail (remember the Amazon EC2 outage that brought down major internet services like Reddit?). Used as a backup layer rather than a replacement, however, the Cloud is a credible way to recover from storage-related IT failures, and diversifying your Cloud backup pool across providers only further strengthens your IT and mitigates failure risks.
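Diversifying a backup pool boils down to pushing each backup artifact to several independent destinations and tracking which copies succeeded. A minimal sketch of that idea, using local directories to stand in for separate providers (the `replicate` helper is an assumed illustration, not a real vendor API):

```python
import shutil
from pathlib import Path

def replicate(artifact: Path, destinations: list) -> dict:
    """Copy one backup artifact to each destination, recording per-target success.

    In production each destination would be a different provider or site;
    here they are plain directories so the logic stays self-contained.
    """
    results = {}
    for dest in destinations:
        try:
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(artifact, dest / artifact.name)
            results[str(dest)] = True
        except OSError:
            # One failed target must not abort the remaining copies.
            results[str(dest)] = False
    return results
```

The design point is that a failure at one target is recorded but never blocks the others, which is precisely what diversification buys you when a single provider goes down.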
No matter how robust your IT systems are, they are prone to failure, whether because a system administrator accidentally wipes out a server's file system or a hurricane sweeps through your data center. Preparation is the key.