For any business-critical system, resilience matters: it is the ability to maintain an acceptable level of service in the face of faults. When you hear about aircraft flying on after the failure of one engine, that’s resilience. For your business systems, if you wish to minimise downtime and disruption, then simply taking regular backups is not enough…
1. Backups. Backups alone aren’t enough, but they are certainly essential: hard disks can and do fail without warning. Your data should be copied regularly, and the copies stored somewhere away from your main system so that the two can’t be destroyed together. Even more importantly, you should test restoring your data: find out whether your backups really work before you need them for real.
2. Hardware. Hardware is another potential risk: computer memory can develop faults, processors can overheat, and motherboards can fail. If that should happen, a good way to ensure a quick recovery is to hold spares of all the essential components in reserve, ready for a quick fix. Even better, maintaining two or more complete systems means that you can quickly ‘fail over’ to a standby machine when necessary.
3. Power. Any single point of failure could leave the best-laid resilience plans in tatters, and power cuts can and do happen. Many professional data centres will guarantee uninterrupted power through a combination of battery systems and diesel generators. For smaller systems, it’s worth bearing in mind that laptop computers have the advantage of built-in batteries, and are therefore more resilient than desktop computers when it comes to handling power cuts.
4. People. If all of your resilience plans have been masterminded by a single member of staff, will you be at risk every time he or she is away from the office or on holiday? Ensure that at least two people are fully up to speed with all of your resilience and backup plans for each system, and then co-ordinate their schedules and holiday plans so that at least one of them is readily available at any given time. You’ll be very glad to have a local expert on hand if things start to go wrong!
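To make the point under Backups about testing your restores a little more concrete, here is a minimal sketch in Python. It assumes the backup is a plain zip archive of a single data folder. The function names (checksums, restore_matches_source, demo) and the orders.csv sample file are invented for illustration and are not part of any particular backup product. The idea is simply to restore the archive into a scratch directory and confirm that every file comes back byte-for-byte identical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_matches_source(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it with source."""
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(str(archive), scratch)
        return checksums(Path(scratch)) == checksums(source)


def demo() -> bool:
    """Back up a small sample folder, then check that the restore matches."""
    with tempfile.TemporaryDirectory() as work:
        source = Path(work) / "data"
        source.mkdir()
        # Hypothetical sample data standing in for real business files.
        (source / "orders.csv").write_text("id,total\n1,9.99\n")
        # make_archive appends ".zip" and returns the path of the archive.
        archive = Path(
            shutil.make_archive(str(Path(work) / "backup"), "zip", root_dir=source)
        )
        return restore_matches_source(source, archive)


if __name__ == "__main__":
    print("Restore verified:", demo())
```

A real check would compare the restored files against checksums recorded at backup time rather than against the live data, which may have changed since. Note too that this sketch only compares files, so empty folders are ignored.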
Good stuff. This reminded me of another, related post over on Carsonified’s Think Vitamin blog. It’s at http://carsonified.com/blog/web-apps/five-things-that-will-kill-your-site/ and if you found this post useful, you’ll probably find that useful too.