Many big companies have a dual data centre.
Against disasters. But planes don’t crash that often.
Against outages. But reality shows that the added complexity undermines their availability.
Often, when a client asks for solutions to increase the availability of their infrastructure, the idea comes along to have two data centres. If something goes wrong in the first one, they will continue to run in the second. If they think their business is really important, the option of a third one comes along. And if they really, really want to be on the safe side, it has to be dual active.
Now, does this work?
No, but let me put this ‘No’ in perspective. If it’s about a static webpage, it can. If you have an almost unlimited budget and perfect architecture, people and governance, it will. But this is unrealistic. So where does it go wrong? Having two data centres work together as one, taking over from one another when something goes wrong, requires a setup that is either very simple or extremely well built.
An example from nature: if a flatworm is cut in half, each half will regenerate the corresponding lost part. More complex species can’t seem to get this handy feature to work. The same goes for your infrastructure. Two hard disks mirroring one another will work perfectly; two network cables working as one work flawlessly. Distributed file systems already get tricky. And once you start using multiple systems to provide access to a single database, everyone knows where the next disruption will occur. The complexity of a dual DC setup intended to increase availability is so enormous that if one fails... the other one probably will too.
How big is the need for high availability?
There is a theoretical maximum availability you can reach with one data centre. Paradoxically, most big companies run dual DC anyway, and reality shows that the added complexity undermines their availability. All major companies have had availability failures. Even most banks have had non-working websites. So this dual DC, it’s not working.
But all those banks, ISPs, cloud providers, car factories, whatever, did they go out of existence, or go bankrupt, because of a half-hour outage? No. They’re all still here. Thus it is safe to say that most organisations can survive an outage. And since planes don’t crash into your data centre multiple times a year, it is also safe to assume that if that disaster strikes, you will survive the couple of hours you are not available (everybody is probably watching the news).
How to make your infrastructure ‘more available’
Knowing all this, you can increase your availability by reducing complexity. Accept downtime. Yes, accept ‘some’ downtime. Use it, for example, when you need to switch to the other data centre for maintenance. There is hardly any complexity in stopping a system on one site and starting it on the other. It probably takes 10 minutes if you do it manually. And during off-hours no one minds, as long as they are informed up front. As a result, when push comes to shove you will be updated, patched and available.
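How much does that accepted downtime actually cost? A quick sketch, assuming one planned switch per month (my assumption, not a recommendation) and the ten minutes per switch mentioned above:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def availability(downtime_minutes_per_year: float) -> float:
    """Fraction of the year the service is up."""
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR


# Assumed: one planned switch a month, ten minutes each.
switches_per_year = 12
minutes_per_switch = 10

planned_downtime = switches_per_year * minutes_per_switch

print(f"planned downtime: {planned_downtime} min/year")
print(f"availability:     {availability(planned_downtime):.3%}")
```

Under these assumptions that is two hours of announced, off-hours downtime per year, which still leaves you just under 99.98% available.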
So what is the use of a second data centre?
Of course one should have a backup for one’s infrastructure, and larger organisations should have their own second data centre for resiliency. But how to put this to good use? First of all, store your data redundantly. Spread your systems over both locations, but make sure that no system has dependencies, or at least no critical ones, on something in the other DC. And of course, have spare capacity to run systems that have failed on the other side. Use the spare systems for testing or acceptance.
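The "no critical dependencies across DCs" rule is easy to check once you have an inventory. A minimal sketch follows. The system names, the `dc1`/`dc2` labels and the data layout are all hypothetical, and in practice this information would come from your CMDB or whatever you use to track dependencies.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    source: str     # system that needs something
    target: str     # system it depends on
    critical: bool  # True if source cannot run without target


def cross_dc_critical(location: dict[str, str],
                      dependencies: list[Dependency]) -> list[Dependency]:
    """Return critical dependencies whose two ends sit in different DCs."""
    return [d for d in dependencies
            if d.critical and location[d.source] != location[d.target]]


# Hypothetical inventory: which system lives where.
location = {
    "webshop": "dc1",
    "orders-db": "dc1",
    "billing": "dc2",
    "reporting": "dc2",
}
dependencies = [
    Dependency("webshop", "orders-db", critical=True),     # same DC: fine
    Dependency("billing", "orders-db", critical=True),     # crosses DCs: flagged
    Dependency("reporting", "orders-db", critical=False),  # crosses, not critical
]

for dep in cross_dc_critical(location, dependencies):
    print(f"{dep.source} ({location[dep.source]}) critically depends on "
          f"{dep.target} ({location[dep.target]})")
```

Anything this reports is a system that goes down together with the other data centre, no matter how much spare capacity you have on its own side.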
It is the basic KISS principle: Keep It Simple, Stupid. Or, another one for the bingo card: less is more. We all know it, so let’s put our money where our mouth is and build better infrastructure.