Every spending decision triggers a review of the current budget of the team, department or company. Financial planning should also take into account the risks the organization takes. If the business cannot afford downtime, the enterprise must decide what resources to allocate to protecting itself against threats from outside (cyberattacks) or even from within the organization (disloyal employees). Over the past decade or so, IT infrastructure has been shifting from locally maintained (on-premise) environments to public clouds (Microsoft Azure, Amazon Web Services, Google Cloud Platform) or private clouds (All for One Cloud or other providers), where maintenance is delegated to the provider. Transferring responsibility to a provider raises questions that should be addressed:
- In order to ensure business continuity, will it be possible to bring IT systems back online within a time acceptable to the business owner?
- How much data loss, measured in time, is acceptable?
- How will we respond when our primary environment (on-premise, public cloud or private cloud) is not available?
A good answer to many of the questions above is to set up a backup environment with regularly repeated recovery tests. The possible scenarios for a Disaster Recovery Center vary and depend on the current and planned system maintenance strategy and on business conditions.
In-house resources
If you have your own resources, it is a good idea to create a replica of your data in a private or public cloud. Depending on the technological capabilities and needs, different solutions can be considered:
- For the most business-critical systems, where the time to restore availability matters, the solution can be divided into two stages:
  - Creation of a synchronous replica of all resources in a second data center or in the private cloud of a local provider. This gives protection in the form of a second, identical copy of the data;
  - In addition (for customers for whom access to data is critical), a time lag in data synchronization can be maintained between data centers, so that depending on the problem at hand it is possible either to switch without data loss or to switch selected systems with a defined data loss, e.g., rolling back the one hour during which data corruption occurred.
- For less demanding customers, for whom a data loss of 15 minutes does not pose a risk to business continuity, asynchronous solutions can be proposed, i.e. solutions where data is replicated through transaction logs or at the virtualization level.
- For customers who can afford more downtime, it may be enough to replicate only the backups created in the primary environment and to prepare a restore procedure that defines which systems to restore and in what order. Service restoration time is then measured in hours or days.
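The three variants above can be read as a simple decision rule driven by how much data loss and downtime the business tolerates. The following is a minimal sketch of that rule, not a sizing tool: only the 15-minute figure comes from the text, while the function name `choose_replication_variant` and the 4-hour downtime cut-off are assumptions made for illustration.

```python
from datetime import timedelta

# Data loss tolerance at which asynchronous replication becomes acceptable.
# The 15-minute figure is the one mentioned in the text.
ASYNC_DATA_LOSS_FLOOR = timedelta(minutes=15)

# Hypothetical cut-off: the text only says "hours or days" for backup-based
# recovery, so 4 hours is an assumed value for this example.
BACKUP_DOWNTIME_FLOOR = timedelta(hours=4)


def choose_replication_variant(max_data_loss: timedelta,
                               max_downtime: timedelta) -> str:
    """Map business tolerances to one of the three variants described above."""
    if max_data_loss < ASYNC_DATA_LOSS_FLOOR:
        # Less than 15 minutes of loss is tolerated: keep a synchronous replica.
        return "synchronous replication"
    if max_downtime < BACKUP_DOWNTIME_FLOOR:
        # Some loss is acceptable, but systems must be back quickly.
        return "asynchronous replication"
    # Long downtime is acceptable: replicate backups and restore on demand.
    return "backup replication"


if __name__ == "__main__":
    systems = {
        "ERP": (timedelta(0), timedelta(minutes=30)),
        "Reporting": (timedelta(minutes=30), timedelta(hours=2)),
        "Archive": (timedelta(hours=24), timedelta(days=2)),
    }
    for name, (loss, downtime) in systems.items():
        print(name, "->", choose_replication_variant(loss, downtime))
```

In practice the thresholds would come from the business owners of each system, and cost would be weighed alongside them.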
Private cloud
For systems maintained in a private cloud, we can create a replica in a second private cloud or in a public cloud, or maintain a business continuity environment in our own on-premise infrastructure. Depending on the provider’s capabilities, variants similar to those described above for on-premise solutions can be prepared:
- synchronous replication;
- asynchronous replication (including data replication to the public cloud);
- backup replication.
Public cloud
For resources hosted in the public cloud (Azure, AWS, GCP), different backup data center scenarios are possible, depending on the availability of services in different regions:
- Replication of entire VMs to another region;
- Replication of backups to another region.
In all of the above system maintenance models, the choice of strategy for the backup data center will depend on business requirements, acceptable system downtime, and of course the amount of investment the company is willing to make.
Disaster recovery testing
A backup data center is an important part of a secure IT environment, and business continuity plans are its integral complement. Creating these plans and testing them regularly (at least once a year) is necessary to keep the backup center ready to take over production operation within the defined parameters if the primary systems become unavailable. It is important to involve not only the IT department in the tests but also the business owners of the systems, i.e. the persons responsible for the individual organizational units that use the IT environments.
A Disaster Recovery Center is a fixed cost for the organization, but how do you value the losses caused by a multi-day interruption in the operation of the entire business?
If our backups are tested and can be restored, we can be sure that we can recover the data with an acceptable loss, if any. Two important issues remain to be resolved:
- Where will we restore the data (since our server room has just burned down or the storage array has been irretrievably encrypted as a result of a cyberattack)?
- When will we be able to start working in a different location after restoring the data (e.g. will the connections with our partners work, how will we solve other problems arising during DR tests)?
Testing business continuity and disaster recovery plans is an integral element of the company’s IT security system. Only during a “test alarm” are we able to verify whether the prepared procedures will work, how much time each described step actually takes, and whether we have omitted some seemingly insignificant element that may decide the success or failure of running the system in a backup location.
Based on the experience from the tests, we can learn the real scale of the problems that may occur, calmly refine the technical details and get answers that are important to the business: how much data we can really lose during a failure and how long it will take to get the systems up and running.
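The two answers a test delivers can be derived from three timestamps recorded during the exercise. The sketch below shows one possible way to do that. The class `DrTestRun`, its field names, the helper `within_targets` and the sample timestamps are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestRun:
    """Timestamps noted during one disaster recovery test (hypothetical model)."""
    last_consistent_data: datetime  # newest data available in the backup site
    outage_start: datetime          # moment the primary site was declared down
    service_restored: datetime      # moment users could work in the backup site

    @property
    def data_loss(self) -> timedelta:
        # How much work would have been lost in a real failure.
        return self.outage_start - self.last_consistent_data

    @property
    def downtime(self) -> timedelta:
        # How long the business was without the system.
        return self.service_restored - self.outage_start


def within_targets(run: DrTestRun,
                   max_data_loss: timedelta,
                   max_downtime: timedelta) -> bool:
    """Check measured values against what the business declared acceptable."""
    return run.data_loss <= max_data_loss and run.downtime <= max_downtime


# Made-up sample values for a single test run.
run = DrTestRun(
    last_consistent_data=datetime(2024, 5, 10, 8, 50),
    outage_start=datetime(2024, 5, 10, 9, 0),
    service_restored=datetime(2024, 5, 10, 12, 30),
)

if __name__ == "__main__":
    print("Measured data loss:", run.data_loss)
    print("Measured downtime:", run.downtime)
    print("Within targets:",
          within_targets(run, timedelta(minutes=15), timedelta(hours=4)))
```

Comparing these measured values from one yearly test to the next shows whether changes to the procedures actually shorten recovery.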