What is Disaster Recovery?
Disaster Recovery (DR) in Amazon Web Services (AWS) is a component that spans a few of the six pillars of the AWS Well-Architected Framework. The AWS Well-Architected Framework provides guidance to organizations to build the most secure, high-performing, resilient, and efficient infrastructure possible for their applications. DR and resiliency align, primarily, with the reliability pillar of the Well-Architected Framework. This pillar states that reliability requires that your workload be aware of failures as they occur and take action to avoid impact on availability. Workloads must be able to both withstand failures and automatically repair issues. Although automatically repairing issues is an incredible feature, not all organizations have such stringent requirements. Disaster Recovery as defined by Gartner “the methods and procedures for returning a data center to full operation after a catastrophic interruption”. DR is explored here with the ability to recover from failure or catastrophic interruption. This article explores how DR aligns with the AWS Well-Architected Framework, how business processes shape a sound DR strategy, and how Caylent can lead your organization to success when planning for DR.
Implementing Disaster Recovery
Disaster recovery is a critical element of an organization’s business continuity plan as it allows organizations to continue operating critical workloads following an unplanned interruption. A sound disaster recovery strategy can enable organizations to remain profitable and continuously provide value to their customers. Additionally, disaster recovery strategies are often an essential requirement for some organizations that allow them to meet specific regulatory compliance standards, such as the National Institute of Standards and Technology Cybersecurity Framework (NIST-CSF).
DR strategies for an organization should start with people first and then the technical processes. The goal in planning for DR should be, if an outage does occur, to successfully mitigate the degradation of downstream processes that might include customer or user access to a system. Business architecture is the driver of the overall resiliency footprint of an organization. The Open Group’s Architectural Framework (TOGAF) suggests business architecture defines the business strategy, governance, organization, and key business processes. Business architecture should be the predecessor for architecture in any other domain, including data, applications, and technology. Non-functional requirements (NFR) such as the location of users, security and compliance of an application, and availability are critical drivers to architecting for DR from a business architecture lens. By identifying elements that are absolutely critical to the success of a business, it will be easier to architect the technical components for disaster recovery.
When architecting for DR, there are a couple of metrics to bear in mind that can have downstream impacts to users and customers. Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are two metrics that will drive an organization’s DR strategy. RTO is the maximum amount of time that is acceptable for a system to be offline before restoration. RPO is the maximum acceptable time since the last data recovery point. This determines how much data loss is acceptable during an outage.
For organizations such as health care providers or financial providers, where every transaction can either save a life or make or save millions of dollars, a DR strategy should be a non-negotiable.
AWS has poignantly presented several DR strategies to meet varying customers’ requirements. These strategies range from several hours RTO/RPO (Backup and Restore) to those that can take seconds (Active-Active.) When working with a consulting partner such as Caylent, our goal is to ensure your organization is successful in whatever strategy is appropriate for your team and workload. For customers who have just begun their journey to DR, the Caylent DR Catalyst is an exceptional opportunity to work with our experts and to plan and prepare for your DR implementation.
Disaster Recovery Catalyst
The Caylent DR Catalyst is a 2-week engagement that allows your team to go on a deep dive for your architectural requirements to support DR. This includes business architecture and NFRs such as regional availability, RTO/RPO, latency, regulatory compliance requirements, and more. Additionally, we provide our expertise around best practices to enhance a previously existing AWS footprint with tools such as Infrastructure as Code to allow for a portable and repeatable way to redeploy an architecture. Caylent cloud architects and engineers will actively engage with your organization to identify opportunities for cost savings and technical opportunities as it relates to DR planning.
DR Planning should be an activity that is very intentional in its outcome and desired state. The Caylent team will work closely with your organization to meet specific goals while planning for DR such as organizational-specific regulatory compliances, and operational excellence. The Caylent team will also review DR testing strategies to mimic outages. This exercise allows your organization to glean immediate value in understanding how to architect for resilience and preparing your organization for business continuity from this engagement.
With our consideration of AWS’ Well-Architected Framework, your organization’s business and technical architecture requirements, in addition to our years of experience and expertise in guiding customers to success and preparing for disaster before disaster strikes, your organization will be primed to continuously provide value to your end-users and customers when a disaster does strike. As AWS Chief Technology Officer Werner Vogels stated,”Everything fails all of the time”. Let Caylent be the organization that helps you plan for and mitigate failure.