AWS Fault Tolerance System: A Setup Guide

A fault tolerance infrastructure keeps a system running even when one or more of its components fail or underperform. Our previous blog covers the background: it explains what fault tolerance and high availability are, and introduces the services used to set up a fault tolerance infrastructure.

Implementing Multi-Region Fault Tolerance Infrastructure

Now, let us walk step by step through setting up a fault tolerance infrastructure in the AWS console. The aim is that when one Region fails, the system's traffic is diverted to another Region, as shown in the image below.

AWS Multi-Region Fault Tolerance Infrastructure diagram with Route 53, API Gateway, Certificate Manager, AWS Lambda, and Recovery Controller.

The event-driven serverless architecture performs a failover by updating the weights of the Route 53 records, which shifts traffic from the primary Region to the secondary Region. Each failover operation specifies the source Region that traffic is leaving and the destination Region that receives it.
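The weight shift described above can be sketched in Python. This is a minimal sketch, not the exact code behind this setup: it assumes weighted CNAME records already exist, and the record name, set identifiers, and target DNS names are hypothetical. The client passed to `apply_failover` is expected to be `boto3.client("route53")`.

```python
def build_weight_change(record_name, set_identifier, target, weight, ttl=60):
    """Build one UPSERT change for a weighted CNAME record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": set_identifier,
            "Weight": weight,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }


def build_failover_batch(record_name, targets, source_region, destination_region):
    """Send all traffic to destination_region and none to source_region.

    targets maps a Region name to the DNS name that serves that Region.
    """
    weights = {source_region: 0, destination_region: 100}
    return {
        "Comment": f"Failover from {source_region} to {destination_region}",
        "Changes": [
            build_weight_change(record_name, region, targets[region], weight)
            for region, weight in weights.items()
        ],
    }


def apply_failover(route53_client, hosted_zone_id, batch):
    """Submit the change batch to the hosted zone."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch
    )
```

Building the batch separately from submitting it makes the change easy to inspect or log before anything is sent to Route 53.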

Let's go through the implementation one service at a time, with screenshots showing how fault tolerance is achieved at each step.

Route 53

Route 53 is a DNS service enabling users to access websites and online resources by translating human-readable URLs into IP addresses for machine understanding.

Create a Hosted Zone with Route 53.
Log in to your AWS console, navigate to the AWS Management Console dashboard, and search for 'Route 53'. Then, click on 'Create Hosted Zone'.

Route 53 dashboard with DNS management, hosted zone creation, health checks, domain registration, and traffic management options

Enter all the required details and click on 'Create Hosted Zone'.

Hosted Zone configuration interface for creating a domain with options for name, description, type (public or private), and tagging.

After the zone is created, add the name servers listed in its NS record to your domain's settings in your domain panel.

Records interface displaying NS and SOA records. Adding Name Servers after zone creation
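For readers who prefer scripting this step, here is a minimal sketch of the same hosted zone creation. It assumes the client is `boto3.client("route53")`; the function name is ours, and the domain you pass in is your own.

```python
import uuid


def create_public_hosted_zone(route53_client, domain_name):
    """Create a hosted zone and return (zone_id, name_servers)."""
    response = route53_client.create_hosted_zone(
        Name=domain_name,
        # CallerReference must be unique for each request.
        CallerReference=str(uuid.uuid4()),
    )
    zone_id = response["HostedZone"]["Id"]
    # These are the name servers to add in your domain panel.
    name_servers = response["DelegationSet"]["NameServers"]
    return zone_id, name_servers
```

The returned name servers are the same values the console shows in the zone's NS record.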

The DNS zone is now in place; next, we need to generate an SSL certificate.

AWS Certificate Manager

AWS Certificate Manager (ACM) lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services.

Navigate to the AWS Management Console dashboard and search for "Certificate Manager".
Click on 'Request a Certificate', select 'Request a Public certificate', and click 'Next'.

AWS Certificate Manager interface: Search, request a public SSL/TLS certificate for secure communication.

Now, provide your domain name, select 'DNS validation' as the validation method, and click on 'Request'.

AWS Certificate Manager: Add domain names, select DNS validation, and choose encryption algorithm for certificate request.

Validate your certificate request by adding DNS records. Once validated, it will show the issued certificate.

AWS Certificate Manager: Certificate details including Certificate ID, type (Amazon Issued), and status (Issued and In use).
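The request-and-validate flow above can also be sketched in code. This is a rough outline under the assumption that the client is `boto3.client("acm")` created in the same Region as the API; the helper names are ours.

```python
def request_dns_validated_certificate(acm_client, domain_name):
    """Request a public certificate that is validated through DNS."""
    response = acm_client.request_certificate(
        DomainName=domain_name, ValidationMethod="DNS"
    )
    return response["CertificateArn"]


def validation_records(acm_client, certificate_arn):
    """Return the (name, type, value) records ACM expects in your DNS zone."""
    certificate = acm_client.describe_certificate(
        CertificateArn=certificate_arn
    )["Certificate"]
    records = []
    for option in certificate.get("DomainValidationOptions", []):
        # ResourceRecord can be missing for a short time after the request.
        record = option.get("ResourceRecord")
        if record:
            records.append((record["Name"], record["Type"], record["Value"]))
    return records
```

A Regional API Gateway custom domain needs its certificate in the same Region as the API, so the request would be repeated in the secondary Region.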

AWS Lambda

Lambda runs your code and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, and logging. You only need to provide your code in one of the supported language runtimes.

In the primary region, go to the AWS Management Console dashboard and search for 'Lambda'.
Click on 'Create Function' and fill in all the details to create the function.

Create AWS Lambda function: Search for Lambda, enter function details like name and runtime.

Write your health check code for Lambda, then click on 'Deploy' and test.

AWS Lambda Health Check Code: Python code for Lambda function my-us-east-1 checks Google health, deploys, and tests.
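If you need a starting point for the health check code, the following is a minimal sketch rather than the exact function shown in the screenshot. The probed URL is an assumption based on that screenshot, and the optional `probe` parameter is our addition so the handler can be exercised without network access.

```python
import os
import urllib.request

# Assumed target; replace it with the endpoint your service depends on.
HEALTH_TARGET = "https://www.google.com"


def probe_target(url=HEALTH_TARGET, timeout=3):
    """Return True if the target answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def lambda_handler(event, context, probe=probe_target):
    # AWS_REGION is set by the Lambda runtime.
    region = os.environ.get("AWS_REGION", "unknown-region")
    if probe():
        return {"statusCode": 200, "body": f"{region} Service is healthy"}
    return {"statusCode": 503, "body": f"{region} Service is unhealthy"}
```

Because the Region comes from the environment, the same code can be deployed unchanged in the secondary Region.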

Follow the same process in the secondary region, such as us-east-2.

API Gateway

API Gateway lets you create APIs that access AWS services, other web services, or data stored in the AWS Cloud. You can use these APIs in your own or your clients' applications, or make them available to third-party app developers.

In the AWS Management Console dashboard, search for 'API Gateway', and click on 'Create API'.
We are implementing a REST API, so navigate to REST API and click on 'Build'.

AWS API Gateway Setup: Search API Gateway, create REST API for control over requests, compatible with Lambda, HTTP, and AWS Services.

Select the protocol as REST and choose 'New API'.
In the settings, enter your health check API name, select the Endpoint Type as 'Regional', and click on 'Create API'.

AWS API Gateway Setup: Choose REST protocol, create new API, set health check API name, select Regional Endpoint, and create the API.

Now go to 'API,' click on 'Action,' then 'Create Resource.' Provide the resource name as 'healthcheck' and click 'Create Resource'.

AWS API Gateway Setup: Go to API, create resource named healthcheck with path /healthcheck.

Now, click on 'Create Method' and select 'ANY' from the drop-down options.

AWS API Gateway Method Creation: Clicking Create Method, selecting ANY from the drop-down options for the healthcheck resource.

Select 'Lambda Function' as the Integration Type, choose your Lambda Region, provide the Lambda Function Name, and click 'Save'.

AWS API Gateway Integration: Select Lambda Function, set Lambda Region, provide Function Name, and click Save.

Now click on 'Deploy API' from the Action menu.

AWS API Gateway Deployment: Click Deploy API to deploy the healthcheck method with method actions, resource actions, and API actions.

Provide the deployment stage (create new as dev) and click on 'Deploy'.

AWS API Gateway Deployment: Providing deployment details, creating a new stage dev, and clicking Deploy for API deployment.
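The console steps above map to a handful of API Gateway calls. The sketch below is one possible outline, assuming the client is `boto3.client("apigateway")`. It uses a Lambda proxy integration (`AWS_PROXY`), which is our assumption and may differ from the integration used in the screenshots; the default API name is hypothetical.

```python
def lambda_integration_uri(region, lambda_arn):
    """Build the URI API Gateway uses to invoke a Lambda function."""
    return (
        f"arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/"
        f"{lambda_arn}/invocations"
    )


def create_healthcheck_api(apigw, region, lambda_arn,
                           api_name="healthcheck-api", stage="dev"):
    """Create a Regional REST API with ANY /healthcheck backed by Lambda."""
    api_id = apigw.create_rest_api(
        name=api_name, endpointConfiguration={"types": ["REGIONAL"]}
    )["id"]
    # A new REST API contains only the root resource "/".
    root_id = next(
        item["id"]
        for item in apigw.get_resources(restApiId=api_id)["items"]
        if item["path"] == "/"
    )
    resource_id = apigw.create_resource(
        restApiId=api_id, parentId=root_id, pathPart="healthcheck"
    )["id"]
    apigw.put_method(
        restApiId=api_id, resourceId=resource_id,
        httpMethod="ANY", authorizationType="NONE",
    )
    apigw.put_integration(
        restApiId=api_id, resourceId=resource_id, httpMethod="ANY",
        type="AWS_PROXY",              # assumption: proxy integration
        integrationHttpMethod="POST",  # Lambda invocations use POST
        uri=lambda_integration_uri(region, lambda_arn),
    )
    apigw.create_deployment(restApiId=api_id, stageName=stage)
    return api_id
```

When the integration is created in the console, API Gateway is also granted permission to invoke the function; a scripted setup would need to add that Lambda permission separately.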

It will provide you with a temporary URL to check the API. Simply click on the given URL and append '/healthcheck' at the end of the URL to trigger your Lambda function.

AWS API Gateway Testing: In the dev stage, view the Invoke URL and test the API by appending /healthcheck.

The response now shows the health status of your region.

AWS API Gateway Status: The test result indicates a healthy status for the us-east-1 region.
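The same check can be run from a script instead of the browser. Here is a small sketch, assuming the standard invoke URL format for REST API stages; the API ID you pass in is your own.

```python
import urllib.request


def invoke_url(api_id, region, stage, path="healthcheck"):
    """Build the default invoke URL for a resource on a deployed stage."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{path}"


def fetch_health(url, timeout=5):
    """Call the health check endpoint and return (status, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

For example, `fetch_health(invoke_url("abc123", "us-east-1", "dev"))` would call the 'dev' stage of a hypothetical API with the ID `abc123`.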

Let's add a custom domain to API Gateway.
Go to 'Custom domain names' in the left panel of API Gateway.

AWS API Gateway Custom Domain Setup: Accessing Custom domain names in the redesigned API Gateway console.

Click on 'Create'.
Enter your domain name.

AWS API Gateway Custom Domain Creation: Clicking Create and inserting a custom domain name for a more intuitive URL.

Now, under Endpoint Configuration, select your ACM (AWS Certificate Manager) certificate, and click on 'Create Domain Name'.

AWS API Gateway ACM Configuration: Choosing ACM (AWS Certificate Manager) from Endpoint Configuration and selecting a certificate for the custom domain

Click on the 'API Mapping' tab, then 'Configure API Mapping.' Add a new mapping and select your 'healthcheck' API and stage.

AWS API Gateway Mapping: Adding a new mapping for apiHealth REST API.
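As a rough code equivalent of the custom domain steps, the sketch below creates a Regional custom domain and maps it to a deployed stage. It assumes the client is `boto3.client("apigateway")` and that the certificate ARN comes from ACM in the same Region; the function name is ours.

```python
def attach_custom_domain(apigw, domain_name, certificate_arn, api_id, stage):
    """Create a Regional custom domain and map it to an API stage."""
    domain = apigw.create_domain_name(
        domainName=domain_name,
        regionalCertificateArn=certificate_arn,
        endpointConfiguration={"types": ["REGIONAL"]},
    )
    apigw.create_base_path_mapping(
        domainName=domain_name, restApiId=api_id, stage=stage
    )
    # These two values are what a Route 53 record for the domain points to.
    return domain["regionalDomainName"], domain["regionalHostedZoneId"]
```

The returned Regional domain name is the target to use later when adding the Route 53 records for each Region.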

Follow the same process in the secondary region, such as us-east-2.

Amazon Route 53 Application Recovery Controller

Amazon Route 53 Application Recovery Controller lets you know if your applications and resources are ready for recovery. Across the AWS Availability Zones (AZs) or Regions, the Application Recovery Controller also helps you manage and coordinate recovery for your applications. It reduces the manual steps required by traditional tools and processes and makes recoveries simpler and more reliable.

Navigate to the AWS Management Console dashboard and search for "Route 53 Application Recovery Controller".
We are setting up a regional failover setup with Route 53 Application Recovery Controller, so navigate to 'Multi-region' in the left panel.
To begin, we need to set up a recovery group. Click on 'Readiness Check,' then click on 'Create a new Recovery Group.' Provide a Recovery Group name and click 'Next'.

Route 53 Application Recovery Controller: Setting up a regional failover, creating a new Recovery Group named MyWebAppRG.

Now we need to add a cell, so click on 'Create Cell' and then 'Add Cell'.
Provide a cell name for the East region, click on 'Next', and then 'Create Recovery Group'.

Route 53 Recovery Group: Adding a cell to the Recovery Group for the East region.

Then we need to create a resource set. Click on 'Create'.
Provide a resource set name and select API Gateway from the drop-down menu.

Route 53 Recovery Controller: Creating API-RS resource set for readiness checks.

Now add your API Gateway stage ARN in the 'key' field and click 'Create Resource Set'.

Route 53 Recovery Controller: Adding API Gateway stage ARN for a new resource set creation. Entering ARN in key and clicking Create Resource Set.

Now create a readiness check. Click on 'Create a new readiness check'.

Route 53 Recovery Controller: Creating a new readiness check for the MyWebAppRG recovery group. Initiating 'Create a new readiness check'.

Provide a readiness check name and resource type, and click on 'Next'.

Route 53 Application Recovery Controller: Naming readiness check as MyWebApp-API-Rcheck, selecting resource type Api Gateway Stage, and proceeding to the next step.

Add the resource set by selecting the existing resource set name, and click 'Next'.

Adding a resource set named API-RS of type Api Gateway Stage to a readiness check in Route 53 Application Recovery Controller.

Select 'Recovery group options,' choose resources, provide a cell for each identifier region-wise, and click 'Next' to create the readiness check.

Configuring Route 53 Recovery Check for MyWebAppRG with resource identifiers in us-east-1 and us-east-2 regions.
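The resource set and readiness check from the steps above can be sketched as follows. This is an outline under assumptions: the client is `boto3.client("route53-recovery-readiness")`, the cells already exist, and the cell ARNs and API IDs you pass in are your own. The names `API-RS` and `MyWebApp-API-Rcheck` in the usage note come from the screenshots.

```python
def stage_arn(region, api_id, stage):
    """Build the ARN of a REST API stage."""
    return f"arn:aws:apigateway:{region}::/restapis/{api_id}/stages/{stage}"


def build_resource_set(name, stages, cell_arns):
    """Describe one API Gateway stage per Region, each scoped to its cell.

    stages maps a Region to (api_id, stage); cell_arns maps a Region to a cell ARN.
    """
    return {
        "ResourceSetName": name,
        "ResourceSetType": "AWS::ApiGateway::Stage",
        "Resources": [
            {
                "ResourceArn": stage_arn(region, api_id, stage),
                "ReadinessScopes": [cell_arns[region]],
            }
            for region, (api_id, stage) in stages.items()
        ],
    }


def create_readiness_check(readiness_client, resource_set, check_name):
    """Create the resource set, then a readiness check that monitors it."""
    readiness_client.create_resource_set(**resource_set)
    return readiness_client.create_readiness_check(
        ReadinessCheckName=check_name,
        ResourceSetName=resource_set["ResourceSetName"],
    )
```

A call might look like `create_readiness_check(client, build_resource_set("API-RS", stages, cells), "MyWebApp-API-Rcheck")`, where `stages` and `cells` hold the values for us-east-1 and us-east-2.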

Now click on 'Clusters' and then click on 'Create'.
Provide a cluster name and click on 'Create cluster'.

Creating a Route 53 Application Recovery Controller cluster named MyWebAppCluster. Confirming pricing changes before creating the cluster.

Now go to Routing Control and navigate to the default Control Panel.
Click on 'Add routing control'.

Navigating to Routing Control in Route 53 Application Recovery Controller. Adding a new routing control to the default Control Panel.

Now create a routing control for each region, one for the primary and one for the secondary.

Configuring routing controls for primary and secondary regions in Route 53 Application Recovery Controller. Associating controls with control panels and clusters.

Let's create the health check for both regions.
On the Routing Control page, click on 'RC-us-east-1', then 'Create health check'.

Creating health checks for the RC-us-east-1 routing control in Route 53 Application Recovery Controller. No health checks associated currently.

Provide a health check name and click on 'Create'. Repeat this step for both regions.
Now, in the last step, we are going to add records in Route 53.
Navigate to Route 53, click on 'Hosted zone', select your domain, and then click on 'Create records'.
You can add records as shown in the screenshots for both the primary and secondary regions.

Creating health checks for RC-us-east-1 in Route 53 Application Recovery Controller, adding records for primary and secondary regions.

Creating health checks for RC-us-east-1 and RC-us-east-2 in Route 53 Application Recovery Controller, adding records for primary and secondary regions.
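The routing control health checks created above can also be expressed in code. The sketch below assumes the client is `boto3.client("route53")` and that the routing control ARN is copied from the Application Recovery Controller console; the function name is ours.

```python
import uuid


def create_routing_control_health_check(route53_client, routing_control_arn, name):
    """Create a health check that follows the state of a routing control."""
    response = route53_client.create_health_check(
        # CallerReference must be unique for each request.
        CallerReference=str(uuid.uuid4()),
        HealthCheckConfig={
            "Type": "RECOVERY_CONTROL",
            "RoutingControlArn": routing_control_arn,
        },
    )
    health_check_id = response["HealthCheck"]["Id"]
    # The name shown in the console is stored as the "Name" tag.
    route53_client.change_tags_for_resource(
        ResourceType="healthcheck",
        ResourceId=health_check_id,
        AddTags=[{"Key": "Name", "Value": name}],
    )
    return health_check_id
```

The returned ID is the value a Route 53 record references through its `HealthCheckId` field, which is how each Region's record is tied to its routing control.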

Now you can open the API URL in your browser. It will show the response from the primary region, which should be marked as healthy.

Testing API health by browsing the URL. Response: {statusCode: 200, body: us-east-1 Service is healthy}.

When your primary region goes down, traffic to your URL is automatically routed to the secondary region.

Automated failover redirects to the secondary region. Response: {statusCode: 200, body: us-east-2 Service is healthy}.
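In Application Recovery Controller, a routing control is switched On or Off, and the health check attached to it follows that state. One way to drive that switch from code is sketched below. It assumes the client is `boto3.client("route53-recovery-cluster")` pointed at one of your cluster's endpoints, and the function names are ours.

```python
def build_state_entries(primary_arn, secondary_arn, fail_to_secondary=True):
    """Build the routing control states for a failover or a failback."""
    if fail_to_secondary:
        primary_state, secondary_state = "Off", "On"
    else:
        primary_state, secondary_state = "On", "Off"
    return [
        {"RoutingControlArn": primary_arn, "RoutingControlState": primary_state},
        {"RoutingControlArn": secondary_arn, "RoutingControlState": secondary_state},
    ]


def shift_traffic(cluster_client, entries):
    """Apply both state changes in a single request."""
    return cluster_client.update_routing_control_states(
        UpdateRoutingControlStateEntries=entries
    )
```

Updating both routing controls in one call avoids a window in which both Regions are switched off at the same time.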

You can also check the health of both regions on the Route 53 health checks page.

Route 53 health check dashboard displaying status: us-east-1-API (Unhealthy, No alarms configured), us-east-2-API (Healthy, No alarms configured).

End Note

In today's fast-paced and interconnected digital world, the need for a highly available infrastructure has never been greater. We've walked step by step through how a fault tolerance infrastructure can be set up. It brings numerous benefits to the table, from minimizing downtime and enhancing reliability to ensuring business continuity and improving customer satisfaction. A fault tolerance infrastructure is not just a luxury but a necessity for organizations seeking to thrive in the face of disruptions and maintain the trust of their users.

We at Seaflux are your dedicated partners in the ever-evolving landscape of Cloud Computing. Whether you're contemplating a seamless cloud migration, exploring the possibilities of Kubernetes deployment, or harnessing the power of AWS serverless architecture, Seaflux is here to lead the way.

Have specific questions or ambitious projects in mind? Let's discuss! Schedule a meeting with us here, and let Seaflux be your trusted companion in unlocking the potential of cloud innovation. Your journey to a more agile and scalable future starts with us.