Use AWS Fault Injection Service to demonstrate multi-region and multi-AZ application resilience

AWS Fault Injection Service (FIS) helps you to put chaos engineering into practice at scale. Today we are launching new scenarios that will let you demonstrate that your applications perform as intended if an AWS Availability Zone experiences a full power interruption or connectivity from one AWS region to another is lost.

You can use the scenarios to conduct experiments that will build confidence that your application (whether single-region or multi-region) works as expected when something goes wrong, help you to gain a better understanding of direct and indirect dependencies, and test recovery time. After you have put your application through its paces and know that it works as expected, you can use the results of the experiment for compliance purposes. When used in conjunction with other parts of AWS Resilience Hub, FIS can help you to fully understand the overall resilience posture of your applications.

Intro to Scenarios
We launched FIS in 2021 to help you perform controlled experiments on your AWS applications. In the post that I wrote to announce that launch, I showed you how to create experiment templates and to use them to conduct experiments. The experiments are built using powerful, low-level actions that affect specified groups of AWS resources of a particular type. For example, the following actions operate on EC2 instances and Auto Scaling Groups:
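To make the idea of a template built from low-level actions concrete, here is a minimal sketch that assembles one as a plain Python dictionary. It assumes the aws:ec2:stop-instances action with its startInstancesAfterDuration parameter; the role ARN, tag key and value, and the names taggedInstances and stopInstances are hypothetical placeholders, not values from this post. Nothing is sent to AWS.

```python
import json


def build_stop_instances_template(role_arn, tag_key, tag_value, restart_after="PT5M"):
    """Return a dict shaped like an FIS experiment template with a single action."""
    return {
        "description": "Stop tagged EC2 instances, then restart them",
        "roleArn": role_arn,  # hypothetical role that FIS would assume
        "stopConditions": [{"source": "none"}],  # no CloudWatch alarm guard in this sketch
        "targets": {
            "taggedInstances": {  # hypothetical target name
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "stopInstances": {  # hypothetical action name
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": restart_after},
                # "Instances" is the target key this action expects
                "targets": {"Instances": "taggedInstances"},
            }
        },
    }


template = build_stop_instances_template(
    "arn:aws:iam::111122223333:role/fis-experiment-role",  # placeholder account and role
    "Environment",
    "test",
)
print(json.dumps(template, indent=2))
```

A dictionary of this shape could then be passed to the boto3 FIS client's create_experiment_template call, along with a clientToken, once the placeholders are replaced with real values.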

With these actions as building blocks, we recently launched the AWS FIS Scenario Library. Each scenario in the library defines events or conditions that you can use to test the resilience of your applications:

Each scenario is used to create an experiment template. You can use the scenarios as-is, or you can take any template as a starting point and customize or enhance it as desired.

The scenarios can target resources in the same AWS account or in other AWS accounts:

New Scenarios
With all of that as background, let’s take a look at the new scenarios.

AZ Availability: Power Interruption – This scenario temporarily “pulls the plug” on a targeted set of your resources in a single Availability Zone including EC2 instances (including those in EKS and ECS clusters), EBS volumes, Auto Scaling Groups, VPC subnets, Amazon ElastiCache for Redis clusters, and Amazon Relational Database Service (RDS) clusters. In most cases you will run it on an application that has resources in more than one Availability Zone, but you can run it on a single-AZ app with an outage as the expected outcome. It targets a single AZ, and also allows you to disallow a specified set of IAM roles or Auto Scaling Groups from being able to launch new instances or start stopped instances during the experiment.

The New actions and targets experience makes it easy to see everything at a glance, including the actions in the scenario and the types of AWS resources that they affect:

The scenarios include parameters that are used to customize the experiment template:

The Advanced parameters – targeting tags lets you control the tag keys and values that will be used to locate the resources targeted by experiments:
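The effect of the targeting tags can be illustrated with a small local simulation. This is only a sketch of tag-plus-AZ matching over an in-memory inventory, not the selection logic that FIS itself runs; the tag key AzImpairmentPower, the value StopInstances, the instance IDs, and the inventory layout are all illustrative assumptions.

```python
def matches(resource, tag_key, tag_value, az):
    """True when the resource carries the targeting tag and lives in the chosen AZ."""
    return resource["tags"].get(tag_key) == tag_value and resource["az"] == az


# Hypothetical inventory; real experiments resolve targets inside AWS.
inventory = [
    {"id": "i-aaa", "az": "us-east-1a", "tags": {"AzImpairmentPower": "StopInstances"}},
    {"id": "i-bbb", "az": "us-east-1b", "tags": {"AzImpairmentPower": "StopInstances"}},
    {"id": "i-ccc", "az": "us-east-1a", "tags": {"Team": "payments"}},
]

targeted = [
    r["id"]
    for r in inventory
    if matches(r, "AzImpairmentPower", "StopInstances", "us-east-1a")
]
print(targeted)  # ['i-aaa']
```

Only the instance that has both the tag and the matching Availability Zone is selected, which is why changing the tag key or value in the advanced parameters changes the blast radius of the experiment.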

Cross-Region: Connectivity – This scenario prevents your application in a test region from being able to access resources in a target region. This includes traffic from EC2 instances, ECS tasks, EKS pods, and Lambda functions attached to a VPC. It also includes traffic flowing across Transit Gateways and VPC peering connections, as well as cross-region S3 and DynamoDB replication. The scenario looks like this out of the box:

This scenario runs for 3 hours (unless you change the disruptionDuration parameter), and isolates the test region from the target region in the specified ways, with advanced parameters to control the tags that are used to select the affected AWS resources in the isolated region:
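If you plan to change disruptionDuration, it may help to convert between clock time and the duration string. The sketch below assumes the parameter takes an ISO 8601 duration such as PT3H, as FIS action durations generally do; the helper names parse_duration and format_duration are hypothetical, and only the hours, minutes, and seconds components are handled.

```python
import re
from datetime import timedelta

# Matches simple ISO 8601 time durations such as PT3H, PT90M, or PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text):
    """Convert a duration string like 'PT3H' into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(delta):
    """Convert a timedelta back into a compact duration string."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "PT" + (parts or "0S")


print(parse_duration("PT3H"))                  # 3:00:00
print(format_duration(timedelta(minutes=90)))  # PT1H30M
```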

You might also find the Disrupt and Pause actions used in this scenario useful on their own:

For example, the aws:s3:bucket-pause-replication action can be used to pause replication within a region.
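Used on its own, the pause-replication action could appear in a template fragment along the following lines. Treat this as a sketch under assumptions: the duration and region parameter names and the Buckets target key reflect my understanding of the action and should be checked against the FIS actions reference, and the tag, region, and target name are hypothetical.

```python
def build_pause_replication_fragment(tag_key, tag_value, destination_region, duration="PT15M"):
    """Return the targets and actions portions of a template that pauses S3 replication."""
    targets = {
        "sourceBuckets": {  # hypothetical target name
            "resourceType": "aws:s3:bucket",
            "resourceTags": {tag_key: tag_value},
            "selectionMode": "ALL",
        }
    }
    actions = {
        "pauseReplication": {  # hypothetical action name
            "actionId": "aws:s3:bucket-pause-replication",
            # Parameter names are assumptions; verify before use.
            "parameters": {"duration": duration, "region": destination_region},
            "targets": {"Buckets": "sourceBuckets"},
        }
    }
    return {"targets": targets, "actions": actions}


fragment = build_pause_replication_fragment("ReplicationTest", "true", "us-west-2")
print(fragment["actions"]["pauseReplication"]["parameters"])
```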

Things to Know
Here are a couple of things to know about the new scenarios:

Regions – The new scenarios are available in all commercial AWS Regions where FIS is available, at no additional cost.

Pricing – You pay for the action-minutes consumed by the experiments that you run; see the AWS Fault Injection Service Pricing Page for more info.

Naming – This service was formerly called AWS Fault Injection Simulator.
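The action-minutes pricing model mentioned above can be estimated with simple arithmetic. In this sketch the per-action-minute rate is a placeholder passed in by the caller, the action durations are made up, and any rounding or multi-account charges are ignored; the pricing page is the source of truth.

```python
from decimal import Decimal


def action_minutes(durations_minutes):
    """Total action-minutes: each action contributes the minutes it runs."""
    return sum(durations_minutes)


def estimate_cost(durations_minutes, rate_per_action_minute):
    """Multiply total action-minutes by a caller-supplied rate (given as a string)."""
    return Decimal(action_minutes(durations_minutes)) * Decimal(rate_per_action_minute)


# Hypothetical experiment: two actions running for 3 hours and one for 30 minutes.
durations = [180, 180, 30]
print(action_minutes(durations))         # 390
print(estimate_cost(durations, "0.10"))  # 39.00 at a placeholder rate of 0.10
```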

Jeff;

Source AWS Blog