Which of the following are focuses of the Reliability pillar of the well architected framework?

As you’ve no doubt realized, there is a lot of new terminology to learn when discussing the cloud, and this is even more true when discussing Amazon Web Services (AWS) specifically. If you’re considering deploying your infrastructure into the AWS cloud, you’ve most likely heard the term “well-architected”, but you may not be exactly sure what the AWS Well-Architected Framework is.

Developed by AWS solutions architects after years of building solutions for clients, the AWS Well-Architected Framework is a way to learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. It enables you to consistently measure your architectures against best practices and identify areas for improvement.

Built on a set of foundational questions, the framework will help you determine how well your architecture aligns with AWS cloud best practices. It is purposefully designed to generate a conversation rather than act as an audit mechanism, and it provides a consistent baseline for architecture decision making.

The first step to adopt AWS best practices is conducting a Well-Architected review. Working with an AWS Well-Architected Partner or through the newly released Well-Architected Tool, the review asks a series of questions for each of the five AWS Well-Architected pillars. Your answers to these questions form the basis for the second step, the remediation plan required to achieve AWS Well-Architected status.

To better understand the AWS Well-Architected Framework, let’s look at the five pillars:

Operational Excellence

The operational excellence pillar focuses on running and monitoring systems to deliver business value, and continually improving processes and procedures. Key topics include managing and automating changes, responding to events, and defining standards to successfully manage daily operations.

Security

The security pillar focuses on protecting information & systems. Key topics include confidentiality and integrity of data, identifying and managing who can do what with privilege management, protecting systems, and establishing controls to detect security events.

Reliability

The reliability pillar focuses on the ability to prevent, and quickly recover from, failures in order to meet business and customer demand. Key topics include foundational elements around setup, cross-project requirements, recovery planning, and how change is handled.

Performance Efficiency

The performance efficiency pillar focuses on using IT and computing resources efficiently. Key topics include selecting the right resource types and sizes based on workload requirements, monitoring performance, and making informed decisions to maintain efficiency as business needs evolve.

Cost Optimization

The cost optimization pillar focuses on avoiding unneeded costs. Key topics include understanding and controlling where money is being spent, selecting the most appropriate type and number of resources, analyzing spend over time, and scaling to meet business needs without overspending.

The AWS Well-Architected Framework provides you with a consistent approach to evaluate your architectures and implement designs that will scale over time. The Framework helps you produce stable and efficient systems, allowing you to focus on your functional requirements.

As a member of the AWS Well-Architected Partner program, Eplexity is certified by AWS to review your infrastructure with you and provide the remediation services required to achieve Well-Architected status. If you’re interested in learning more, simply drop us a line; we’re here to help.


In this post I’m going to give an overview of the AWS Well Architected Framework, then give a deep dive on the Reliability pillar, which is one of the 5 core pillars that should underpin your AWS architecture.


AWS Well Architected Framework

The AWS Well Architected Framework is a series of best practice principles, designed by AWS, to help customers compare their AWS environments against these best practices and identify areas for improvement. The Framework is based on the extensive experience gleaned by AWS Solutions Architects over the years, in tens of thousands of AWS deployments. The Framework guides AWS customers through a series of questions that enables them to understand how well their architecture aligns with best practice. The ultimate goal is to help AWS customers build environments that are secure, high performance, resilient and efficient.

AWS offers a free-to-use Well Architected Tool which guides customers through the questions in relation to their specific AWS workloads, and then provides a plan on how best to architect the cloud environment using established best practices.

AWS Well Architected Framework Pillars

So let’s look at the 5 Pillars of the AWS Well Architected Framework, and understand at a high level what each pillar is about.


  1. Operational Excellence – The operational excellence pillar focuses on day to day operations of a customer’s AWS infrastructure, including change management, deployment automation, monitoring, responding to events and defining standardized operating models.
  2. Security – The security pillar focuses on protecting systems & data.  This includes Identity & access management, security information & event management, data confidentiality and integrity, and systems access.
  3. Reliability – The reliability pillar focuses on architecting for failure.  Rapid recovery from failure is essential for modern businesses.
  4. Performance Efficiency – The performance efficiency pillar focuses on the efficient use of IT and computing resources.
  5. Cost Optimization – The Cost Optimization pillar focuses on avoiding unnecessary expenditure on AWS resources.

AWS Well Architected – Reliability Pillar

The Reliability Pillar of the Well Architected Framework looks at how well a system can recover from infrastructure or service failures.  It also considers how a system can automatically scale to meet demand, and how disruptions such as misconfigurations or intermittent network issues can be mitigated.

The Reliability Pillar is based on 5 foundational architectural principles:

  • Test recovery procedures – It is much easier to simulate failure scenarios in the cloud; automation can be used to simulate failure and assist with the formulation of recovery plans and procedures.
  • Automatically recover from failure – By defining business-level KPIs, it is possible to monitor systems and trigger automation when a KPI threshold is breached. This enables automated recovery processes to repair or work around the failure.
  • Scale horizontally – Single large resources should be replaced by multiple smaller resources to minimize the impact of a single failure.
  • Stop guessing capacity – A common cause of IT system failure is insufficient capacity. In the cloud, resource utilization can be monitored and additional resources can be added and removed automatically.
  • Automate change management – All infrastructure changes should be made via automation.
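The “automatically recover from failure” principle can be sketched in a few lines. This is a minimal illustration with hypothetical function names (not a real AWS API): a business-level KPI is compared against its threshold, and a recovery action runs when the threshold is breached, rather than waiting for human intervention.

```python
# Hypothetical sketch of KPI-driven automated recovery (not a real AWS API):
# when a monitored KPI breaches its threshold, trigger a recovery action.

def check_kpi_and_recover(metric_value, threshold, recovery_action):
    """Run recovery_action if metric_value breaches threshold."""
    if metric_value > threshold:
        return recovery_action()
    return "healthy"

# Example: a 2% error-rate KPI against a 1% threshold triggers recovery.
outcome = check_kpi_and_recover(
    metric_value=0.02,
    threshold=0.01,
    recovery_action=lambda: "replaced unhealthy instance",
)
print(outcome)  # replaced unhealthy instance
```

In a real deployment the threshold check would be a CloudWatch alarm and the recovery action an Auto Scaling or automation workflow, but the shape of the logic is the same.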

Foundations of AWS Reliability

Limit Management 

In order to build a reliable application infrastructure, it is essential to understand the potential limitations of that infrastructure, and to be able to monitor when those limits are being approached, so corrective action can be taken. Limits could be CPU or RAM capacity in an instance, network throughput on a particular connection, the number of connections available to a database and so on. For example, EC2 instances have historically been limited to 20 per region by default. There are many other service limits – the best place to find the service limits that currently apply to your account is AWS Trusted Advisor. You can also use Amazon CloudWatch to set alerts for when limits are being approached, for metrics such as EBS volume capacity, provisioned IOPS and network I/O.
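The idea behind limit monitoring can be shown with a small sketch. The limit values and resource names below are illustrative assumptions, not authoritative figures: current usage is compared against known limits, and anything above an alert threshold (80% here) is flagged so corrective action, or a limit increase request, can be taken in good time.

```python
# Illustrative limit-monitoring sketch: flag resources whose usage is at or
# above threshold * limit. The limits and usage figures below are examples.

def limits_approaching(usage, limits, threshold=0.8):
    """Return names of resources whose usage is >= threshold * limit."""
    return [name for name, used in usage.items()
            if used >= threshold * limits[name]]

limits = {"ec2_instances": 20, "vpcs_per_region": 5, "eips": 5}
usage = {"ec2_instances": 17, "vpcs_per_region": 2, "eips": 5}
print(limits_approaching(usage, limits))  # ['ec2_instances', 'eips']
```

In practice the usage figures would come from CloudWatch metrics or Trusted Advisor rather than a hard-coded dictionary.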

Networking

It is important to consider future growth requirements when architecting IP address based networks. Amazon VPC (Virtual Private Cloud) enables customers to build out complex network architectures. It is recommended to utilise private address ranges (as defined by RFC 1918) for VPC CIDR blocks. Be sure to select ranges that will not conflict with ranges in use elsewhere in your network topology. When allocating CIDR blocks, it is important to:

  • Allow IP address space for multiple VPCs per region.
  • Consider connections between AWS accounts – other parts of the business may operate AWS resources in separate AWS accounts but need to interconnect with shared services.
  • Allow for subnets that span multiple availability zones within a VPC.
  • Leave unused CIDR block space within your VPC.
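Python’s standard ipaddress module is a handy way to sketch this kind of CIDR planning. The region and AZ names below are just examples: a /16 VPC block is carved from the RFC 1918 10.0.0.0/8 range into /20 subnets, one per Availability Zone, leaving plenty of unused space for future growth.

```python
# CIDR planning sketch using the standard library. The VPC range and AZ
# names are example assumptions, not a recommendation for your network.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")   # 65,536 addresses from RFC 1918 space
subnets = list(vpc.subnets(new_prefix=20))  # 16 possible /20 subnets

azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
allocation = dict(zip(azs, subnets))        # one /20 per AZ, 13 blocks left spare

for az, subnet in allocation.items():
    print(az, subnet, f"({subnet.num_addresses} addresses)")
```

Working through the arithmetic like this before creating the VPC makes it easy to confirm there is room for extra subnets, extra AZs, and peered VPCs without overlapping ranges.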

 
It is also important to consider how your network topology will be resilient to failure, misconfiguration, traffic spikes and DDoS attacks.

You need to consider how you will connect the rest of your network with your AWS resources.  Will you use VPNs?  If so, how will these terminate in your VPC, and how will you ensure that they are resilient and have sufficient throughput?  You may wish to use AWS Direct Connect – again, how will you ensure the resilience of this connection?  Perhaps you will require multiple connections back to separate locations outside of the AWS cloud.

Key AWS services for network topology include:

  • Amazon Virtual Private Cloud – for creation of subnets and IP address allocation.
  • Amazon EC2 – compute service where any required VPN appliances will run.
  • Amazon Route 53 – Amazon’s DNS service.
  • AWS Global Accelerator – a network acceleration service that directs traffic to optimal AWS network endpoints.
  • Elastic Load Balancing – Layer 4 and Layer 7 load balancing that works with Auto Scaling to cope with increases and decreases in demand.
  • AWS Shield – Distributed Denial of Service mitigation provided both free of charge and with an optional additional subscription for an enhanced level of protection.

Infrastructure High Availability

First of all, you need to decide exactly what high availability means for your application. How much downtime is acceptable for scheduled and unscheduled maintenance? And what budget do you have available to achieve the level of availability you desire?

There is a big difference in how you will approach the architecture of, say, an internal application that requires 99% availability, versus a mission critical customer facing application that requires 5 nines (99.999%) availability or higher.

If you are looking to achieve 5 nines availability, then every single component of your architecture will need to be able to achieve 5 nines availability to avoid single points of failure.  This will mean adding in a lot of redundancy to the solution, which will of course add to the cost.

5 nines availability allows for just over 5 minutes of downtime per year. This is virtually impossible to achieve without a high degree of automated deployment and automated recovery from failure – human intervention simply won’t be able to keep up. Any changes to the environment need to be thoroughly tested in a full-scale non-production environment, which in itself will significantly add to the overall infrastructure cost.
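The downtime budget for each availability target is simple arithmetic, and working it out makes the jump from 99% to 99.999% very concrete: the five nines figure comes out to roughly 5.26 minutes per year, while 99% permits more than three and a half days.

```python
# Downtime budget per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    """Minutes of downtime per year permitted by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR

for a in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.5f} -> {downtime_minutes_per_year(a):8.2f} minutes/year")
```

Running this shows 5,256 minutes for 99%, down to about 5.26 minutes for five nines, which is why the whitepaper stresses automation at that level.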

The table below lists common sources of service interruption which need to be considered in any high availability design:

  • Hardware – Failure of any hardware component, e.g. storage, server, network.
  • Deployment – Failure of any automated or manual deployment of application code, hardware, network or configuration changes.
  • Load – Saturated load on any component of the application, or on the overall infrastructure itself.
  • Data – Corrupt data accepted into the system that cannot be processed.
  • Expired credentials – Expiration of a certificate or credentials, e.g. SSL/TLS certificate expiry.
  • Dependency – Failure of a dependent service.
  • Infrastructure – Power supply or HVAC failure impacting hardware availability.
  • Identifier exhaustion – Exceeding available capacity, hitting throttling limits, etc.

Application High Availability

There’s no point designing a 5 nines availability infrastructure if the application itself cannot achieve 5 nines availability.  Here are 4 things to consider when designing a highly available application:

  1. Fault Isolation Zones – in AWS terms this can mean architecting your application to leverage multiple Regions and Availability Zones.  Regions are geographic locations around the globe that contain two or more Availability Zones (AZs).  Availability Zones are physically separate datacentres within a region, with isolated power, networking and cooling, so in theory no two Availability Zones should fail at the same time.
  2. Redundant Components – component redundancy starts right down at the hardware level with redundant power supplies, hard drives and network interfaces. But then it extends up the stack to the server level, eg multiple web servers, multi AZ or multi region databases and so on.
  3. Microservices – read more on our blog post about AWS Microservices.
  4. Recovery Oriented Computing – Recovery Oriented Computing, or ROC, focuses on having the right monitoring in place to detect all types of failure, and then automating recovery procedures to automatically recover from a failure.
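The value of redundant components can be made concrete with a little probability. This is a simplified model assuming independent failures: components in series multiply their availabilities (the chain is weaker than its weakest link), while redundant copies in parallel only fail when every copy fails at once.

```python
# Simplified composite-availability model, assuming independent failures.

def serial(*availabilities):
    """Availability of components in series: all must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel(availability, copies):
    """Availability of redundant copies: fails only if all copies fail."""
    return 1 - (1 - availability) ** copies

# A single 99% web server in front of a 99% database: ~98% overall.
print(round(serial(0.99, 0.99), 4))   # 0.9801
# Two redundant 99% web servers make that tier four nines.
print(round(parallel(0.99, 2), 4))    # 0.9999
```

This is why a 5 nines target forces redundancy at every layer: one 99% component anywhere in the serial chain caps the whole system well below the goal.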

Operational Considerations for Reliability

  1. Deployment – Deployments should be automated where possible using a deployment methodology to decrease the risk of failure, such as Blue-Green, Canary, Feature Toggles or Failure Isolation Zone deployments.
  2. Testing – Testing should be carried out to match availability goals – one of the most effective testing methods is canary testing which runs constantly and simulates customer behaviour.
  3. Monitoring and Alerting – Deep monitoring of both your infrastructure and your application is essential to meet availability goals.  You need to know the status and availability of each component of the infrastructure and application, and the overall user experience being delivered.
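The canary idea in the deployment point above can be sketched deterministically. This is an illustrative routing function, not any particular load balancer’s API: hashing a stable user ID sends a fixed percentage of users to the new version, and each user consistently sees the same version for the duration of the canary.

```python
# Illustrative canary-routing sketch (hypothetical, not a real LB API):
# hash a stable user ID into 100 buckets and send the lowest buckets to
# the canary version, so routing is sticky per user.
import hashlib

def route(user_id, canary_percent=5):
    """Return 'canary' for roughly canary_percent of users, else 'stable'."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

print(route("user-42"))          # same answer every time for this user
print(route("user-42", 100))     # canary (100% rollout)
```

Ramping the rollout is then just raising canary_percent while monitoring the canary population’s error rates.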

So we’ve touched on a number of the different elements of the AWS Reliability pillar to get you thinking about the architecture of your AWS infrastructure and applications.  AWS have a great whitepaper, the Reliability Pillar of the AWS Well-Architected Framework, which goes into a lot more detail and works through some hypothetical examples to illustrate the concepts.
If you need any help, either in reviewing your current AWS infrastructure against the AWS Well Architected Framework, or in designing highly available AWS systems, then Logicata will be pleased to help.  Our AWS Managed Services ensure continuous improvement of your application infrastructure against the Well Architected Framework.  Please reach out to us for more information.

Which of the following are focuses of the reliability pillar of the well architected framework?

The Reliability pillar of the Well-Architected Framework focuses on three areas: Foundations, Change Management, and Failure Management.

Which of the following are design principles of the reliability pillar of the AWS well architected framework?

Design principles of the Reliability pillar include: test recovery procedures; automatically recover from failure; scale horizontally to increase aggregate availability; stop guessing capacity; and manage change through automation.

Which of the following is a pillar of the well architected framework?

The AWS Well-Architected Framework includes five pillars of best practices: operational excellence, security, reliability, performance efficiency, and cost optimization.

What are the four areas of focus for the reliability pillar?

There are four best practice areas for reliability in the cloud:

  • Foundations
  • Workload Architecture
  • Change Management
  • Failure Management