AWS Reliability Pillar

Last Updated : 20-Dec-2020

Design Principles

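  • Automatically recover from failure – monitor KPIs and trigger automated recovery when a threshold is breached
  • Test recovery procedures – simulate failures to validate that recovery actually works
  • Scale horizontally to increase aggregate workload availability – replace one large resource with several smaller ones and use auto scaling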
  • Stop guessing capacity – monitor demand and utilization to trigger scaling in or out
  • Manage change in automation – automate all changes to infrastructure for reliable recovery
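
As a rough illustration of the "scale horizontally" and "stop guessing capacity" principles above, the sketch below computes how many instances a group would need to bring average utilization back to a target. It is a simplified proportional rule written for these notes, not the algorithm AWS Auto Scaling actually runs, and the function name, parameters, and default bounds are all assumptions.

```python
import math


def desired_capacity(current, observed_util, target_util, min_size=1, max_size=10):
    """Return the instance count that would bring utilization back to target.

    Hypothetical helper: a simplified proportional rule, not an AWS API.
    """
    if current < 1 or target_util <= 0:
        raise ValueError("current must be >= 1 and target_util must be > 0")
    # Proportional rule: if each instance is 1.5x too busy, run 1.5x as many.
    raw = math.ceil(current * observed_util / target_util)
    # Clamp to the group's configured lower and upper bounds.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    print(desired_capacity(4, 90.0, 60.0))  # 6: scale out from 4 instances
    print(desired_capacity(4, 30.0, 60.0))  # 2: scale in from 4 instances
```

The clamp matters as much as the formula: the upper bound keeps a runaway metric from exhausting a service quota, and the lower bound preserves a minimum level of redundancy.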

Best Practices

  • Foundations – consider service quotas and network capacity
  • Workload architecture – design failure prevention and failure mitigation
  • Change management – monitor the workload and respond automatically to changes in demand and capacity, triggering action when KPIs change
  • Failure management – failure detection and automatic repair, backup and recovery, DR planning and testing
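
One common failure-mitigation technique under "Workload architecture" is retrying transient errors with exponential backoff and jitter. The sketch below is a minimal, generic version under stated assumptions: `call_with_backoff`, its parameters, and the choice of which exceptions count as transient are illustrative, not part of any AWS SDK.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with capped backoff.

    Hypothetical helper for illustration; `sleep` is injectable for testing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Exponential cap: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random wait in [0, cap] spreads out retry storms.
            sleep(random.uniform(0, cap))
```

Limiting both the number of attempts and the maximum delay keeps a failing dependency from tying up callers indefinitely, which is the point of mitigating rather than just retrying.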

Key AWS Services

  • AWS Auto Scaling
  • AWS Backup
  • Amazon CloudWatch
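
To show how these services might fit together, the sketch below builds the arguments for a CloudWatch alarm that fires when the average CPU of an Auto Scaling group stays high, with a scaling policy as the alarm action. The keys are parameters of the CloudWatch `PutMetricAlarm` API; the helper name, the alarm name, the 70% threshold, and the example values are assumptions chosen for illustration.

```python
def build_cpu_alarm_params(asg_name, scaling_policy_arn, threshold=70.0):
    """Return keyword arguments for a high-CPU alarm on an Auto Scaling group.

    Hypothetical helper: it only builds the request and calls nothing.
    """
    return {
        "AlarmName": f"{asg_name}-high-cpu",  # assumed naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation window
        "EvaluationPeriods": 2,   # must breach for two windows in a row
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scaling_policy_arn],  # e.g. a scale-out policy ARN
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`; that call is left out of the block so the sketch runs without an AWS account.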
