Weekly Journal 23 - AWS Outage

Synopsis

AWS has an outage that ruins my day.

AWS Outage

This past week, AWS suffered an outage that impacted service across an entire availability zone (AZ) in Frankfurt, Germany. Unfortunately, a portion of our primary cloud application is hosted in that AZ, so we suffered an outage too.

Without divulging too many details of our system, the takeaway for me is to ensure that an AWS-hosted system has as much of its functionality as possible contained within each AZ it runs in. Systems should span multiple AZs for redundancy and resiliency, but each AZ should be self-contained if at all possible, meaning it should not depend on a component that lives only in another AZ. That way, if a single AZ fails, as happened this past week, the application will continue functioning in the other AZs and remain available to customers.
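
To make the idea concrete, here is a minimal sketch in Python of the kind of check I have in mind. Everything in it is hypothetical: the component names, the inventory, and the helper functions are made up for illustration and do not describe our actual system. It only assumes you can list which components are deployed in which AZ.

```python
# Hypothetical set of components the application needs in order to serve requests.
REQUIRED = {"web", "api", "database"}

# Hypothetical inventory: AZ name -> components deployed there.
# The AZ names follow the naming used in the Frankfurt region (eu-central-1).
deployment = {
    "eu-central-1a": {"web", "api", "database"},
    "eu-central-1b": {"web", "api"},
    "eu-central-1c": {"web", "api", "database"},
}


def missing_components(deployment, required):
    """Return {az: [missing components]} for every AZ that is not self-contained."""
    return {
        az: sorted(required - components)
        for az, components in deployment.items()
        if required - components
    }


def survives_loss_of(deployment, required, failed_az):
    """Return True if at least one AZ other than failed_az is self-contained."""
    return any(
        required <= components
        for az, components in deployment.items()
        if az != failed_az
    )


if __name__ == "__main__":
    print(missing_components(deployment, REQUIRED))
    for az in deployment:
        print(az, "can fail safely:", survives_loss_of(deployment, REQUIRED, az))
```

In this made-up inventory, eu-central-1b has no database of its own, so it relies on another AZ. The application as a whole still survives the loss of any one AZ because two self-contained AZs remain, but if the database lived in only one AZ, losing that AZ would take everything down.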

What’s Next?

I’m hoping I can get back to my Ansible playbook for setting up iptables rules.