The Internet is broken

In today's fast-paced world, we often take things for granted. As always, we only realize how important something is when it is no longer available. A major AWS outage in the US-EAST-1 region shows how dependent we still are on specific service providers and highlights the need to pursue decentralization even more.

The morning of October 20th here in Europe started off rather strangely in terms of service reliability. First, various Apple services began experiencing issues, and shortly afterwards I noticed orange flashing LEDs on my HPE Instant On access points. While Wi-Fi and connectivity kept working, the management portal could not be reached. The last time this happened was during an AWS outage a few years ago. The Reddit community was already discussing the malfunction, and other outages, including those affecting Prime Video, Perplexity, Amazon itself, Hulu, Snapchat and even Signal (which also relies on AWS services), reinforced the impression that something serious had happened in the AWS US-EAST-1 region. As the problem spread, so did the news of its impact, along with additional information on the official status page.

In summary, we were told that a relatively simple DNS error caused outages affecting somewhere between one third and one half of the global Internet. This shows our complete dependence not only on US-based services, which is a problem in itself, but also on centralized services and setups without redundancy. In my case, I love self-hosting, and my authentik instances for Single Sign-On as well as my Mastodon and Vernissage instances hosted at home continued to run without any problems, thanks to a redundant Internet connection and no dependency on AWS’ name servers. My proxy server and Matrix ESS-Kubernetes environment, which are hosted by an excellent German provider, also kept working without any issues, even while my beloved Signal was struck down.

Without wanting to sound smug, I realized that moving specific services away from “Big Tech” (no matter how uncomfortable this may be) was the right decision, whether by self-hosting or simply by not putting all my eggs in one basket. With great power comes great responsibility, so of course I have to keep that infrastructure running myself (and less tech-savvy people can’t do so!). But this only applies to my personal services, and if they fail, the consequences are manageable. What about the enterprise? If you run a company and depend on services to keep your business going, you must either take care of geo-redundancy or deploy your services in a multi-cloud environment, just to make sure that one part can break off without harming the overall experience. Let's not forget that all this happened because of a DNS issue in the US-EAST-1 region of the world's biggest hyperscaler, and most of the effects were proof of failure by design.

The Internet is broken these days, but we still have the chance to fix it. Use the tools that providers give you and adopt services that can be run in more than one location. Consider running services in a decentralized way, or using software components that are independent of whichever provider goes down in flames at a given moment. Of course, we have to distinguish between personal, private and corporate needs, but the dependency is there, and it was unpleasant to see which services we had built our infrastructure on and how big the impact was when they failed. Today it was just your online streaming service and the funny platform for sending snaps that seemingly disappear after viewing. Maybe tomorrow it will be the systems that get us our money or simply keep us alive?

Let’s learn from this one!

#GoEuropean #AWSDown #DigitalSovereignty #AWS #Outage