The massive AWS outage that broke half the internet is finally over – here’s what happened

ZDNET’s key takeaways

  • A major AWS outage disrupted global websites, apps, and services.
  • The issue stemmed from a DNS failure in AWS’s US-East-1 region.
  • In the latest update, Amazon said the AWS outage was resolved.

Amazon Web Services (AWS), the backbone of much of the internet, went dark early Monday morning. At approximately 12:11 a.m. ET on Oct. 20, it suffered a major outage, knocking out numerous websites, apps, and online platforms worldwide.

The disruption originated in the company’s critical US-East-1 region in Northern Virginia, AWS’s largest and most essential data hub. It took until 6:53 p.m. ET before the major issues were finally repaired. Even then, some downstream problems lingered.

Widespread slowdowns and timeouts

AWS first acknowledged the issue after it detected increased error rates and latency across numerous key services, including EC2, Lambda, and DynamoDB – Amazon’s cloud database technology. Engineers later identified a Domain Name System (DNS) resolution problem affecting the DynamoDB API endpoint, which cascaded across dependent systems.

Yes, that’s right. The old techie joke – “Whenever there’s a network problem, it’s always DNS” – proved true yet again.

While engineers quickly fixed the DNS issue, other AWS services began to fail in its wake, leaving the platform still impaired. The next major issue emerged when AWS Network Load Balancer health checks started breaking, triggering other services to falter. As the outage spread, AWS’s service health dashboard confirmed that 28 different AWS services were impacted, causing widespread slowdowns and timeouts across cloud operations.

The effects rippled across critical sectors, knocking out access to major consumer platforms such as Snapchat, Ring, Alexa, Roblox, and Hulu, as well as financial and AI services like Coinbase, Robinhood, and Perplexity. Even Amazon.com and Prime Video experienced partial outages.

In the UK and the EU, major banks, including Lloyds Banking Group, and some government sites were reported down as the disruption extended beyond North America.

According to DownForEveryoneOrJustForMe, thousands of users began reporting issues just after 3 a.m. ET, with more than 14,000 outage reports logged for Amazon alone by midmorning. Smart home systems relying on AWS, such as Ring doorbells and Alexa-enabled devices, ceased functioning or lost connectivity, highlighting the deep dependency many households and companies have on Amazon’s cloud.

Data from Downdetector, a Ziff Davis-owned company, also showed the massive scope of the AWS outage. In the first two hours, more than 1 million reports came from the US, followed by 400,000 from the UK. By midmorning, total global reports had surged past 8.1 million, with 1.9 million from the US and 1 million from the UK.

Needless to say, social media was filled with user complaints and speculation as outages cascaded into retail, streaming, gaming, and financial operations worldwide. It turned out we weren’t happy without our internet. Who knew?

Mitigated but slow to recover

AWS engineers initially said they were “working on multiple parallel paths to accelerate recovery,” focusing their investigation on network gateway errors in the US East Coast region.

Amazon later reported that the outage had been resolved by 6:35 a.m. ET, though services like Ring and Chime were still slow to bounce back. By 1:03 p.m. ET on Monday, however, AWS had not yet fully recovered.

“We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services,” the company said. “Lambda is experiencing function invocation errors because an internal subsystem was impacted by the network load balancer health checks. We are taking steps to recover this internal Lambda system. For EC2 launch instance failures, we are in the process of validating a fix and will deploy to the first AZ as soon as we have confidence we can do so safely.”

Downdetector said it had logged more than 6.5 million reports across over 1,000 dependent services by 12:30 a.m. BST. Its data showed that more than 2,000 companies experienced disruptions, with about 280 still affected as of late morning.

Luke Kehoe, an industry analyst at Ookla, said the synchronized pattern across hundreds of services indicated “a core cloud incident rather than isolated app outages.” He said the event underscored the importance of resilience and recommended that organizations distribute workloads across multiple regions to reduce the impact of future outages.
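To make Kehoe's multi-region recommendation concrete, here is a minimal Python sketch of client-side regional failover. It is an illustration only, not anything AWS or Ookla published: the function and exception names are hypothetical, the region list is an example, and the sketch assumes the workload's data is already replicated to every region it might fall back to.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Hypothetical error raised when every configured region fails."""

    def __init__(self, errors):
        super().__init__(f"all regions failed: {sorted(errors)}")
        # Maps region name -> the exception that region raised.
        self.errors = errors


def call_with_region_failover(
    regions: Sequence[str], operation: Callable[[str], T]
) -> T:
    """Try `operation` against each region in order; return the first success.

    `operation` is any caller-supplied function that takes a region name
    and raises an exception if that region cannot serve the request.
    """
    errors = {}
    for region in regions:
        try:
            return operation(region)
        except Exception as exc:  # deliberately broad for a sketch
            errors[region] = exc
    raise AllRegionsFailed(errors)
```

A caller would pass an ordered preference list, such as `["us-east-1", "us-west-2"]`, plus a function that performs the request against the given region. The simplification here is large: real failover also has to deal with replication lag, write conflicts, and the cost of running standby capacity.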

Daniel Ramirez, Downdetector by Ookla’s director of product, added that such large-scale outages were rare but might be occurring more often as companies increasingly centralized critical data and operations on a single cloud provider.

“This kind of outage, where a foundational internet service brings down a large swath of online services, only happens a handful of times in a year,” Ramirez said. “They probably are becoming slightly more frequent as companies are encouraged to completely rely on cloud services and their data architectures are designed to make the most out of a particular cloud platform.”

Marijus Briedis, NordVPN’s CTO, commented, “Outages like this highlight a serious issue with how some of the world’s biggest companies often rely on the same digital infrastructure, meaning that when one domino falls, they all do.”

And that certainly proved to be the case this time.

For users still experiencing issues resolving the DynamoDB service endpoints in US-East-1, Amazon recommended flushing DNS caches. “The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now,” Amazon said. “Some requests may be throttled while we work toward full resolution.”
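For readers who want to check whether their own machine can resolve the endpoint before or after flushing caches, a short Python sketch follows. It only asks the local resolver for addresses and says nothing about whether the service behind them is healthy. The helper name `resolve_endpoint` and its `resolver` parameter are assumptions made for illustration; `dynamodb.us-east-1.amazonaws.com` is the regional DynamoDB hostname.

```python
import socket


def resolve_endpoint(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for `hostname`.

    Returns an empty list when resolution fails. `resolver` is a
    hypothetical hook so the lookup can be replaced in tests; by default
    the standard library's socket.getaddrinfo is used.
    """
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The resolver could not find the name (or could not be reached).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_endpoint("dynamodb.us-east-1.amazonaws.com")` returns a list of addresses when resolution works. An empty list after a cache flush suggests the problem lies with the resolver the machine is using rather than with a stale local entry.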

Amazon is expected to share a detailed postmortem explaining what went wrong in the coming days.

Source: Robotics - zdnet.com
