The story so far: On Monday (October 20, 2025), one of Amazon Web Services’ (AWS) data centres, located on the east coast of the U.S., began experiencing “increased error rates and latencies.” By the next morning, several issues had been reported across AWS data centres, many of which are located in Northern Virginia. The affected data centre cluster, US-East-1, is AWS’ biggest and most active. According to Downdetector, the disruption took down the digital services of over 2,000 companies. Amazon, Snapchat, Signal, ChatGPT, Perplexity, Canva, Roblox, Duolingo, Fortnite, Coinbase and Epic Games were among the online services affected. AWS said in a blog post that the issue was eventually “fully resolved” by 6:53 PM ET.
What was the reason behind the incident?
In a status update, AWS informed users that the outage originated from a DNS error affecting the DynamoDB database application programming interfaces (APIs) in the US-East-1 region. Later, Amazon advised users who were having trouble resolving the DynamoDB service endpoints in US-East-1 to flush their DNS caches.
The Domain Name System (DNS) essentially functions as a lookup mechanism that translates a domain name, such as the one in a web address, into its corresponding IP address. For instance, the IP address for the domain www.instagram.com is 57.144.150.34. When DNS fails, there is no way to retrieve the address, so even if a service is still running, no traffic reaches it. It is like trying to call a friend whose phone number keeps changing when you have none of the numbers memorised and no directory to look them up.
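To make the lookup idea concrete, here is a minimal sketch that models DNS as an in-memory table. Everything in it is a hypothetical simplification: `TOY_DNS`, `resolve` and `connect` are illustrative names, not a real resolver, and the Instagram address is simply the example quoted above (real addresses can change). It shows that once the record disappears, a client cannot reach the server even though nothing has happened to the server itself.

```python
# A toy, in-memory stand-in for DNS that maps hostnames to IP addresses.
# Hypothetical names for illustration only; real lookups go through the
# operating system and a chain of DNS servers.
TOY_DNS = {
    "www.instagram.com": "57.144.150.34",  # example address quoted in the article
}


def resolve(hostname: str) -> str:
    """Return the IP address for hostname, or raise if there is no record."""
    try:
        return TOY_DNS[hostname]
    except KeyError:
        raise LookupError(f"no DNS record for {hostname}") from None


def connect(hostname: str) -> str:
    """Pretend to connect: a client needs the IP address before it can send anything."""
    ip = resolve(hostname)
    return f"connected to {hostname} at {ip}"


print(connect("www.instagram.com"))

# Simulate the record going missing: the server may still be running,
# but clients can no longer find it, so no traffic arrives.
del TOY_DNS["www.instagram.com"]
try:
    connect("www.instagram.com")
except LookupError as err:
    print("request failed:", err)
```

Running it prints `connected to www.instagram.com at 57.144.150.34` followed by `request failed: no DNS record for www.instagram.com`.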
AWS uses the DynamoDB database service to store data for Amazon’s own services, such as Amazon retail and Amazon Alexa, as well as for several other customers, including Lyft, Snapchat and Signal.
DNS errors are frequent and can have several causes: a company failing to renew its domain registration, a faulty automated update or, most commonly, mistakes made while changing DNS service providers or updating a domain’s address.
Although DNS errors can be caused by malicious actors, this outage was not. Cybersecurity experts described it as a case of “availability”: the system could not correctly decide which server to connect to, and the failure caused a ripple effect.
As cloud systems have become more centralised over time, AWS’ US-East-1 region has grown notorious for causing massive disruptions. Launched in 2006 as the first AWS region, US-East-1 is also the default for swathes of services and users.
Some global AWS services, including DynamoDB Global Tables, run from US-East-1 and depend on its endpoints. This means that even services used in Europe can rely on infrastructure located in US-East-1 and be susceptible to a chain reaction.
Have there been past AWS outages?
After two prominent outages in September and December of 2021, customers were furious, calling US-East-1 a “systemic risk.” The December outage is often cited as the most severe in AWS history, and it laid bare how fragile cloud dependencies could be. Lasting almost seven hours and costing S&P companies $150 million, the outage was caused by a typo in a command entered while debugging an issue.
Following the backlash, Amazon confirmed that it was rebuilding the architecture of both its support case management system, which customers use to chat with technical support, and its health dashboard. Despite the changes, the region suffered another outage in June 2023 that took down 100 services for around four hours, and customers again struggled to reach AWS Support quickly.
Every AWS region has at least three physically separate availability zones (AZs). Customers have been advised to run their apps and platforms across multiple availability zones to limit the impact of a failure, but the entire region has still been prone to going down and taking everything with it.
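The gap between zone-level and region-level failures can be illustrated with a small sketch. It is a hypothetical model, not how AWS computes availability: each copy of an app is a `(region, zone)` pair, the zone labels are made up, and `eu-west-1` is used only as an example of a second region. Spreading copies across zones survives a single zone failure, but not the loss of the whole region; a copy in a second region, one common mitigation, does.

```python
# Hypothetical deployment: three copies of an app, one per availability zone,
# all inside a single region. Zone labels are invented for illustration.
deployment = [
    ("us-east-1", "zone-a"),
    ("us-east-1", "zone-b"),
    ("us-east-1", "zone-c"),
]


def surviving(replicas, failed_regions=frozenset(), failed_zones=frozenset()):
    """Return the replicas still up after the given region and zone failures."""
    return [
        (region, zone)
        for region, zone in replicas
        if region not in failed_regions and (region, zone) not in failed_zones
    ]


# Losing one zone leaves two copies running.
print(len(surviving(deployment, failed_zones={("us-east-1", "zone-a")})))  # 2

# Losing the whole region takes every copy with it.
print(len(surviving(deployment, failed_regions={"us-east-1"})))  # 0

# Adding a copy in a second region keeps the app reachable in that case.
multi_region = deployment + [("eu-west-1", "zone-a")]
print(len(surviving(multi_region, failed_regions={"us-east-1"})))  # 1
```

The model ignores cross-region dependencies; as noted above, some services outside US-East-1 still rely on its endpoints, so a second region only helps if it does not depend on the failed one.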
Will there be more outages like this in future?
Some experts have predicted that cloud outages could increase as enterprises push to introduce AI capabilities, which leads to more data being ingested into the cloud and a heavier load on hyperscalers. (Global cloud computing is largely dominated by AWS, followed by Microsoft’s Azure and Google Cloud.)
As many experts have repeatedly warned, the outage is a stark reminder that over-reliance on these dominant cloud providers could bring global online companies to their knees.
Gergely Orosz, author of the newsletter ‘The Pragmatic Engineer,’ listed some unlikely casualties of the outage, including Postman, an API development tool, and Eight Sleep, a sleep fitness company.
“In both cases, things should have worked locally! It’s what customers assumed, and it should have been possible. But clearly the dev teams found it simpler to take on a cloud dependency – and made no prep for an AWS region outage. So now customers know these are cloud products,” he said in a post on X.
Amazon released a report with a detailed summary of the 15-hour outage, saying it will temporarily disable the DynamoDB DNS Planner and add “additional protections” to prevent DNS errors in future. The AWS team will also improve its internal testing to catch more of the kinds of problems that prolonged the outage.
Published – October 24, 2025 06:45 pm IST
