- A major AWS outage appeared to impact many online services, including Amazon, Snapchat, Venmo, Reddit, and Perplexity.
- AWS said it had mitigated the underlying issue and its services were showing "significant signs of recovery."
- The issue was marked as fully resolved at around 6 p.m. ET, but outages came in waves during the day.
Americans ran into issues accessing many online services early Monday afternoon as Amazon worked to mitigate a major Amazon Web Services outage.
The AWS outage brought down major online services in the early hours of the morning, including Amazon, Snapchat, Signal, and Perplexity.
The issue was marked as fully resolved at around 6 p.m. ET, but outages came in waves throughout the day.
A status page for Amazon's cloud unit showed more than 100 of its own services were affected at the outage's peak Monday morning.
The company said the underlying issue had been "fully mitigated" and that most AWS service operations were "succeeding normally" at 6:35 a.m. ET. A fresh wave of outage reports, however, spiked in the US a few hours later on Monday morning on outage-tracking website Downdetector.
At 10:14 a.m. ET, AWS reported "significant API errors and connectivity issues across multiple services in the US-EAST-1 Region," with a severity status on the AWS status page of "degraded."
Reports on Downdetector trended up for Amazon, Venmo, and Pinterest in the morning but began to slowly decline in the afternoon as Amazon worked to fix the outage.
Many other online services that use AWS's cloud services and infrastructure, including Zoom, Strava, and Amazon's Alexa assistant, appeared to experience outages early Monday morning, according to Downdetector.
Among other services that showed issues on Downdetector earlier on Monday were financial service providers Venmo and Robinhood; airlines including United and Delta; and telecoms giants AT&T and Verizon. User reports also indicated problems with workplace tools, including Slack, Microsoft Teams, and Asana.
Aravind Srinivas, the CEO of AI startup Perplexity, said in an X post at 3:22 a.m. ET that its service is down. "The root cause is an AWS issue," he said. "We're working on resolving it."
A United spokesperson told Business Insider that the AWS outage disrupted access to its app and website overnight, and that the airline implemented backup systems to "end the technology disruption."
Robinhood said in a post on X that its services are "back online and recovering," while a Snapchat spokesperson told Business Insider the company is aware that some users are experiencing issues with the app and advised them to "hang tight" while it investigates.
T-Mobile was listed as showing issues on Downdetector but a company spokesperson told Business Insider that it didn't experience an outage on its own service, and that its customers "had issues when trying to use other sites or services due to a third party's outage early this morning."
An Amazon spokesperson directed Business Insider to its service status page.
What we know so far
On Monday morning, AWS's status page showed that DynamoDB, its database service underpinning many online applications, was experiencing "significant error rates" for requests to its data centers located on the US East Coast.
The company said the issue stemmed from a problem with DNS, the Domain Name System, which translates website names into IP addresses and is often described as a phone book for the internet.
The company's status page first reported that it was investigating the issue at 3:11 a.m. ET on Monday.
At 11:43 a.m. ET, AWS said that it had "narrowed down the source of the network connectivity issues that impacted AWS Services," and that the "root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers."
At 12:13 p.m. ET, Amazon reported progress had been made.
"We have taken additional mitigation steps to aid the recovery of the underlying internal subsystem responsible for monitoring the health of our network load balancers and are now seeing connectivity and API recovery for AWS services," the company said.
As of 1:38 p.m. ET, the company said that mitigation efforts were "progressing" with some internal systems "now showing early signs of recovering in a few Availability Zones (AZs) in the US-EAST-1 Region."
At around 5:48 p.m. ET, about half of the affected services had been restored, and the severity level of the outage had been updated from "degraded" to "impacted."
As of 6:53 p.m. ET, all 142 services that were down in the morning were running again, and Amazon marked the issue as "resolved."
"Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered," the company added. "By 3:01 p.m. (PT), all AWS services returned to normal operations."
Another online outage
It's not the first time an outage at one service provider has brought down large chunks of the internet.
In July last year, a faulty software update from cybersecurity company CrowdStrike caused computers around the world to crash, sparking chaos for airlines, hospitals, banks, and businesses.
There have also been notable online service outages in 2022, 2021, 2020, and 2019 — typically stemming from faulty updates or misconfigurations at one underlying service provider.
"Today's outage is another reminder that the digital world doesn't stop at borders — a local fault can ripple worldwide in minutes," said Charlotte Wilson, head of enterprise at Check Point Software, a cybersecurity company. "We've built convenience on shared systems, but resilience still depends on people and process."