
UPDATE Monday, 3:41 p.m. ET: Amazon indicated its AWS services were well on the way to fully recovering.
"We continue to observe recovery across all AWS services," the company wrote. It did note customers may still face "intermittent function errors" with Lambda, its serverless compute service.
AWS saw a major outage in the early hours of Monday morning, a temporary recovery, and then further issues as the East Coast neared midday. You can read the full explanation of the outages in both the original story and our regular updates to this article, but, in short, any problem with AWS means major issues for large swaths of the internet. Sites and services such as United Airlines, Snapchat, McDonald's, Verizon, Venmo, and countless others all saw spikes in user-reported issues on Downdetector.
While the internet is vast, there are a few pillars, AWS perhaps chief among them, that can lead to large, disruptive downstream effects should they experience problems.
UPDATE Monday, 3:01 p.m. ET: Amazon said its continued efforts to remedy issues with its AWS services appeared to be working, noting it saw "decreasing networking connectivity issues," in its most recent update on its status page.
Users still reported a relatively high number of issues with AWS on Downdetector, though many third-party services apparently affected by the AWS outage appeared to be recovering.
It's been a tremendously turbulent Monday for AWS. The popular cloud platform saw a major outage in the early morning hours, briefly recovered, and then experienced new problems around midday.
(Disclosure: Downdetector is owned by Ziff Davis, the same parent company as Mashable.)
UPDATE Monday, 2:15 p.m. ET: Amazon said its efforts to fix its connectivity issues appear to be working. Its widely popular AWS cloud platform suffered renewed issues starting around midday, just hours after a major outage during the early hours of Monday morning.
The company wrote its "mitigations to resolve launch failures" were progressing and that it expected "launch errors and network connectivity issues to subside" as it worked to apply fixes more widely.
UPDATE Monday, 1:15 p.m. ET: Amazon wrote it was working to fix connectivity issues that arose midday Monday ET, hours after a major outage in the early hours of the day.
"We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services," read the latest update from the AWS status page.
Mike Chapple, an IT professor at the University of Notre Dame, said it was not necessarily surprising that further issues surfaced after the initial outage.
"While this is disruptive, it isn't unusual. The process of fixing a serious IT infrastructure issue often creates new problems, and fixes often need to be rolled out across a large number of systems over time," Chapple said in an emailed statement to Mashable. "As engineers work to steady the system, operations slowly stabilize and things return to normal. Think of it like a utility outage that occurs in a large city. The power might flicker on and off a few times as repair crews do their work. We're seeing something similar now with AWS."
UPDATE Monday, 12:15 p.m. ET: Amazon said it was homing in on the underlying cause of the renewed problems with AWS on Monday.
"We have narrowed down the source of the network connectivity issues that impacted AWS Services," read the latest update from the AWS status page. "The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers."
It was not yet clear when outages and issues would be fully resolved.
UPDATE Monday, 11:45 a.m. ET: Amazon confirmed AWS was experiencing more issues late Monday morning, just hours after the issue was apparently resolved. The company wrote it was investigating "the root cause for the network connectivity issues that are impacting AWS services such as DynamoDB, SQS, and Amazon Connect," in its most recent update to the AWS status page.
Meanwhile, widespread service disruptions across the internet continued. User-reported issues spiked for a number of popular services, according to Downdetector, including FanDuel, Snapchat, Apple Music, Asana, Verizon, and many more. The renewed AWS problems appeared to be significant and were once again causing problems for large numbers of users.
A service disruption at Amazon Web Services (AWS), Amazon's popular cloud hosting and data service, caused massive problems for internet users starting their workweek on Monday. Since AWS powers huge portions of the internet, the list of services and sites that suffered outages on Monday was pretty staggering.
According to user-reported issues at the site Downdetector, affected services include United Airlines, AT&T, Fortnite, Disney+, HBO Max, Signal, Snapchat, McDonald's, Verizon, Venmo, and many more. (Disclosure: Downdetector is owned by Ziff Davis, the same parent company as Mashable.) Amazon services like Prime and Alexa were affected, too. In short: Almost anyone could've been affected in some way.
Nearly everything we own is internet-connected (our fridges are WiFi-enabled billboards), meaning an AWS outage can disrupt large swaths of our lives.
Nearing midday, it appeared the issue was over. But then Amazon's AWS Health Dashboard indicated problems had resurfaced.
"We have confirmed multiple AWS services experienced network connectivity issues in the US-EAST-1 Region," read an update around 10:30 a.m. ET. "We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause."
It appeared AWS was seeing issues again, though not on the scale of the outage in the earlier hours. Some services, such as Venmo and Boost Mobile, saw a corresponding jump in user-reported issues on Downdetector.
Amazon previously said the problem had either been fully resolved or was resolving. Mashable reached out for comment and was directed to the AWS Health Dashboard. At about 6:35 a.m. ET, the AWS Health Dashboard indicated the main issue was resolved, though problems might persist as things got back up and running. That could, perhaps, hint at the new problems that surfaced.
"The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now," the 6:35 a.m. ET update read. "Some requests may be throttled while we work toward full resolution."
What caused the AWS outage?
The exact reason AWS initially went down remains unknown, but we have an idea. Services using AWS were unable to access DynamoDB, an Amazon-run database, because the Domain Name System (DNS) had a problem. The DNS effectively translates website names into the numeric IP addresses computers use to find one another. So when Amazon wrote on its Health Dashboard that the DNS issue had been "fully mitigated," it was saying the real problem had been fixed.
"Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data," Mike Chapple, an IT professor at the University of Notre Dame, told CNN. "It's as if large portions of the internet suffered temporary amnesia."
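For readers who want to see that idea in miniature, here is a rough Python sketch. It is a toy model, not a picture of how AWS or DynamoDB actually work: the hostname, the IP address, the `ToyResolver` class, and the `fetch` function are all made up for illustration. It shows only the general point Chapple describes, which is that data can sit safely on a server while an app fails to reach it because the name lookup step breaks.

```python
class NameNotFound(Exception):
    """Raised when the toy resolver has no record for a hostname."""


class ToyResolver:
    """A stand-in for DNS: maps hostnames to IP addresses (hypothetical)."""

    def __init__(self, records):
        self.records = dict(records)

    def resolve(self, hostname):
        if hostname not in self.records:
            raise NameNotFound(f"cannot resolve {hostname}")
        return self.records[hostname]


# The "database": data stored on a server that is reachable only by IP.
# 203.0.113.10 is from an address range reserved for documentation.
servers = {"203.0.113.10": {"user:42": "saved profile"}}


def fetch(resolver, hostname, key):
    """Look up the server's IP by name, then read the value from it."""
    ip = resolver.resolve(hostname)
    return servers[ip][key]


healthy = ToyResolver({"database.example.com": "203.0.113.10"})
broken = ToyResolver({})  # simulates the name record going missing

print(fetch(healthy, "database.example.com", "user:42"))

try:
    fetch(broken, "database.example.com", "user:42")
except NameNotFound as err:
    print("lookup failed:", err)

# The data never went anywhere; only the name-to-address step failed.
print(servers["203.0.113.10"]["user:42"])
```

In the sketch, the second request fails even though the stored value is untouched, which is the "temporary amnesia" in Chapple's description.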
Rafe Pilling, the director of threat intelligence at the cybersecurity firm Sophos, told The Guardian that the incident didn't appear to be a cyberattack or anything nefarious, which is aligned with Amazon's statements.
"When anything like this happens the concern that it’s a cyber incident is understandable," he told the U.K. outlet. "AWS has a far-reaching and intricate footprint, so any issue can cause a major upset."
Amazon will likely explain in more detail at a later time what happened Monday. It's unclear how the 10:35 a.m. ET "network connectivity issues" are related, if at all, to the initial issue with the DNS, though it feels reasonable to assume issues could arise as services worked to return to normal.
Why is an AWS outage such a big deal?
In short: AWS is a central pillar of the modern internet. Without it, things crash. As a few major companies gobbled up market share, the internet's infrastructure became remarkably fragile: an issue with AWS, or Google, or Microsoft, or CrowdStrike means issues for tons of users.
Advocates even argue that such reliance on these big players is a free speech issue.
"We urgently need diversification in cloud computing," said Dr. Corinne Cath-Speth, head of digital at the human rights organization Article 19, according to The Guardian. "The infrastructure underpinning democratic discourse, independent journalism, and secure communications cannot be dependent on a handful of companies."
The long and short of it: If something goes wrong with AWS, a lot goes wrong everywhere else.