Amazon outage breaks the internet: Snapchat, Fortnite, banks go dark

A single technical failure at Amazon Web Services knocked out 142 of its cloud services for more than 15 hours on October 19-20, 2025, affecting millions of users and causing revenue losses estimated in the millions across e-commerce, advertising, and financial services.

AWS logo with frown reflects October 2025 outage that disrupted Snapchat, Fortnite, and banks globally

A DNS resolution failure in Amazon Web Services' US-EAST-1 region triggered cascading outages across 142 services between 11:49 PM PDT on October 19 and 3:01 PM PDT on October 20, 2025. According to Amazon Staff in their official statement, the incident impacted both AWS services and Amazon.com operations, along with AWS Support functions.

The technical failure demonstrated how concentrated the dependencies within digital infrastructure have become: AWS supports approximately 30% of global cloud computing operations. Gaming platforms Fortnite and Roblox experienced interruptions. Social applications including Snapchat faced service degradation. Financial services such as Coinbase and Robinhood reported issues affecting millions of users across multiple time zones.

Engineers at Amazon identified the root cause at 12:26 AM PDT on October 20. According to Amazon Staff, the investigation determined that DNS resolution issues for regional DynamoDB service endpoints triggered the event. The company implemented initial mitigations by 2:24 AM PDT, though full recovery required additional hours of systematic intervention.

The incident revealed technical dependencies within AWS infrastructure. After resolving the DynamoDB DNS issue at 2:24 AM, services began recovering, but a subset of internal subsystems continued experiencing impairment. According to Amazon Staff, the internal subsystem of EC2 responsible for launching instances remained affected due to its dependency on DynamoDB.

To facilitate recovery, AWS temporarily throttled specific operations including EC2 instance launches. This decision affected downstream services relying on EC2 infrastructure. Network Load Balancer health checks became impaired, resulting in network connectivity issues across multiple services such as Lambda, DynamoDB, and CloudWatch. Engineers recovered the Network Load Balancer health checks at 9:38 AM PDT.

The throttling measures extended beyond compute services. According to Amazon Staff, the team temporarily restricted processing of SQS queues via Lambda Event Source Mappings and asynchronous Lambda invocations. These limitations aimed to prevent additional system strain while engineers worked through recovery procedures.
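
For teams calling AWS APIs while such throttling is in effect, the standard client-side mitigation is retry with backoff. The sketch below shows one way to do this with boto3's adaptive retry mode; it is a general pattern under assumed defaults, not a remediation Amazon prescribed for this incident.

```python
# Illustrative sketch: configuring client-side retries so requests back off
# automatically when AWS throttles operations. General boto3 pattern only;
# not Amazon's stated remediation for the October 2025 incident.
import boto3
from botocore.config import Config

# "adaptive" retry mode adds client-side rate limiting on top of the usual
# exponential backoff whenever the service returns throttling errors.
retry_config = Config(
    region_name="us-east-1",
    retries={"max_attempts": 10, "mode": "adaptive"},
)

dynamodb = boto3.client("dynamodb", config=retry_config)

# Any call made with this client, such as listing tables, is retried with
# backoff if the service responds with throttling errors.
print(dynamodb.list_tables()["TableNames"])
```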

By 12:28 PM PDT, many AWS customers and services experienced significant recovery. Engineers gradually reduced throttling of EC2 new instance launch operations while addressing remaining impact. Full service restoration occurred at 3:01 PM PDT, concluding a disruption spanning more than 15 hours from initial detection to complete recovery.

Some services continued processing backlogs beyond the official resolution time. According to Amazon Staff, AWS Config, Redshift, and Connect required additional hours to work through accumulated messages. The company committed to sharing a detailed post-event summary following the incident.

The outage affected organizations across sectors. Dead by Daylight acknowledged awareness of AWS issues affecting players' ability to access the game on various platforms. Genshin Impact reported problems with Epic services, noting issues with top-up functions and login capabilities. Morning Brew reported that tracking site Downdetector received over 8 million reports around the globe during the incident.

The outage cost businesses millions in lost revenue, with estimates suggesting substantial losses for e-commerce and advertising platforms during the 15-hour disruption window. According to ParcelHero, the comparable CrowdStrike outage in July 2024 cost Fortune 500 companies an estimated $5.4 billion. The October 20 incident affected over 1,000 companies globally, creating comparable economic impact across sectors.

For digital advertising operations, the disruption highlighted infrastructure concentration risks. AWS powers significant portions of advertising technology infrastructure, with companies like VideoAmp running proprietary measurement methodologies on AWS Clean Rooms for privacy-enhanced analytics. Amazon's advertising business has grown substantially, and AWS itself grew 19% to $108 billion in revenue in 2024, serving as the technical foundation for many advertising technology implementations.

The incident also affected retailers using Amazon's cloud-based advertising solutions. Macy's had announced plans to implement Amazon Retail Ad Service, built on AWS infrastructure, for sponsored product advertisements. The October 20 outage demonstrated potential vulnerabilities in such dependencies. Amazon Publisher Cloud, launched in 2023, relies on AWS Clean Rooms infrastructure to enable publishers to plan programmatic deals and activate them in Amazon DSP. During the outage, these advertising technology systems experienced degradation alongside other AWS services.

The technical nature of the failure originated within what Amazon Staff described as DNS resolution issues. Domain Name System infrastructure translates human-readable addresses into machine-readable IP addresses, forming a fundamental component of internet connectivity. When DNS resolution fails for critical service endpoints like DynamoDB, dependent systems cannot locate necessary resources, triggering cascading failures across interconnected services.
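
A minimal illustration of that translation step, using Python's standard library: the hostname below follows the public DynamoDB endpoint format, and the error branch shows how a resolution failure surfaces to dependent code before any connection is even attempted.

```python
# Minimal sketch of a client-side DNS lookup for a service endpoint and what
# a resolution failure looks like. Illustrative only.
import socket

endpoint = "dynamodb.us-east-1.amazonaws.com"

try:
    # getaddrinfo performs the DNS resolution step described above:
    # translating the hostname into one or more IP addresses.
    addresses = {info[4][0] for info in socket.getaddrinfo(endpoint, 443)}
    print(f"{endpoint} resolves to: {sorted(addresses)}")
except socket.gaierror as exc:
    # When resolution fails, dependent code cannot even open a connection,
    # which is how a DNS fault cascades into application-level outages.
    print(f"DNS resolution failed for {endpoint}: {exc}")
```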

AWS maintains the US-EAST-1 Region as one of its oldest and most utilized data center locations, serving customers throughout North America and globally. The concentration of services within this region meant that a single point of failure—DNS resolution for DynamoDB endpoints—created widespread impact across unrelated applications and platforms.

The recovery process required systematic intervention across multiple layers of infrastructure. At 2:01 AM PDT, Amazon Staff identified a potential root cause related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. Engineers worked on multiple parallel paths to accelerate recovery, acknowledging that the issue affected other AWS Services in the region. Global services or features relying on US-EAST-1 endpoints such as IAM updates and DynamoDB Global tables also experienced issues.

By 3:35 AM PDT, the underlying DNS issue had been fully mitigated. According to Amazon Staff, most AWS Service operations were succeeding normally, though some requests experienced throttling while working toward full resolution. Services continued processing backlogs of events including CloudTrail and Lambda. Requests to launch new EC2 instances or services launching EC2 instances such as ECS in US-EAST-1 still experienced increased error rates.

The company recommended that users flush DNS caches if still experiencing issues resolving DynamoDB service endpoints in US-EAST-1. This guidance reflected the distributed nature of DNS caching across internet infrastructure, where stale or incorrect DNS records can persist in local caches even after upstream resolution.
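
One way to check whether a stale record is still being served locally is to compare answers from the system's configured resolver with those from a public recursive resolver. The sketch below assumes the third-party dnspython package; the public resolver address is an example, not an AWS recommendation.

```python
# Hedged sketch: comparing the system resolver's answer with a public
# resolver's answer can indicate whether a stale cached record persists
# locally. Requires the third-party dnspython package.
import dns.resolver

endpoint = "dynamodb.us-east-1.amazonaws.com"

# Query via whatever resolver the operating system is configured to use.
system_answer = dns.resolver.resolve(endpoint, "A")

# Query a public recursive resolver directly (example address), bypassing
# the local resolution path.
public = dns.resolver.Resolver()
public.nameservers = ["8.8.8.8"]
public_answer = public.resolve(endpoint, "A")

print("system resolver:", sorted(r.address for r in system_answer))
print("public resolver:", sorted(r.address for r in public_answer))
print("TTL via system resolver:", system_answer.rrset.ttl)
```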

At 4:48 AM PDT, Amazon Staff provided guidance for minimizing impact from ongoing EC2 launch issues. The company recommended launching EC2 instances without targeting specific Availability Zones, giving EC2 the flexibility to select an appropriate AZ. The impairment in new EC2 launches affected services such as RDS, ECS, and Glue. Amazon also recommended configuring Auto Scaling Groups to use multiple AZs so that Auto Scaling could manage EC2 instance launches automatically.
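
A hedged sketch of that multi-AZ guidance using boto3: an Auto Scaling group spread across subnets in several Availability Zones, so placement is not pinned to a single zone. The group name, launch template, and subnet IDs are placeholders.

```python
# Illustrative boto3 sketch of the multi-AZ guidance: an Auto Scaling group
# spanning subnets in several Availability Zones, so the service can place
# instances wherever capacity is available. All names and IDs are placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="example-web-asg",
    LaunchTemplate={
        "LaunchTemplateName": "example-launch-template",
        "Version": "$Latest",
    },
    MinSize=2,
    MaxSize=6,
    # Subnets in different Availability Zones; Auto Scaling chooses among
    # them instead of pinning every launch to a single AZ.
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222,subnet-cccc3333",
)
```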

Engineers pursued further mitigation steps to resolve Lambda's polling delays for SQS Event Source Mappings. According to Amazon Staff, AWS features depending on Lambda's SQS polling capabilities, such as Organization policy updates, experienced elevated processing times.

At 5:48 AM PDT, Amazon Staff confirmed recovery of processing SQS queues via Lambda Event Source Mappings. The team worked through processing the backlog of SQS messages in Lambda queues. This represented one milestone in the broader recovery process affecting multiple interconnected systems.
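
Operators watching their own queues drain during such a backlog can track depth with attributes the SQS API exposes, as in the illustrative sketch below; the queue URL is a placeholder.

```python
# Sketch of monitoring an SQS backlog while consumers work through queued
# messages. The queue URL is a placeholder.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

attrs = sqs.get_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/example-queue",
    AttributeNames=[
        "ApproximateNumberOfMessages",           # visible, waiting to be read
        "ApproximateNumberOfMessagesNotVisible", # currently being processed
    ],
)
print(attrs["Attributes"])
```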

The incident generated substantial discussion across social platforms. One user noted that when internet services experience downtime, users migrate to X to determine what happened. The concentration of reports on alternative platforms during AWS outages has become a pattern, with users seeking information about service status when primary applications fail.

For marketing professionals, the incident underscored dependencies within digital advertising infrastructure. The concentration of advertising technology services on cloud platforms means that infrastructure failures can disrupt campaign delivery, measurement, and optimization across multiple channels simultaneously. Organizations building marketing technology stacks face decisions about infrastructure dependencies and contingency planning for service disruptions.

Technical architecture decisions made years earlier influenced the October 20 incident's scope and duration. According to Amazon Staff, the underlying DNS issue affected DynamoDB service endpoints specifically, but the cascading impact reached services with dependencies on DynamoDB functionality. This architectural coupling meant that a single service's DNS failure could compromise broader platform operations.

The incident affected 142 AWS services across multiple categories. Resolved services included AWS Account Management, AWS Amplify, AWS AppConfig, AWS AppSync, AWS Application Migration Service, AWS B2B Data Interchange, AWS Batch, AWS Billing Console, AWS Client VPN, AWS Cloud WAN, AWS CloudFormation, AWS CloudHSM, AWS CloudTrail, AWS CodeBuild, AWS Config, AWS Control Tower, AWS DataSync, AWS Database Migration Service, AWS Deadline Cloud, and AWS Direct Connect.

Additional affected services encompassed AWS Directory Service, AWS Elastic Beanstalk, AWS Elastic Disaster Recovery, AWS Elastic VMWare Service, AWS Elemental, AWS End User Messaging, AWS Firewall Manager, AWS Global Accelerator, AWS Glue, AWS HealthImaging, AWS HealthLake, AWS HealthOmics, AWS IAM Identity Center, AWS Identity and Access Management, AWS IoT Analytics, AWS IoT Core, AWS IoT Device Management, AWS IoT Events, AWS IoT FleetWise, AWS IoT Greengrass, and AWS IoT SiteWise.

The comprehensive list extended to AWS Lake Formation, AWS Lambda, AWS Launch Wizard, AWS License Manager, AWS NAT Gateway, AWS Network Firewall, AWS Organizations, AWS Outposts, AWS Parallel Computing Service, AWS Partner Central, AWS Payment Cryptography, AWS Private Certificate Authority, AWS Resource Groups, AWS Secrets Manager, AWS Security Incident Response, AWS Security Token Service, AWS Site-to-Site VPN, AWS Step Functions, AWS Storage Gateway, AWS Support API, AWS Support Center, and AWS Systems Manager.

Further affected services included AWS Systems Manager for SAP, AWS Transfer Family, AWS Transform, AWS Transit Gateway, AWS VPCE PrivateLink, AWS Verified Access, AWS WAF, AWS WickrGov, Amazon API Gateway, Amazon AppFlow, Amazon AppStream 2.0, Amazon Athena, Amazon Aurora DSQL Service, Amazon Bedrock, Amazon Chime, Amazon CloudFront, Amazon CloudWatch, Amazon CloudWatch Application Insights, Amazon Cognito, Amazon Comprehend, Amazon Connect, Amazon DataZone, Amazon DocumentDB, Amazon DynamoDB, Amazon EC2 Instance Connect, and Amazon EMR Serverless.

The outage also impacted Amazon ElastiCache, Amazon Elastic Compute Cloud, Amazon Elastic Container Registry, Amazon Elastic Container Service, Amazon Elastic File System, Amazon Elastic Kubernetes Service, Amazon Elastic Load Balancing, Amazon Elastic MapReduce, Amazon EventBridge, Amazon EventBridge Scheduler, Amazon FSx, Amazon GameLift Servers, Amazon GameLift Streams, Amazon GuardDuty, Amazon Interactive Video Service, Amazon Kendra, Amazon Kinesis Data Streams, Amazon Kinesis Firehose, and Amazon Kinesis Video Streams.

Additional services experiencing issues included Amazon Location Service, Amazon MQ, Amazon Managed Grafana, Amazon Managed Service for Apache Flink, Amazon Managed Service for Prometheus, Amazon Managed Streaming for Apache Kafka, Amazon Managed Workflows for Apache Airflow, Amazon Neptune, Amazon OpenSearch Service, Amazon Pinpoint, Amazon Polly, Amazon Q Business, Amazon Quick Suite, Amazon Redshift, Amazon Rekognition, Amazon Relational Database Service, Amazon SageMaker, Amazon Security Lake, Amazon Simple Email Service, Amazon Simple Notification Service, Amazon Simple Queue Service, Amazon Simple Storage Service, Amazon Simple Workflow Service, Amazon Textract, Amazon Timestream, Amazon Transcribe, Amazon Translate, Amazon VPC IP Address Manager, Amazon VPC Lattice, Amazon WorkMail, Amazon WorkSpaces, Amazon WorkSpaces Thin Client, EC2 Image Builder, and Traffic Mirroring.

The breadth of affected services demonstrated the interconnected nature of cloud infrastructure. Services spanning compute, storage, networking, database, analytics, machine learning, security, and application development all experienced degradation or failure during the incident window.

For customers using AWS services, the incident highlighted the importance of multi-region architecture and failover capabilities. Organizations relying on single-region deployments faced complete service unavailability during the outage window. Those with multi-region architectures could potentially route traffic to unaffected regions, though many services maintain dependencies on US-EAST-1 for global operations.
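
What client-side failover can look like in practice is sketched below for DynamoDB reads, assuming the table is replicated to a second region (for example via Global Tables). Table and key names are placeholders, and production failover would need health checks and write-routing logic beyond this example.

```python
# Hedged sketch of a client-side regional fallback for DynamoDB reads.
# Assumes the table is replicated to a second region; names are placeholders.
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

REGIONS = ["us-east-1", "us-west-2"]  # primary first, then fallback

def get_item_with_fallback(table_name, key):
    last_error = None
    for region in REGIONS:
        try:
            table = boto3.resource("dynamodb", region_name=region).Table(table_name)
            return table.get_item(Key=key).get("Item")
        except (ClientError, EndpointConnectionError) as exc:
            last_error = exc  # try the next region
    raise last_error

item = get_item_with_fallback("example-table", {"pk": "user#123"})
print(item)
```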

The recovery timeline demonstrated the complexity of restoring interconnected cloud infrastructure. Initial DNS mitigation at 2:24 AM PDT did not immediately restore full functionality. Engineers spent subsequent hours addressing cascading effects throughout dependent systems. The gradual reduction of throttling mechanisms and systematic restoration of individual service components extended recovery well into the afternoon PDT.

The incident generated over 6.5 million user reports globally to monitoring platform Downdetector, according to The Guardian. This volume represented one of the largest reporting events for a cloud infrastructure outage, demonstrating how deeply AWS dependencies had penetrated digital services. Reports peaked around 7:50 AM ET, with concentrations in North America and Europe where business operations faced maximum disruption.

Banks across the United Kingdom experienced service degradation. Lloyds Banking Group, Bank of Scotland, and Halifax reported issues affecting customer access to online banking services. Airlines including United and Delta experienced system disruptions, with United implementing backup systems to address technology disruptions affecting its app, website, and internal systems.

Universities reported cascading failures across educational technology platforms. Rutgers University documented impacts to Canvas, Kaltura, Smartsheet, Adobe Creative Cloud, Cisco Secure Endpoint, and ArcGIS. The educational sector's heavy reliance on cloud-based learning management systems meant that instruction, assignments, and student communications faced interruption during peak academic hours.

Financial services platforms provided urgent communications to users concerned about fund safety. Coinbase informed customers that "All funds are safe" despite service unavailability. The cryptocurrency exchange's message reflected broader anxieties about financial system reliability when underlying infrastructure fails. Similar concerns affected payment platforms including Venmo, where users expressed frustration about inability to access funds.

Design and productivity tools experienced widespread disruption. Canva reported significantly increased error rates impacting functionality, attributing issues to "a major issue with our underlying cloud provider." The graphic design platform serves millions of users across business, education, and creative sectors. Service restoration occurred gradually throughout the day, with full access returning for most users by evening.

Artificial intelligence services built on AWS infrastructure faced outages. Perplexity CEO Aravind Srinivas confirmed on X that "The root cause is an AWS issue. We're working on resolving it." The incident demonstrated how emerging AI applications built on cloud infrastructure inherit the reliability characteristics—and vulnerabilities—of their hosting platforms.

Internal Amazon operations experienced disruption alongside external customers. According to Reddit reports from Amazon employees, warehouse and delivery operations faced system unavailability at many sites. Workers received instructions to stand by in break rooms and loading areas during shifts. The Anytime Pay app, allowing employees immediate access to earned wages, went offline. Seller Central, the hub used by Amazon's third-party sellers to manage businesses, also experienced outages.

The incident sparked discussions about cloud infrastructure concentration. Betsy Cooper, director of the Aspen Institute's Policy Academy, noted that while large cloud providers offer strong cybersecurity protections and convenience, the downside emerges when issues occur. "We all have an incentive to use the big companies, because they're so ubiquitous and it's easier for us to access all of our data in one place," according to Cooper's comments to NPR. "That's great until something goes wrong, and then you really see just how dependent you are on a handful of those companies."

Mike Chapple, IT professor at the University of Notre Dame's Mendoza College of Business and former National Security Agency computer scientist, explained the technical nature of the failure. "DynamoDB isn't a term that most consumers know," according to Chapple's statement. "However, it is one of the record-keepers of the modern Internet." Chapple noted that early reports indicated the problem wasn't with the database itself, with data appearing safe. "Instead, something went wrong with the records that tell other systems where to find their data."

The outage drew comparisons to the July 2024 CrowdStrike incident, when a faulty software update caused Microsoft Windows systems to go dark globally. That event grounded thousands of flights and affected hospitals and banks, revealing fragility in global technology infrastructure. The October 20 AWS incident, while stemming from different technical causes, demonstrated similar patterns of cascading failure across interconnected systems.

Some platforms used the outage for competitive positioning. Elon Musk promoted X's stability during the incident, responding "Not us" to posts showing affected services. Musk emphasized X's lack of "AWS dependencies" and promoted the platform's encrypted messaging capabilities as alternatives to affected services like Signal.

Economic analysts estimated millions in lost productivity and revenue. E-commerce delays, trading disruptions, and app failures created measurable financial impact across sectors. Small businesses and creators reliant on cloud-based tools expressed frustration over disrupted workflows. Gamers and streamers reported lost progress and entertainment access during peak usage hours.

Amazon committed to producing a detailed post-event summary following the incident. Such summaries typically provide technical analysis of root causes, contributing factors, and implemented changes to prevent recurrence. The industry awaits this documentation to understand the specific technical failures and architectural decisions that contributed to the widespread impact.

The incident occurred during a period when cloud infrastructure reliability has become increasingly critical to digital advertising operations. Amazon DSP's October 2025 integration with Microsoft Monetize as a preferred partner highlighted the programmatic advertising ecosystem's dependence on AWS infrastructure for real-time bidding and ad delivery.

Organizations evaluating cloud dependencies may reassess concentration risks following the October 20 incident. While cloud platforms provide scalability and operational efficiency, single-provider dependencies create potential single points of failure. The DNS issue affecting US-EAST-1 demonstrated how infrastructure-level failures can cascade across seemingly independent applications and services.

AWS maintains approximately 30% of the global cloud computing market, according to Synergy Research Group. This market concentration means that AWS infrastructure supports a substantial portion of internet services, from e-commerce platforms to financial services, entertainment applications, and enterprise software. Microsoft Azure and Google Cloud represent other major providers, but the market remains concentrated among these three companies.

The US-EAST-1 region's age and size contributed to the outage's impact. As AWS's original and largest region, the Northern Virginia location hosts legacy systems and serves as a primary hub for many organizations. The concentration of critical services within this single region meant that a DNS resolution failure affecting DynamoDB endpoints could compromise operations for thousands of unrelated applications.

Multi-cloud strategies present theoretical solutions to concentration risks, but implementation challenges limit adoption. According to industry discussions following the outage, cost and complexity of maintaining parallel infrastructure across multiple cloud providers remain prohibitive for many organizations, particularly smaller businesses. Hybrid cloud approaches offer partial mitigation, but require significant architectural planning and operational overhead.

The incident occurred during a period of rapid cloud infrastructure expansion. Amazon announced ongoing AWS investments to support artificial intelligence model deployment and expanded developer access. The company's focus on AI infrastructure—including the Nova family of foundation models available through AWS—increases the potential impact of infrastructure failures on emerging AI-powered services and applications.

The incident's resolution required coordination across multiple engineering teams addressing different aspects of the infrastructure stack. According to Amazon Staff updates throughout the day, teams worked in parallel on DNS resolution, network connectivity, compute instance launches, and service-specific recovery procedures. This coordinated response reflected the organizational complexity required to manage large-scale cloud infrastructure.

Customer experience during the outage varied based on specific service dependencies and architectural decisions. Organizations using services heavily dependent on DynamoDB experienced immediate impact. Those relying primarily on compute services may have experienced delayed effects as EC2 launch throttling took effect. The cascading nature of the failure meant that impact manifested differently across use cases and configurations.

The incident generated millions of user reports to monitoring platforms. According to Morning Brew, Downdetector received over 8 million reports globally, indicating widespread user-perceived impact beyond the technical metrics reported by Amazon. This volume of reports demonstrated the extent to which modern digital services depend on AWS infrastructure.

Some organizations acknowledged the AWS dependency publicly during the incident. Gaming companies posted status updates referencing AWS issues. Financial services platforms noted connectivity problems. The transparency from affected organizations contrasted with earlier eras when infrastructure dependencies often remained opaque to end users.

The incident's impact on advertising technology operations remained difficult to quantify in real-time. Campaign delivery interruptions, measurement gaps, and bidding failures likely occurred across platforms built on AWS infrastructure. The extent of advertising delivery impact during the 15-hour disruption window represents a data point for industry discussions about infrastructure reliability requirements for mission-critical marketing operations.

Timeline

11:49 PM PDT, October 19: Incident begins, with increased error rates and latencies in the US-EAST-1 Region.
12:26 AM PDT, October 20: Engineers identify the root cause: DNS resolution issues for regional DynamoDB service endpoints.
2:01 AM PDT: Amazon Staff report a potential root cause related to DNS resolution of the DynamoDB API endpoint in US-EAST-1.
2:24 AM PDT: Initial DNS mitigations take effect; services begin recovering.
3:35 AM PDT: Underlying DNS issue fully mitigated; some requests remain throttled.
4:48 AM PDT: Guidance issued on EC2 launches not targeted to specific Availability Zones.
5:48 AM PDT: Processing of SQS queues via Lambda Event Source Mappings recovers.
9:38 AM PDT: Network Load Balancer health checks recover.
12:28 PM PDT: Many customers and services see significant recovery as EC2 launch throttling is gradually reduced.
3:01 PM PDT: Full service restoration; some services continue processing backlogs.

Summary

Who: Amazon Web Services experienced the outage, affecting millions of users across platforms including Snapchat, Fortnite, Roblox, Coinbase, Robinhood, and numerous AWS customers. AWS engineers worked to resolve the incident while Amazon Staff provided status communications.

What: A DNS resolution failure for regional DynamoDB service endpoints in the US-EAST-1 Region triggered cascading outages across 142 AWS services and dependent applications. The incident caused increased error rates, latencies, and service unavailability across compute, storage, networking, database, and application services. Recovery required systematic mitigation of DNS issues, throttling of operations, and gradual restoration of service functionality.

When: The incident began at 11:49 PM PDT on October 19, 2025, with root cause identification at 12:26 AM PDT on October 20. Initial DNS mitigation occurred at 2:24 AM PDT, with full service restoration at 3:01 PM PDT on October 20, spanning more than 15 hours from detection to complete recovery.

Where: The failure occurred in Amazon Web Services' US-EAST-1 Region, one of AWS's primary data center locations serving customers throughout North America and globally. The outage affected services dependent on US-EAST-1 infrastructure, including global services with US-EAST-1 dependencies.

Why: DNS resolution issues for regional DynamoDB service endpoints prevented dependent systems from locating necessary resources. The architectural coupling between services meant that a single service's DNS failure compromised broader platform operations. Recovery complexity stemmed from cascading effects throughout interconnected infrastructure, requiring systematic restoration of individual components while managing throttling to prevent additional system strain.