
The AWS October 2025 Outage: 17 Million Reports, 60 Countries, and a DORA Wake-Up Call

DORA Atlas Editorial, 12 min read

15 Hours That Exposed the Financial System's Single Point of Failure

At approximately 12:30 UTC on October 20, 2025, Amazon Web Services experienced a cascading failure originating from an internal subsystem responsible for monitoring the health of network load balancers in its US-East-1 region in northern Virginia. Within minutes, the failure propagated across dependent services. Within hours, it became one of the largest internet outages on record.

The numbers are staggering. Downdetector registered 17 million user reports from over 60 countries — a 970% spike above baseline. The outage lasted approximately 15 hours for most services, with some customers reporting degraded performance for nearly 24 hours. And the financial services sector was at the epicenter of the impact.

Coinbase, the largest US-listed cryptocurrency exchange, suspended all crypto trading for the duration of the outage. Robinhood users discovered they could not execute equity trades during market hours. Lloyds Banking Group and Bank of Scotland locked customers out of online and mobile banking. Capital One experienced transaction processing delays affecting millions of cardholders.

This was not a theoretical risk scenario. This was the exact concentration risk event that DORA Article 29 was designed to govern — playing out in real time across the global financial system.

The Anatomy of a Cascading Cloud Failure

Understanding what happened on October 20 requires understanding how cloud infrastructure dependencies create failure cascades that transcend any single service.

Timeline of the Outage

| Time (UTC) | Event | Financial Impact |
|---|---|---|
| ~12:30 | Internal health monitoring subsystem fails in US-East-1 | None initially (latent failure) |
| ~13:00 | Network load balancers begin failing health checks | API gateway timeouts for AWS-hosted services |
| ~13:30 | Cascading failures across dependent services (ELB, EC2, RDS, S3) | Coinbase suspends trading; Robinhood reports execution failures |
| ~14:00 | Global propagation via cross-region dependencies | Lloyds/BoS mobile banking inaccessible; Capital One delays |
| ~15:00 | AWS acknowledges elevated error rates | 17M+ user reports across 60+ countries |
| ~18:00 | Partial mitigation; some services restored | Limited trading resumes on select platforms |
| ~03:30 (Oct 21) | Full restoration confirmed | ~15 hours total; some degradation for 24h |

The root cause — a malfunctioning internal monitoring subsystem — illustrates a critical lesson about cloud architecture. The failure was not in the load balancers themselves. It was in the system that monitored them. A monitoring component, designed to improve reliability, became the vector for the largest outage in AWS's recent history.

This is precisely the type of correlated failure that individual financial institutions cannot anticipate through vendor questionnaires and annual assessments. The dependency was internal to AWS, invisible to customers, and catastrophic in its failure mode.

Financial Services Impact Assessment

| Institution | Service Affected | Duration | Customer Impact |
|---|---|---|---|
| Coinbase | All crypto trading | ~15 hours | Complete trading suspension; estimated millions in unrealized trades |
| Robinhood | Equity trade execution | ~12 hours | Failed orders during active market hours |
| Lloyds Banking Group | Online/mobile banking | ~10 hours | Customers locked out; payment failures |
| Bank of Scotland | Online/mobile banking | ~10 hours | Customers locked out; direct debit failures |
| Capital One | Transaction processing | ~8 hours | Card authorization delays; some declined transactions |
| Multiple fintech PSPs | Payment processing | 6-15 hours | Settlement delays, failed real-time payments |

The financial impact, while difficult to quantify precisely, ran into billions of dollars when accounting for lost trading revenue, failed transactions, customer compensation, incident response costs, and reputational damage. For context, the Azure global outage earlier in 2025 — which lasted 8 hours with 18,000+ reports — was estimated at $4.8 billion to $16 billion in total economic impact. The AWS outage was longer, broader, and affected more financial institutions.

What DORA Required — and What It Revealed

For every EU-regulated financial entity affected by the October 20 outage, DORA imposed immediate obligations. The outage functioned as a live stress test of DORA's incident management and third-party risk frameworks.

The 4-Hour Reporting Clock

Under DORA Article 19 and its associated RTS, the initial incident notification must reach the national competent authority (NCA) within 4 hours of the incident being classified as major, and no later than 24 hours after detection. For institutions that detected the outage by 13:00 UTC, the 24-hour outer limit started running at that moment, and the 4-hour window opened as soon as the incident was classified as major.

The classification criteria under Article 18 and the associated RTS leave little ambiguity: an outage affecting core banking services, payment processing, or trading execution for millions of customers across multiple jurisdictions is a major ICT-related incident. The reporting cascade follows a strict timeline:

| Report | Deadline | Required Content |
|---|---|---|
| Initial notification | 4 hours after classification (max 24h after detection) | Nature of incident, affected services, initial impact assessment, remediation actions underway |
| Intermediate report | 72 hours from initial notification | Updated impact assessment, root cause analysis (if available), recovery timeline, customer communication summary |
| Final report | 1 month after the latest intermediate report | Complete root cause analysis, quantified impact, remediation actions completed, lessons learned, framework changes |

For institutions without automated incident classification and reporting workflows, meeting the 4-hour deadline during a live crisis — while simultaneously managing customer communications, activating business continuity plans, and coordinating with the cloud provider — is operationally brutal.
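The two short-fuse deadlines described above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the function names and the example timestamps (including the 13:45 UTC classification time) are assumptions made for this sketch, and the one-month final report is left out because calendar-month arithmetic needs more care than a short example allows.

```python
from datetime import datetime, timedelta, timezone


def initial_notification_deadline(detected_at: datetime, classified_at: datetime) -> datetime:
    """Return the earlier of: 4 hours after classification, 24 hours after detection."""
    return min(
        classified_at + timedelta(hours=4),
        detected_at + timedelta(hours=24),
    )


def intermediate_report_deadline(initial_submitted_at: datetime) -> datetime:
    """Return 72 hours after the initial notification was submitted."""
    return initial_submitted_at + timedelta(hours=72)


# Hypothetical institution: detects at 13:00 UTC, classifies as major at 13:45 UTC.
detected = datetime(2025, 10, 20, 13, 0, tzinfo=timezone.utc)
classified = datetime(2025, 10, 20, 13, 45, tzinfo=timezone.utc)

initial_due = initial_notification_deadline(detected, classified)
intermediate_due = intermediate_report_deadline(initial_due)

# A slow classification (here, the next day at noon) is capped by the 24-hour limit.
late_classified = datetime(2025, 10, 21, 12, 0, tzinfo=timezone.utc)
capped_due = initial_notification_deadline(detected, late_classified)

print("Initial notification due:", initial_due.isoformat())
print("Intermediate report due: ", intermediate_due.isoformat())
print("Capped by 24h limit:     ", capped_due.isoformat())
```

Taking the minimum of the two candidate times is the point of the sketch: a delayed classification decision does not extend the deadline beyond 24 hours from detection.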

Article 29: The Concentration Risk Reckoning

DORA Article 29 requires financial entities to assess whether their ICT third-party arrangements create concentration risk, specifically considering whether a provider is "not easily substitutable" (Art. 29(1)(a)) and whether arrangements supporting critical or important functions are concentrated on the same or closely connected providers (Art. 29(1)(b)).

The AWS outage answered both questions definitively.

Non-substitutability. The institutions that suffered the longest outages were those with deep integration into AWS-native services — proprietary managed databases, serverless architectures, platform-specific APIs. Their applications could not simply "fail over" to another cloud provider because they were architecturally locked in. The exit strategy that Art. 28(8) requires for critical ICT arrangements was, for many, either untested or nonexistent.

Systemic concentration. The simultaneous impact on Coinbase, Robinhood, Lloyds, Bank of Scotland, Capital One, and dozens of payment processors and fintechs demonstrated that provider concentration is not just an institutional risk — it is a systemic risk. When a single provider's failure can simultaneously degrade trading, payments, and banking across multiple jurisdictions, the concentration has crossed the threshold from risk to vulnerability.

The Concentration Risk Numbers

The AWS outage did not occur in isolation. The period from August 2024 through August 2025 saw over 100 significant outages across the three major hyperscale cloud providers combined.

| Provider | Significant Outages (Aug 2024 - Aug 2025) | Notable Financial Services Impact |
|---|---|---|
| AWS | 35+ | October 2025 (15h, 17M reports); multiple S3/EC2 regional incidents |
| Microsoft Azure | 30+ | Front Door global outage (8h, 18K+ reports, $4.8-16B estimated) |
| Google Cloud | 25+ | Cloud SQL multi-region incident; IAM propagation delays |
| Combined | 100+ | Average: ~2 significant outages per week across all three |

These three providers, combined with their corporate subsidiaries and SaaS platforms that run on their infrastructure, represent a staggering concentration of the global technology market. Research from SecurityScorecard found that just 15 companies control 62% of the global technology market — and the cloud infrastructure layer is even more concentrated.

For a mid-size European bank with 80% of its critical workloads on AWS, the Herfindahl-Hirschman Index (HHI) for cloud concentration would be at least 6,400, because the AWS share alone contributes 80² = 6,400 before any other provider is counted. That is well above the 2,500 threshold that indicates a "highly concentrated" market in antitrust economics. DORA does not prescribe specific HHI thresholds, but supervisory expectations are converging around quantitative measurement of concentration, and an HHI above 2,500 for critical ICT services will attract scrutiny.
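The HHI arithmetic is simple enough to sketch directly: sum the squares of each provider's percentage share. In the example below, the 80% AWS share comes from the scenario above, while the split of the remaining 20% between a second cloud and an on-premises estate is purely an assumption for illustration.

```python
def hhi(shares_percent: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared percentage shares (0 to 10,000)."""
    total = sum(shares_percent.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"shares must sum to 100, got {total}")
    return sum(share ** 2 for share in shares_percent.values())


# Hypothetical workload distribution; only the 80% figure is from the text.
workload_shares = {"AWS": 80.0, "second_cloud": 15.0, "on_premises": 5.0}

score = hhi(workload_shares)
print(f"HHI = {score:.0f}")  # 80^2 + 15^2 + 5^2
print("Highly concentrated" if score > 2500 else "Below 2,500")
```

However the remaining 20% is divided, the result cannot fall below 6,400, which is why the 80% figure alone settles the question.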

The Lead Overseer Question

The AWS outage accelerated a conversation that was already underway at the European Supervisory Authorities: the designation of hyperscale cloud providers as Critical Third-Party Providers (CTPPs) under DORA Article 31.

The designation criteria are clear. Art. 31(2) considers: the number and nature of financial entities relying on the provider, the criticality of functions supported, the degree of substitutability, and the systemic impact of a large-scale failure. AWS checks every box.

Once designated, a CTPP is subject to direct oversight by a Lead Overseer — one of the three ESAs — which can conduct inspections, issue recommendations, and impose penalties of up to 1% of average daily worldwide turnover per day of non-compliance (Art. 35(8)). For AWS, with parent company Amazon's revenue exceeding $600 billion annually, that penalty ceiling is substantial.
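To put a rough number on that ceiling, the sketch below applies the 1% rate to an average daily turnover derived from the $600 billion annual figure mentioned above. Which entity's turnover a Lead Overseer would actually use is not settled here; using the parent company's figure and a 365-day year are assumptions of this example.

```python
def daily_penalty_ceiling(annual_turnover: float, rate: float = 0.01,
                          days_in_year: int = 365) -> float:
    """Return rate x average daily turnover, i.e. the per-day penalty ceiling."""
    average_daily_turnover = annual_turnover / days_in_year
    return average_daily_turnover * rate


# Assumption: the $600 billion annual revenue figure quoted in the text.
ceiling = daily_penalty_ceiling(600e9)
print(f"Ceiling per day of non-compliance: ${ceiling:,.0f}")
```

On these assumptions the ceiling works out to roughly $16.4 million for each day of non-compliance.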

But the Lead Overseer regime does not absolve individual institutions of their own obligations. Art. 29 concentration risk assessment, Art. 28(8) exit strategies, and Art. 11 recovery capabilities remain squarely the institution's responsibility. The Lead Overseer adds a systemic layer of supervision — it does not replace institution-level governance.

What Should Have Been in Place Before October 20

The institutions that navigated the outage with the least customer impact shared common characteristics — none of which can be implemented during a crisis.

Multi-cloud capability for critical services. Institutions that could fail over core banking or trading execution to a secondary cloud provider — even in degraded mode — maintained service continuity. This requires architectural investment measured in years, not weeks.

Tested business continuity plans. Not documented plans — tested plans. Institutions that had conducted scenario-based resilience testing (DORA Art. 24-27, Pillar III) for a "primary cloud provider multi-region outage" scenario had practiced the coordination, communication, and recovery procedures that the outage demanded.

Automated incident classification and reporting. The 4-hour reporting clock does not pause while the incident response team debates whether the event qualifies as "major." Automated classification based on predefined criteria — customer impact thresholds, service availability metrics, geographic scope — enables immediate NCA notification.

Contractual protections with teeth. Art. 30 key contractual provisions — including notification obligations, service level commitments, penalty clauses, and exit assistance — determine what support the institution can demand from the provider during and after an outage. Institutions with vague SLAs and no contractual right to post-incident root cause analysis were left waiting for AWS's public post-mortem.
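The automated classification described above can be sketched as a rule check over a snapshot of incident metrics. Everything specific in this example is hypothetical: the field names, the threshold values, and the "two or more breaches" rule are placeholders chosen for illustration and are not the criteria set out in the RTS. A real implementation would encode the institution's own reading of the regulatory thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentSnapshot:
    critical_service_affected: bool
    affected_customers: int
    total_customers: int
    downtime_minutes: int
    member_states_affected: int


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only.
    customer_share: float = 0.10
    downtime_minutes: int = 120
    member_states: int = 2


def classify(snapshot: IncidentSnapshot, thresholds: Thresholds = Thresholds()) -> str:
    """Return "major" or "non-major" based on predefined, configurable criteria."""
    if not snapshot.critical_service_affected:
        return "non-major"
    share = (snapshot.affected_customers / snapshot.total_customers
             if snapshot.total_customers else 0.0)
    breaches = [
        share >= thresholds.customer_share,                          # customer impact
        snapshot.downtime_minutes >= thresholds.downtime_minutes,    # availability
        snapshot.member_states_affected >= thresholds.member_states, # geographic scope
    ]
    # Assumption of this sketch: two or more breached criteria mean "major".
    return "major" if sum(breaches) >= 2 else "non-major"


outage = IncidentSnapshot(
    critical_service_affected=True,
    affected_customers=300_000,
    total_customers=1_000_000,
    downtime_minutes=180,
    member_states_affected=1,
)
print(classify(outage))
```

Because the decision is a pure function of measured values, it can run on every monitoring update and trigger the notification workflow the moment the result changes to "major", with no meeting required.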

Five Lessons for Every Financial Institution

First, concentration risk is not theoretical. The October 20 outage demonstrated that a single cloud provider failure can simultaneously affect trading platforms, payment processors, banks, and fintechs across 60+ countries. If your Art. 29 assessment rates cloud concentration as "medium" or "low," revisit it.

Second, exit strategies must be credible and tested. Art. 28(8) requires exit strategies for critical ICT arrangements. An untested document claiming "we could migrate to Azure in 6 months" is not an exit strategy. It is an assumption. Test it.

Third, incident reporting readiness is operational, not procedural. The 4-hour clock does not wait for manual classification. Invest in automated detection, classification, and notification workflows that activate the moment service degradation crosses defined thresholds.

Fourth, sub-outsourcing chain mapping is essential. Many institutions discovered during the outage that their "diversified" vendor portfolio was, in fact, heavily concentrated on AWS through sub-outsourcing chains. Your payment processor runs on AWS. Your fraud detection vendor runs on AWS. Your identity provider runs on AWS. Map the chain.

Fifth, the board must understand cloud concentration as a systemic risk. Art. 5 places ultimate responsibility for ICT risk management with the management body. Art. 14 requires the management body to be briefed on ICT risk and resilience. Cloud concentration at the levels exposed by October 20 is a board-level risk that demands board-level attention, investment, and governance.
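The chain mapping in the fourth lesson amounts to a graph traversal: start from each direct vendor and follow its dependencies until the underlying infrastructure provider is reached. The sketch below is a minimal illustration; the vendor names and the dependency graph are invented, and in practice the data would come from the institution's register of information and vendor disclosures.

```python
def transitive_providers(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return every provider reachable from `start` through sub-outsourcing links."""
    seen: set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def vendors_exposed_to(graph: dict[str, list[str]], institution: str,
                       provider: str) -> list[str]:
    """List the institution's direct vendors that depend on `provider` at any depth."""
    return sorted(
        vendor for vendor in graph.get(institution, ())
        if vendor == provider or provider in transitive_providers(graph, vendor)
    )


# Hypothetical dependency graph: each key relies on the providers in its list.
dependencies = {
    "bank": ["payment_processor", "fraud_vendor", "identity_provider", "core_banking_host"],
    "payment_processor": ["AWS"],
    "fraud_vendor": ["analytics_platform"],
    "analytics_platform": ["AWS"],
    "identity_provider": ["AWS"],
    "core_banking_host": ["on_prem_datacenter"],
}

exposed = vendors_exposed_to(dependencies, "bank", "AWS")
print(exposed)
```

In this invented example, three of the bank's four "independent" vendors lead back to the same provider, one of them only through an intermediate platform that a direct-vendor questionnaire would not reveal.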

The AWS October 2025 outage was not a black swan. It was a white swan: predictable, documented in risk registers, warned about by regulators, and experienced at smaller scale dozens of times in the preceding years. The question for every financial institution is not whether it will happen again. It is whether, when it does, the institution can demonstrate that it governed the risk, tested its defenses, reported within timelines, and recovered within tolerance. That is what DORA demands. That is what October 20 tested. And for too many institutions, the test revealed gaps that should have been closed before the regulation became applicable on 17 January 2025.


This analysis reflects DORA Regulation (EU) 2022/2554 and associated RTS/ITS as applicable. Outage data sourced from Downdetector, public incident reports, and financial services industry disclosures.

