Cloud Outage Frequency: 100+ Incidents in 12 Months and What DORA Demands

DORA Atlas Editorial

The Outage Drumbeat

Financial services technology teams know the rhythm: a status page turning orange, a flood of Downdetector reports, a Slack channel erupting with alerts. Between August 2024 and August 2025, the three hyperscale cloud providers that host the majority of European financial sector workloads — AWS, Azure, and Google Cloud — collectively experienced more than 100 documented service outages.

This is not a statistic about minor API degradation. The count includes outages that affected customer-facing financial services, disrupted payment processing, took down trading platforms, and generated millions of user reports across dozens of countries. The DORA regulation was built for exactly this reality. The frequency tells a story that DORA's framers anticipated: the cloud infrastructure on which the financial sector depends is not as resilient as its marketing suggests, and the concentration of critical workloads on a small number of providers creates correlated failure risk that individual institutions cannot manage alone.

The Data: 12 Months of Cloud Disruption

AWS: The October 2025 Event and Beyond

The AWS outage in October 2025 was the most impactful cloud event of the year. Over 15 hours, the disruption generated approximately 17 million user reports across more than 60 countries. Financial services institutions — including Coinbase, Robinhood, and Capital One — experienced service disruptions ranging from degraded performance to complete unavailability of customer-facing applications.

The October event was not an isolated incident. AWS experienced multiple service-level outages during the 12-month period, affecting compute (EC2), storage (S3), database (RDS), and networking (Route 53) services at various points. The pattern reveals that cloud infrastructure operates on a continuum of partial degradation, not a binary of "available" or "down."

Azure: The $4.8-16 Billion Global Outage

Azure's most significant event during the period was an 8-hour global outage that affected services across all major regions. Industry estimates placed the economic impact between $4.8 billion and $16 billion — a range that reflects the difficulty of quantifying cascading impacts across dependent services and downstream financial operations.

The Azure outage hit financial institutions particularly hard because of the platform's deep integration with Microsoft 365 (email, collaboration, authentication) and Dynamics (ERP, CRM). When Azure's identity services experienced degradation, institutions that relied on Azure Active Directory for authentication found themselves locked out of their own systems.

Google Cloud: Service-Specific Disruptions

Google Cloud's outages during the period were more service-specific: GKE (container orchestration), Compute Engine, and Cloud Networking experienced separate incidents that affected financial sector workloads deployed on the platform. While Google Cloud's financial sector market share is smaller than AWS or Azure in Europe, the institutions that depend on it — particularly in analytics, machine learning, and data processing — experienced material disruptions.

| Provider | Notable incidents (Aug 2024-Aug 2025) | Peak impact | Financial sector entities affected |
|---|---|---|---|
| AWS | October 2025: 15h, 17M reports, 60 countries | Global compute/storage/network degradation | Coinbase, Robinhood, Capital One, others |
| Azure | 8h global outage, identity services | $4.8-16B estimated economic impact | Barclays, Lloyds, European banks via M365/AD |
| Google Cloud | GKE, Compute Engine incidents | Regional service disruption | Analytics/ML-dependent financial operations |
| Combined | 100+ documented outages in 12 months | ~2 outages per week across providers | Cumulative: dozens of financial institutions |

Mapping Outages to DORA Requirements

Each cloud outage is a data point that validates DORA's regulatory thesis and tests specific articles of the regulation. The mapping between outage characteristics and DORA requirements reveals how well the regulatory framework anticipated real-world failure patterns.

Art. 29: Concentration Risk

The 100+ outage count across three providers does not indicate that cloud computing is unreliable. It indicates that the financial sector's cloud dependency creates correlated failure exposure that must be managed. Art. 29 requires financial entities to assess concentration risk by considering:

(a) Non-substitutability. The October AWS outage demonstrated that institutions with single-provider cloud architectures had no failover option. The outage lasted 15 hours — well beyond the RTO targets of most critical banking services. If your core banking platform runs exclusively on AWS, and AWS is down for 15 hours, your RTO target is irrelevant.

(b) Multiple dependencies on the same provider. Institutions using AWS for compute, storage, databases, and networking experienced correlated failures across all service layers simultaneously. Art. 29(1)(b) specifically addresses this pattern of multiple contractual arrangements with the same provider.

(c) Systemic concentration. When AWS went down in October 2025, Coinbase, Robinhood, Capital One, and dozens of other financial services providers went down simultaneously. The financial sector's collective concentration on three cloud providers creates systemic risk that no single institution's risk management can address.
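The first two considerations lend themselves to a simple register check. The following is a minimal sketch, not a prescribed DORA method: the `Arrangement` record, its fields, and the sample register are all hypothetical, and a real register of information would carry far more detail. It counts critical arrangements per provider (multiple dependencies) and lists the critical functions with no tested fallback outside that provider (non-substitutability).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrangement:
    """One contractual arrangement (hypothetical record shape)."""
    function: str       # business function supported
    provider: str       # ICT third-party service provider
    critical: bool      # supports a critical or important function?
    has_fallback: bool  # tested alternative outside this provider?


def concentration_flags(arrangements):
    """Return (critical arrangements per provider, critical functions lacking a fallback)."""
    per_provider = defaultdict(int)
    non_substitutable = defaultdict(list)
    for a in arrangements:
        if not a.critical:
            continue  # only critical or important functions are in scope here
        per_provider[a.provider] += 1
        if not a.has_fallback:
            non_substitutable[a.provider].append(a.function)
    return dict(per_provider), dict(non_substitutable)


# Illustrative register; the functions and flags are invented for the example.
register = [
    Arrangement("core banking", "AWS", True, False),
    Arrangement("payments", "AWS", True, False),
    Arrangement("fraud analytics", "Google Cloud", True, True),
    Arrangement("intranet", "Azure", False, False),
]
counts, exposed = concentration_flags(register)
print(counts)   # {'AWS': 2, 'Google Cloud': 1}
print(exposed)  # {'AWS': ['core banking', 'payments']}
```

Systemic concentration, the third consideration, cannot be computed from one institution's register, which is the point the paragraph above makes.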

Art. 11-12: Business Continuity and Recovery

Art. 11 requires ICT business continuity policies that address "the continuation of the financial entity's critical or important functions." Art. 12 requires backup policies and recovery methods that "ensure the restoration of ICT systems and data with minimum downtime and limited disruption."

The outage data provides an empirical test: how many institutions achieved their stated RTO and RPO targets during major cloud outages?

The answer, based on public reporting and incident postmortems, is mixed:

| Recovery dimension | DORA requirement | Industry performance during major outages |
|---|---|---|
| RTO for critical functions | Art. 11: Defined and tested | Many institutions exceeded RTO targets during multi-hour outages |
| RPO for critical data | Art. 12: Backup policies ensure minimal data loss | Generally met (cloud replication effective), but dependent on architecture |
| Communication to customers | Art. 11: Continuation of services or transparent communication | Inconsistent: some institutions communicated proactively, others were silent |
| Failover to alternative infrastructure | Art. 11: Tested recovery procedures | Only institutions with genuine multi-region/multi-cloud achieved failover |
| Post-incident review | Art. 13: Learning and evolving | Dependent on institution's incident management maturity |
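The RTO row can be made concrete with a small comparison. This is a sketch under stated assumptions: the RTO targets below are hypothetical, and the function assumes no failover, so downtime for each function equals the provider outage duration. Only the 15-hour duration comes from the outage data above.

```python
from datetime import timedelta


def rto_breaches(rto_targets, outage_duration):
    """Return, per function, how far the outage overran its RTO (breached functions only)."""
    return {
        function: outage_duration - rto
        for function, rto in rto_targets.items()
        if outage_duration > rto
    }


# Hypothetical RTO targets for three functions.
targets = {
    "payments": timedelta(hours=2),
    "core banking": timedelta(hours=4),
    "regulatory reporting": timedelta(hours=24),
}
breaches = rto_breaches(targets, timedelta(hours=15))
for function, overrun in breaches.items():
    print(function, "overran RTO by", overrun)
```

With these illustrative targets, payments and core banking breach their RTO while regulatory reporting does not, which matches the pattern in the table: the tighter the target, the more a multi-hour provider outage exposes the absence of a tested failover path.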

Art. 17-23: Incident Management

Each cloud outage that disrupts a financial entity's services potentially triggers Art. 17-23 incident management obligations. The assessment framework:

Is it a major ICT-related incident? Art. 18(1) criteria include: impact on critical functions, financial impact, duration, geographic spread, and number of clients affected. A multi-hour outage of a customer-facing banking platform — caused by a cloud provider failure — crosses most of these thresholds.

Does it require NCA notification? Art. 19(1) requires notification to the competent authority for major ICT-related incidents. The initial notification deadline is 4 hours from classification. For institutions hit by the AWS October outage, the timeline was tight: detect the provider failure, assess its impact on critical functions, and classify the incident quickly, then notify the NCA within 4 hours of that classification.

Does it require customer notification? Art. 19(3) requires that financial entities notify clients "where a major ICT-related incident has or may have an impact on the financial interests of clients." Cloud outages that affect payment processing, account access, or trading capabilities fall squarely within this provision.
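The notification clock can be sketched as a small helper. The 4-hour window from classification is the figure given above. The 24-hour outer bound from the moment of awareness is an assumption based on our reading of the incident reporting technical standards and should be verified against the current text. The timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

INITIAL_WINDOW = timedelta(hours=4)   # from classification as major (figure used in this article)
AWARENESS_CAP = timedelta(hours=24)   # ASSUMPTION: outer bound from awareness; verify before relying on it


def initial_notification_deadline(aware_at, classified_at):
    """Latest time for the initial notification under the two bounds above."""
    if classified_at < aware_at:
        raise ValueError("classification cannot precede awareness")
    return min(classified_at + INITIAL_WINDOW, aware_at + AWARENESS_CAP)


# Hypothetical timeline: aware at 07:10 UTC, classified as major at 09:30 UTC.
aware = datetime(2025, 1, 15, 7, 10, tzinfo=timezone.utc)
classified = datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc)
deadline = initial_notification_deadline(aware, classified)
print(deadline.isoformat())  # 2025-01-15T13:30:00+00:00
```

Using timezone-aware timestamps is a deliberate choice: incident timelines assembled from provider status pages, internal monitoring, and ticketing systems often mix time zones, and a naive comparison can silently shift a deadline by hours.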

The Frequency Problem: Two Outages Per Week

The aggregate statistic — 100+ outages in 12 months across three providers — translates to approximately two outages per week. While not every outage affects financial sector workloads, and many are limited in scope and duration, the frequency creates a compounding risk that static risk assessments fail to capture.
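The arithmetic behind the "two per week" figure, plus a rough sense of what it implies, can be sketched as follows. The 100 outages and 12 months come from the data above. Treating weekly outage counts as Poisson-distributed is a simplifying assumption for illustration only, since real outages cluster and are not independent.

```python
import math

OUTAGES = 100  # documented floor ("100+"), so the true rate is at least this
WEEKS = 52

weekly_rate = OUTAGES / WEEKS


def prob_at_least(k, rate):
    """P(N >= k) for a Poisson-distributed weekly count N with mean `rate` (assumed model)."""
    below = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return 1.0 - below


print(round(weekly_rate, 2))                    # 1.92
print(round(prob_at_least(1, weekly_rate), 2))  # 0.85
```

Under that assumed model, roughly 85% of weeks would see at least one outage somewhere across the three providers, which is why a quiet week is the exception rather than the baseline.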

Incident fatigue. Teams that experience cloud disruptions twice a week develop normalized response patterns that may miss unusual severity signals. When every week brings an outage, the one that is genuinely catastrophic may not receive the escalation it warrants.

Testing validity erosion. Disaster recovery plans tested against a single, isolated failure scenario may perform differently when the institution has already experienced two cloud disruptions in the same month. Resource fatigue, partially degraded systems from previous incidents, and incomplete remediation from the last event all affect recovery capability.

Board reporting saturation. Art. 5 makes the management body responsible for ICT risk, and Art. 17(3)(e) requires that it be informed of at least major ICT-related incidents. Reporting two cloud outages per week risks either saturating the board with operational noise or, worse, creating a pattern of under-reporting where only the most severe events reach board attention.

The frequency argument strengthens the case for continuous monitoring over periodic assessment. An annual concentration risk assessment under Art. 29 cannot capture a threat landscape that produces 100+ events in 12 months. The assessment is outdated before the ink dries.
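One way to picture continuous monitoring is a rolling-window counter over provider incidents. This is a minimal sketch, not a reference implementation: the class name, the 30-day window, and the sample events are all hypothetical, and it assumes events are recorded in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingOutageCounter:
    """Counts provider outages inside a sliding time window (hypothetical helper)."""

    def __init__(self, window=timedelta(days=30)):
        self.window = window
        self._events = deque()  # (timestamp, provider), oldest first

    def record(self, when, provider):
        # Assumes callers record events in chronological order.
        self._events.append((when, provider))

    def count(self, now, provider=None):
        """Outages in the window ending at `now`, optionally for one provider."""
        cutoff = now - self.window
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()  # drop events that have aged out of the window
        return sum(
            1 for when, p in self._events
            if when <= now and (provider is None or p == provider)
        )


# Illustrative events, not taken from the outage data.
counter = RollingOutageCounter()
counter.record(datetime(2025, 1, 2), "AWS")
counter.record(datetime(2025, 1, 10), "Azure")
counter.record(datetime(2025, 1, 28), "AWS")
now = datetime(2025, 2, 5)
print(counter.count(now))         # 2
print(counter.count(now, "AWS"))  # 1
```

A count that rises above an agreed threshold could then trigger a refresh of the concentration risk assessment, rather than waiting for the annual cycle.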

The Architecture Imperative

The outage data produces a clear architectural conclusion: financial institutions relying on a single cloud provider for critical functions face a level of concentration risk that is empirically demonstrated, not theoretical. The 100+ outages are not hypothetical scenarios — they are documented events with quantifiable financial and operational impact.

DORA does not prescribe specific architectural solutions. It does not mandate multi-cloud. It does not require on-premises backup for cloud-hosted services. What it requires is that institutions assess, manage, and report the concentration risk — and maintain credible exit strategies for the providers they depend on.

The architectural responses available to institutions, in order of increasing resilience and cost:

1. Multi-region within a single provider — reduces geographic concentration, does not address provider concentration. The AWS October outage affected multiple regions simultaneously, demonstrating the limits of this approach for provider-level failures.

2. Multi-cloud for critical services — genuine diversification that addresses Art. 29 concentration risk. Expensive and operationally complex, but the only architecture that provides provider-level failover.

3. Hybrid cloud with on-premises fallback — maintains on-premises capability for the most critical services, using cloud for scalability and secondary workloads. Addresses provider dependency but adds infrastructure complexity.

4. Active-active multi-cloud — the most resilient architecture, where critical services run simultaneously on multiple providers with real-time data synchronization. The most expensive and complex to operate, but provides sub-second failover.

The choice depends on the institution's risk tolerance, critical function RTO targets, and investment capacity. What DORA requires is that the choice is deliberate, documented, and tested — not default.
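The resilience gap between the single-provider and multi-cloud options can be illustrated with simple arithmetic. This sketch rests on two strong assumptions that the outage record itself puts in doubt: provider failures are statistically independent, and failover between providers is instantaneous and always works. The 15-hour figure is the October outage duration cited above, used here as a stand-in for annual downtime per provider.

```python
HOURS_PER_YEAR = 8760


def expected_joint_downtime_hours(downtime_a, downtime_b):
    """Expected hours per year when BOTH providers are down at once.

    ASSUMPTION: the two providers fail independently of each other.
    """
    p_a = downtime_a / HOURS_PER_YEAR  # fraction of the year provider A is down
    p_b = downtime_b / HOURS_PER_YEAR  # fraction of the year provider B is down
    return p_a * p_b * HOURS_PER_YEAR


single = 15.0  # hours of downtime per year on one provider (illustrative)
joint = expected_joint_downtime_hours(single, single)
print(f"single provider: {single:.1f} h/year")
print(f"two independent providers: {joint * 60:.1f} min/year")
```

Under these idealized assumptions, joint downtime falls from hours to minutes per year. Shared dependencies such as DNS, identity services, or a common software supplier break the independence assumption, so the real benefit is smaller and has to be demonstrated through testing.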

The Path Forward

The 100+ outages in 12 months are not an indictment of cloud computing. Cloud remains the most scalable, cost-effective infrastructure model for most financial services workloads. But the data demolishes the assumption of cloud infallibility that has underpinned many institutions' risk assessments.

DORA's value — demonstrated by the empirical record of Year One — is that it forces institutions to confront this reality systematically: assess concentration, plan for failure, test recovery, document evidence, and report to the board. The institutions that have done this work are not immune to cloud outages. But they recover faster, with less customer impact, and with the evidence trail that supervisors will increasingly demand.


This analysis reflects publicly documented cloud outage data from August 2024 through August 2025 and DORA Regulation (EU) 2022/2554 Articles 11, 12, 17-23, and 29.