158 Outages, 803 Hours: What UK Banking IT Failures Teach Us About Operational Resilience

33 Days of Darkness
Between January 2023 and February 2025, nine major UK banks accumulated 158 IT failure incidents. The aggregate downtime exceeded 803 hours, the equivalent of 33 full days when one or more critical banking services were unavailable to customers. The data, compiled from public incident reports and regulatory disclosures, reveals not isolated failures but a structural pattern of fragility across the UK banking sector's technology infrastructure. Meanwhile, in Europe, the EBA drives digital resilience through DORA.
The UK is not in DORA's direct scope — it is no longer an EU member state. But the UK banking sector's experience provides the most comprehensive public dataset on operational resilience failures in major financial institutions. For EU institutions now subject to DORA, these failures are not cautionary tales from a foreign jurisdiction. They are empirical evidence of what happens when operational resilience governance fails — and a preview of what DORA's requirements are designed to prevent.
The Barclays Outage: A Case Study in Cascading Failure
Figure 1: Cascading impact of the Barclays mainframe failure. A single infrastructure failure propagated across online banking, mobile banking, and payment processing, with 56% of payment transactions failing during the month-end cycle.
On January 31, 2025, Barclays experienced a mainframe failure that disabled online banking, mobile banking, and payment processing services. The outage persisted for three days — through the weekend and into February 2 — coinciding with one of the busiest periods in the UK banking calendar: month-end salary processing, direct debit collections, and mortgage payment deadlines.
The impact data, disclosed through regulatory filings and public communications, tells the story:
| Metric | Value | Context |
|---|---|---|
| Outage duration | ~72 hours (Jan 31 - Feb 2) | Spanning month-end processing cycle |
| Login failure rate | 5% of all attempts | Sustained over the outage period |
| Payment attempts affected | 17% of all transactions | Including salary credits and direct debits |
| Payment failure rate | 56% of attempted payments failed | More than half of transactions that entered the system did not complete |
| Customer compensation | GBP 5-7.5 million | Direct remediation payments to affected customers |
| Two-year compensation total | GBP 12.5 million | Including prior incident remediation |
| Parliamentary response | Select Committee inquiry | MPs demanded bank executives explain IT failures |
The 56% payment failure rate is the critical figure. When more than half of payment transactions fail during a month-end processing cycle, the impact cascades beyond the bank's own customers. Employers whose salary payments fail face payroll crises. Utility companies whose direct debits bounce face revenue disruption. Mortgage lenders whose collections fail face liquidity management complications. A single bank's mainframe failure becomes a multi-party financial event.
The Anatomy of 158 Failures
The 158 IT failures across nine major UK banks were not homogeneous. They decompose into distinct failure categories, each with different root causes and resilience implications:
| Failure Category | Estimated Share | Typical Duration | Recovery Complexity |
|---|---|---|---|
| Core banking / mainframe failures | 15-20% | 12-72 hours | Very high — limited expertise, legacy architecture |
| Payment processing disruptions | 25-30% | 2-12 hours | High — real-time settlement dependencies |
| Mobile/online banking outages | 20-25% | 1-6 hours | Moderate — typically application-layer issues |
| Third-party service dependencies | 15-20% | Variable (2-48 hours) | High — recovery depends on external provider |
| Capacity/performance degradation | 10-15% | 1-4 hours | Moderate — but often recurring |
The mainframe failures, while less frequent, account for disproportionate impact. Barclays' January 2025 outage was a mainframe event. These systems — many running on decades-old architectures — process the highest-value, highest-volume transactions and have the least redundancy. When they fail, the recovery path is narrow, the expertise is scarce, and the blast radius is maximum.
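The frequency-versus-impact point can be made concrete with a rough calculation. The sketch below weights each category by the midpoint of its estimated share and the midpoint of its typical duration, both taken from the table above. Midpoints are an assumption made purely for illustration; the underlying per-incident durations are not published here, so the output shows relative weight only, not measured downtime.

```python
# Illustrative only: ranges are copied from the failure-category table;
# using their midpoints is an assumption, not a property of the dataset.
CATEGORIES = {
    # name: ((share_low, share_high), (hours_low, hours_high))
    "core banking / mainframe": ((0.15, 0.20), (12, 72)),
    "payment processing": ((0.25, 0.30), (2, 12)),
    "mobile / online banking": ((0.20, 0.25), (1, 6)),
    "third-party dependencies": ((0.15, 0.20), (2, 48)),
    "capacity / performance": ((0.10, 0.15), (1, 4)),
}


def midpoint(bounds):
    """Return the midpoint of a (low, high) pair."""
    low, high = bounds
    return (low + high) / 2


def downtime_weights(categories):
    """Return each category's share of total downtime-hours.

    Weight = midpoint incident share x midpoint duration, normalised to sum to 1.
    """
    raw = {
        name: midpoint(share) * midpoint(hours)
        for name, (share, hours) in categories.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


weights = downtime_weights(CATEGORIES)
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name:28s} {weight:6.1%}")
```

Under these midpoint assumptions, mainframe incidents carry roughly half of the weighted downtime-hours while making up less than a fifth of incidents, which is the asymmetry the paragraph above describes.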
Payment processing disruptions are the most frequent category and the most visible to customers. The UK Faster Payments system processes over 4 billion transactions annually, and any interruption in a bank's connection to the scheme immediately affects customer transactions. DORA's Art. 11(4) specifically addresses payment-related functions, requiring "adequate disaster recovery capabilities" that can "resume the operations of critical functions within short and predefined time frames."
The Payday Cascade: February 28, 2025
On February 28, 2025 — the last Friday of the month and the single busiest payday in the UK banking calendar — multiple banks experienced simultaneous service degradation. Lloyds Banking Group (encompassing Lloyds, Halifax, Bank of Scotland), TSB, and Nationwide Building Society all reported service disruptions. Customer complaint aggregators recorded over 4,000 reports within the first two hours.
The simultaneous failure across multiple institutions is significant. It suggests either a shared infrastructure dependency (a common payment scheme, a shared clearing system, or a common third-party provider) or correlated load failure (all institutions experiencing peak demand simultaneously, with insufficient capacity headroom).
From a DORA perspective, this is precisely the concentration risk scenario that Art. 29 addresses. When multiple financial entities depend on the same ICT infrastructure or third-party service, and that shared dependency fails under peak load, the systemic impact exceeds the sum of individual institution failures. The payday cascade was not a series of separate incidents. It was one systemic event manifesting across multiple brands.
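A shared dependency of this kind is easy to surface once dependency registers are comparable. The following is a minimal sketch, assuming a hypothetical register that maps each institution to the providers and schemes it relies on. All names are placeholders, not findings about any real bank, and a single institution would normally see only its own row; the cross-institution view is closer to what a group function or supervisor could assemble.

```python
from collections import defaultdict

# Hypothetical dependency register: institution -> ICT providers/schemes used.
DEPENDENCIES = {
    "bank_a": {"payment_scheme_x", "cloud_provider_1", "card_processor_p"},
    "bank_b": {"payment_scheme_x", "cloud_provider_2"},
    "bank_c": {"payment_scheme_x", "cloud_provider_1"},
}


def shared_dependencies(dependencies, min_entities=2):
    """Map each dependency used by at least `min_entities` institutions to its users."""
    users = defaultdict(set)
    for entity, deps in dependencies.items():
        for dep in deps:
            users[dep].add(entity)
    return {
        dep: sorted(entities)
        for dep, entities in users.items()
        if len(entities) >= min_entities
    }


for dep, entities in sorted(shared_dependencies(DEPENDENCIES).items()):
    print(f"{dep}: shared by {', '.join(entities)}")
```

Raising `min_entities` narrows the result to the dependencies whose failure would touch the most institutions at once.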
Mapping UK Failures to DORA Requirements
Figure 2: Mapping UK banking failure categories to DORA articles that address each root cause. Every failure type has a corresponding DORA prevention mechanism.
Each category of UK banking failure maps to specific DORA articles that, if implemented effectively, would have either prevented the incident or materially reduced its impact:
Art. 8 — Identification: DORA requires financial entities to "identify, classify, and adequately document all ICT supported business functions." The UK experience shows that many institutions could not map the dependency chain from a mainframe failure to its downstream business impact — they discovered the blast radius of their incidents in real time rather than through pre-existing dependency mapping.
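Pre-existing dependency mapping of this kind amounts to a graph traversal. The sketch below assumes a hypothetical map from each component to the things that depend on it directly, and walks it breadth-first to list everything downstream of a failed component. The component names are invented for illustration and do not describe any bank's actual architecture.

```python
from collections import deque

# Hypothetical dependency map: component -> components that depend on it directly.
DEPENDENTS = {
    "mainframe": ["core_ledger"],
    "core_ledger": ["payments_engine", "online_banking_api"],
    "payments_engine": ["salary_credits", "direct_debits"],
    "online_banking_api": ["mobile_app", "web_banking"],
}


def blast_radius(dependents, failed):
    """Return every component reachable downstream of `failed` (breadth-first)."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(failed)  # guard against cycles that lead back to the start
    return seen


print(sorted(blast_radius(DEPENDENTS, "mainframe")))
```

Running the same traversal for every component before an incident gives a ranked list of single points of failure, rather than a blast radius discovered in real time.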
Art. 9 — Protection and prevention: Art. 9(4)(e) mandates "documented policies, procedures and controls for ICT change management, including changes to software, hardware, firmware components, systems or security parameters." The Barclays mainframe failure occurred during or adjacent to a system change. Change management failures (inadequate testing, insufficient rollback planning, change collision) are the single most common root cause category for critical banking outages.
Art. 11 — Response and recovery: Art. 11(1) requires financial entities to "put in place a comprehensive ICT business continuity policy." Art. 11(6)(a) mandates that entities "test the ICT business continuity plans and the ICT response and recovery plans" at least yearly. The 72-hour Barclays outage duration indicates that either the recovery plan was inadequate, the recovery was not tested at the required scale, or the tested recovery did not work as designed under real conditions.
Art. 12 — Backup policies and recovery: Art. 12(1) requires "backup policies and procedures specifying the scope of the data that is subject to the backup and the minimum frequency of the backup." Art. 12(2) requires periodic testing of backup and restoration procedures, and Art. 12(3) requires that restoration from backups use "ICT systems that are physically and logically segregated from the source ICT system." The multi-day recovery timeline for mainframe failures suggests that backup restoration at scale, for the transaction volumes these systems process, was not validated pre-incident.
Art. 17-23 — Incident management: DORA's incident management framework requires detection, classification, and an initial notification to the competent authority (Art. 19(4)(a)) within four hours of an incident being classified as major; the time limits themselves are set in the technical standards adopted under Art. 20. UK banks' public communications during the outages were delayed, inconsistent, and often less informative than social media commentary from affected customers. DORA's structured reporting requirements, including intermediate and final reports, would impose a discipline that was visibly absent.
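The four-hour figure is simple to turn into a tracked deadline. This is a minimal sketch, assuming the clock starts when the incident is classified as major; the reference point and the timestamp used are assumptions for illustration, and the function names are hypothetical rather than part of any reporting tool.

```python
from datetime import datetime, timedelta, timezone

# Four hours is the figure cited in the text; which event starts the clock
# is an assumption here (classification as a major incident).
INITIAL_NOTIFICATION_WINDOW = timedelta(hours=4)


def initial_notification_deadline(reference_time):
    """Latest time for the initial notification, measured from `reference_time`."""
    return reference_time + INITIAL_NOTIFICATION_WINDOW


def is_overdue(reference_time, now):
    """True once `now` has passed the initial notification deadline."""
    return now > initial_notification_deadline(reference_time)


# Hypothetical classification timestamp, in UTC to avoid local-time ambiguity.
classified_at = datetime(2025, 1, 31, 9, 30, tzinfo=timezone.utc)
deadline = initial_notification_deadline(classified_at)
print("Initial notification due by", deadline.isoformat())
```

Using timezone-aware timestamps matters in practice: an incident that spans a clock change or several jurisdictions should not produce an ambiguous deadline.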
The Cost of Inaction: Quantifying Resilience Failure
The financial cost of UK banking IT failures extends beyond direct customer compensation:
| Cost Category | Barclays (Jan-Feb 2025) | Industry Estimate (Annual) |
|---|---|---|
| Customer compensation | GBP 5-7.5M per incident | GBP 50-100M across sector |
| Regulatory investigation costs | GBP 2-5M (estimated) | GBP 20-40M across sector |
| Reputational damage (customer attrition) | Not publicly quantified | 1-3% incremental churn per major outage |
| Parliamentary/political scrutiny costs | Management time, legal preparation | Recurring for repeat offenders |
| Technology remediation (post-incident) | GBP 20-50M per major programme | GBP 200-500M across sector |
| Operational disruption to counterparties | Not compensated, but real | Multiplied across the financial ecosystem |
The cumulative cost — direct compensation, regulatory remediation, technology investment, reputational damage, and management distraction — dwarfs the cost of the resilience measures that would have prevented or contained these incidents. The UK banking sector is spending more on cleaning up after resilience failures than it would have spent building resilient systems.
What DORA Would Have Required (and What It Would Have Changed)
DORA does not guarantee that outages will not occur. No regulation can. What DORA requires is a governance framework that makes cascading, multi-day, customer-impacting failures less likely and, when they occur, less severe and better managed.
Specifically:
Figure 3: DORA's three-phase resilience lifecycle — before, during, and after an incident — with continuous improvement feedback loop.
Before the incident:
- Dependency mapping (Art. 8) that identifies mainframe systems as single points of failure supporting critical functions
- Recovery time objectives validated through testing (Art. 11(6)(a), Art. 25): demonstrated recovery capability, not aspirational RTOs in a document
- Backup restoration tested at production scale (Art. 12(2)) — not partial restoration of a test environment
- Change management policies (Art. 9(4)(e)) that prevent change deployments during peak processing windows without enhanced controls
- Capacity planning (Art. 7(c)) that accounts for peak load scenarios and ensures infrastructure headroom
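The change management control in the list above can be expressed as a simple calendar gate. The sketch below assumes a hypothetical policy in which the last days of a month and the first day of the next count as a peak processing window; the window lengths and function names are illustrative assumptions, and a real freeze calendar would be derived from the institution's own processing schedule.

```python
import calendar
from datetime import date

# Hypothetical policy parameters, chosen only for illustration.
FREEZE_DAYS_BEFORE_MONTH_END = 2
FREEZE_DAYS_AFTER_MONTH_START = 1


def in_peak_window(day):
    """True if `day` is within the assumed month-end or month-start peak window."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    near_month_end = last_day - day.day <= FREEZE_DAYS_BEFORE_MONTH_END
    near_month_start = day.day <= FREEZE_DAYS_AFTER_MONTH_START
    return near_month_end or near_month_start


def change_allowed(day, enhanced_controls_approved=False):
    """Block changes in the peak window unless enhanced controls were approved."""
    return enhanced_controls_approved or not in_peak_window(day)


for day in (date(2025, 1, 15), date(2025, 1, 31), date(2025, 2, 28)):
    print(day.isoformat(), "allowed" if change_allowed(day) else "blocked")
```

The `enhanced_controls_approved` flag keeps the gate from becoming an absolute ban: urgent changes remain possible, but only through an explicit, auditable exception.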
During the incident:
- Structured incident classification (Art. 18) within minutes of detection, not hours
- Initial notification to competent authority within four hours (Art. 19(4)(a))
- Standardized communication to affected counterparties and customers (Art. 14(2), Art. 23(2))
- Management body briefing with impact quantification (Art. 14(2))
After the incident:
- Root cause analysis documented and reported (Art. 13)
- Lessons learned integrated into the ICT risk management framework (Art. 6(5))
- Testing programme updated to cover the failure scenario (Art. 25)
- Third-party dependencies reviewed if the failure involved external providers (Art. 28-29)
Would this framework have prevented the Barclays outage? The honest answer is: probably not entirely. The mainframe failure may have occurred regardless. But DORA's recovery testing requirements (Art. 11(6)(a)) would have required Barclays to demonstrate, through actual testing, that it could recover the mainframe within its stated RTO. If that test revealed a 72-hour recovery gap, the institution would have been required to either fix the recovery capability or adjust its risk assessment and business continuity planning accordingly.
With a tested recovery plan, validated backup restoration, and a compressed detection-to-escalation timeline, the 72-hour outage could plausibly have been contained to something closer to 12 hours.
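The comparison between a stated RTO and a demonstrated recovery time is the core of this argument, and it reduces to a small check. The sketch below is illustrative: the `RecoveryTest` record and its field names are hypothetical, and the 12-hour and 72-hour values reuse the figures from the discussion above as example inputs, not as Barclays' actual RTO or test results.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTest:
    """Result of one recovery exercise (hypothetical record layout)."""

    system: str
    stated_rto_hours: float
    demonstrated_recovery_hours: float

    @property
    def gap_hours(self):
        """Hours by which the demonstrated recovery exceeded the stated RTO."""
        return max(0.0, self.demonstrated_recovery_hours - self.stated_rto_hours)


def failing_tests(tests):
    """Return the tests whose demonstrated recovery time missed the stated RTO."""
    return [test for test in tests if test.gap_hours > 0]


# Example inputs only; not real test results for any institution.
results = [
    RecoveryTest("core_mainframe", stated_rto_hours=12, demonstrated_recovery_hours=72),
    RecoveryTest("mobile_app", stated_rto_hours=4, demonstrated_recovery_hours=3),
]

for test in failing_tests(results):
    print(f"{test.system}: exceeds RTO by {test.gap_hours:.0f} hours")
```

A non-empty result is the trigger described above: either the recovery capability is fixed, or the RTO and the planning built on it are revised.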
Lessons for EU Institutions
The UK banking sector's 158 failures in just over two years provide five clear lessons for EU institutions implementing DORA:
1. Test recovery, not just backup. Having backups is not the same as being able to restore from them at production scale within your stated RTO. Art. 11(6)(a) and Art. 12(2) distinguish between having backup policies and validating recovery capability. The institutions that recovered quickly from failures had tested their recovery procedures under realistic conditions.
2. Map dependencies to blast radius. The payday cascade revealed shared dependencies that no single institution had mapped. Art. 8's identification requirements and Art. 29's concentration risk assessment exist because correlated failures are more damaging than isolated ones — and less visible before they occur.
3. Invest in mainframe resilience. For institutions running legacy core banking systems, mainframe failures represent the highest-impact, lowest-frequency risk. DORA's proportionality principle (Art. 4) does not exempt legacy systems from resilience requirements — if anything, it implies that systems supporting critical functions deserve the highest resilience investment.
4. Automate incident response. The 4-hour initial notification window (Art. 19(4)(a)) is incompatible with manual detection and classification processes. Institutions that rely on customer complaints to detect outages — as several UK banks appeared to during the February events — will fail to meet DORA's reporting timelines.
5. Treat compensation as a lagging indicator. Customer compensation is the cost of failure, not the measure of resilience. DORA's requirements — testing, evidence, governance — are leading indicators. Institutions that invest in leading indicators spend less on lagging ones.
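On lesson 4, detection does not need to wait for customer complaints. A minimal sketch of automated detection follows, assuming a stream of login or payment outcomes and a rolling failure-rate threshold. The class name, window size, minimum sample count, and 5% threshold are all assumptions for illustration (the threshold echoes the login failure rate in the Barclays table), not values prescribed by DORA.

```python
from collections import deque


class FailureRateMonitor:
    """Rolling failure-rate check over the most recent outcomes (illustrative)."""

    def __init__(self, window_size=200, threshold=0.05, min_samples=50):
        self.outcomes = deque(maxlen=window_size)  # True = success, False = failure
        self.threshold = threshold
        self.min_samples = min_samples

    def failure_rate(self):
        """Share of failures in the current window, or 0.0 if the window is empty."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def alerting(self):
        """True when enough samples exist and the failure rate meets the threshold."""
        return (
            len(self.outcomes) >= self.min_samples
            and self.failure_rate() >= self.threshold
        )

    def record(self, success):
        """Record one outcome and return the current alert state."""
        self.outcomes.append(success)
        return self.alerting()


monitor = FailureRateMonitor()
alerts = [monitor.record(True) for _ in range(100)]    # healthy traffic
alerts += [monitor.record(False) for _ in range(10)]   # failures start arriving
print("alert raised:", alerts[-1])
```

The `min_samples` guard avoids raising an alert on a handful of early failures; an alert from a monitor like this is what would start the classification and notification clock.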
The UK's 803 hours of banking downtime are an expensive dataset. For EU institutions, the cost has already been paid — by someone else. The remaining question is whether those institutions will learn from it.
This analysis uses publicly disclosed incident data from UK banking institutions and regulatory communications between January 2023 and February 2025. DORA article references are to Regulation (EU) 2022/2554.