Beyond Tabletop Exercises: DORA's Demand for Real Disaster Recovery Testing

The Tabletop Illusion
Every year, financial institutions gather their business continuity teams in a conference room, present a disaster scenario ("the primary data center is unavailable due to a fire"), walk through the recovery procedure on paper, document the discussion, and file the exercise as evidence of disaster recovery testing. Everyone agrees the plan would work. Nobody has actually tested whether it does.
This is the tabletop exercise — and for two decades, it has been the industry standard for disaster recovery validation. It is useful for identifying gaps in procedures, training participants on their roles, and building organizational awareness. What it does not do — and cannot do — is prove that systems can actually recover within documented Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).
DORA Art. 24 changes this. It requires financial entities to establish a "sound and comprehensive digital operational resilience testing programme", and Art. 25(1) specifies that the programme provide for "the execution of appropriate tests, such as vulnerability assessments and scans, open source analyses, network security assessments, gap analyses, physical security reviews, questionnaires and scanning software solutions, source code reviews where feasible, scenario-based tests, compatibility testing, performance testing, end-to-end testing and penetration testing."
The operative phrase is "scenario-based tests." A tabletop exercise is a scenario discussion. A scenario-based test is a scenario execution. DORA demands the latter.
Why Tabletop Exercises Fail DORA
The Evidence Gap
Art. 24(1) states that the testing programme exists to "identify weaknesses, deficiencies and gaps in digital operational resilience" and to "promptly implement corrective measures." A tabletop exercise that concludes with "we believe the plan would work" identifies no empirical weaknesses. It produces assumptions, not evidence.
The ECB's 2024 cyber resilience stress test across 109 banks provided the empirical data that tabletops cannot: when banks actually attempted to recover critical systems under simulated stress conditions, 31% could not meet their declared RTOs. The gap between declared and actual recovery capability was measured in hours — sometimes days.
| Metric | Declared (Tabletop-Validated) | Actual (ECB Stress Test) | Gap |
|---|---|---|---|
| Core banking RTO | 4 hours | 6-18 hours | 2-14 hours |
| Payment processing RTO | 2 hours | 3-8 hours | 1-6 hours |
| Online banking RTO | 1 hour | 2-12 hours | 1-11 hours |
| Data recovery RPO | 1 hour | 4-24 hours | 3-23 hours |
Source: Aggregated from ECB stress test findings and EBA supervisory reports, 2024
The Dependency Blind Spot
Tabletop exercises typically focus on a single system or process. Real incidents cascade. A data center failure does not just affect the servers in that data center — it affects DNS resolution, certificate authorities, monitoring systems, log aggregation, authentication services, and third-party integrations that depend on the affected network. These cascading effects are invisible in tabletop exercises and devastating in real incidents.
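Making these cascades visible before a test is partly a graph problem. A minimal sketch, assuming the institution keeps a dependency map in which each component lists the components that consume it (every component name below is hypothetical), computes the blast radius of a data-center loss with a breadth-first search:

```python
from collections import deque

# Hypothetical dependency map for illustration: each component lists the
# components that consume it, so edges point from a provider to its dependents.
DEPENDENTS: dict[str, list[str]] = {
    "dc-primary": ["dns", "auth", "core-banking", "log-aggregation"],
    "dns": ["payments", "online-banking", "monitoring"],
    "auth": ["online-banking", "payments"],
    "core-banking": ["payments", "online-banking"],
    "log-aggregation": ["monitoring"],
    "payments": ["card-processor-link"],  # third-party integration
}


def blast_radius(failed: str, dependents: dict[str, list[str]]) -> dict[str, int]:
    """Return every component transitively affected by `failed`, mapped to
    its shortest hop distance from the original failure (breadth-first search)."""
    distance = {failed: 0}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for consumer in dependents.get(node, []):  # leaf components have no entry
            if consumer not in distance:  # first visit in BFS = fewest hops
                distance[consumer] = distance[node] + 1
                queue.append(consumer)
    del distance[failed]  # report downstream impact only
    return distance


if __name__ == "__main__":
    impact = blast_radius("dc-primary", DEPENDENTS)
    for component, hops in sorted(impact.items(), key=lambda kv: (kv[1], kv[0])):
        print(f"{hops} hop(s) away: {component}")
```

Even in this toy map, the card-processor link sits three hops from the failed data center, which is exactly the kind of indirect dependency a paper walkthrough tends to skip.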
The Human Factor
In a tabletop exercise, every participant knows it is an exercise. Response times are idealized. Decision-making is unclouded by the stress of a real incident. Communication paths work because everyone is in the same room. In a real incident at 3 AM on a Sunday, the on-call engineer is sleep-deprived, the escalation path requires calling people who are unavailable, and the runbook refers to a system that was decommissioned six months ago.
DORA's Testing Maturity Model
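DORA does not explicitly define testing maturity levels, but the regulation's requirements imply a five-level progression from basic to advanced. The first two levels, documentation review (Level 1) and tabletop exercises (Level 2), validate plans only on paper; the levels that produce DORA-grade evidence begin at Level 3: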
Level 3: Simulation Testing (DORA Minimum)
Simulation testing uses real systems in controlled environments. The institution provisions a replica of the production environment (or uses a scaled-down representative subset), introduces the failure scenario, and measures actual recovery performance.
What it proves: That the recovery procedure works against real software, that data can be restored from actual backups, that the team can execute the procedure within the RTO, and that the recovered system produces correct outputs.
What it does not prove: That the procedure works under production load, that production-specific configurations (certificates, DNS, third-party connections) survive failover, or that the human response under real incident stress matches the controlled test.
Evidence produced:
- Recovery time measurement (start of scenario → system operational)
- Recovery point measurement (timestamp of last recoverable data)
- Step-by-step execution log with timestamps
- System validation results (functional test suite against recovered system)
- Issues discovered and corrective actions logged
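The evidence list above can be captured in one record per test. A minimal sketch, assuming timestamps are captured by whoever runs the test (the class and field names are hypothetical); it derives RTA and RPA from logged times rather than from participants' recollection:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DrTestLog:
    """Hypothetical evidence record for one simulation test; field names are illustrative."""
    scenario: str
    failure_injected_at: datetime  # start of scenario
    steps: list[tuple[datetime, str]] = field(default_factory=list)
    last_recoverable_write_at: Optional[datetime] = None
    operational_at: Optional[datetime] = None

    def step(self, at: datetime, description: str) -> None:
        """Append one timestamped entry to the execution log."""
        self.steps.append((at, description))

    def rta(self) -> timedelta:
        """Recovery Time Actual: scenario start until the system is operational."""
        if self.operational_at is None:
            raise ValueError("system was never declared operational")
        return self.operational_at - self.failure_injected_at

    def rpa(self) -> timedelta:
        """Recovery Point Actual: data written after this point was lost."""
        if self.last_recoverable_write_at is None:
            raise ValueError("last recoverable write was not recorded")
        return self.failure_injected_at - self.last_recoverable_write_at


if __name__ == "__main__":
    t0 = datetime(2025, 3, 1, 9, 0)
    log = DrTestLog("primary database host lost", failure_injected_at=t0)
    log.last_recoverable_write_at = t0 - timedelta(minutes=12)
    log.step(t0 + timedelta(minutes=5), "on-call engineer acknowledges page")
    log.step(t0 + timedelta(minutes=40), "restore from latest backup started")
    log.operational_at = t0 + timedelta(hours=3, minutes=10)
    print(log.rta(), log.rpa())  # 3:10:00 0:12:00
```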
Level 4: Live Recovery Testing
Live recovery testing performs actual failover of production systems to disaster recovery infrastructure. This is the gold standard for critical and important functions under DORA. The institution redirects production traffic to the DR site, operates from DR for a defined period, and then fails back.
What it proves: That the DR environment can support production workloads, that production data is recoverable with acceptable RPO, that third-party integrations survive failover, and that the institution can operate from DR under real-world conditions.
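The additional proof a live test offers comes from the period spent operating from DR. A minimal sketch of turning that period into findings, assuming monitoring already aggregates request volume, error rate and per-integration success counts for a comparable baseline window; the thresholds and integration names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    """Aggregates for one operating window, assumed to come from existing monitoring."""
    requests_per_min: float
    error_rate: float                      # fraction of failed requests, 0.0 to 1.0
    integration_successes: dict[str, int]  # third-party link -> successful calls


def evaluate_dr_window(baseline: WindowMetrics, dr: WindowMetrics,
                       min_volume_ratio: float = 0.95,
                       max_error_rate_increase: float = 0.005) -> list[str]:
    """Compare the period run from DR against a comparable baseline period.
    Returns findings; an empty list means the window passed.
    The default thresholds are illustrative, not regulatory values."""
    findings = []
    ratio = dr.requests_per_min / baseline.requests_per_min
    if ratio < min_volume_ratio:
        findings.append(f"DR served only {ratio:.0%} of baseline request volume")
    if dr.error_rate - baseline.error_rate > max_error_rate_increase:
        findings.append(f"error rate rose from {baseline.error_rate:.2%} to {dr.error_rate:.2%}")
    for link in baseline.integration_successes:
        # An integration that never succeeded from DR did not survive failover
        if dr.integration_successes.get(link, 0) == 0:
            findings.append(f"no successful calls to {link} while running from DR")
    return findings


if __name__ == "__main__":
    baseline = WindowMetrics(1000.0, 0.001, {"payments-network": 50, "credit-bureau": 80})
    dr = WindowMetrics(700.0, 0.02, {"payments-network": 0, "credit-bureau": 77})
    for finding in evaluate_dr_window(baseline, dr):
        print(finding)
```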
Level 5: Chaos Engineering
Chaos engineering, the discipline of deliberately introducing failures into production systems to validate resilience, represents the most advanced form of DORA testing. It addresses the fundamental limitation of scheduled DR tests: they are predictable. The team knows the scenario, the timing, and the expected response. Real incidents are unpredictable.
Chaos engineering validates that systems are resilient to failures that nobody anticipated, that automated recovery mechanisms work without human intervention, and that the system degrades gracefully under partial failure conditions.
The testing levels compare as follows:
| Test Level | Realism | Risk | Evidence Quality | DORA Suitability |
|---|---|---|---|---|
| Tabletop | Low | None | Weak: opinions, not measurements | Insufficient for Art. 24-25 |
| Simulation | Medium | Low | Moderate: real systems, artificial conditions | Minimum for non-critical functions |
| Live recovery | High | Medium | Strong: production conditions, measured outcomes | Required for critical functions |
| Chaos engineering | Very high | Medium-High | Strongest: continuous, production-validated | Advanced Art. 25 compliance |
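At its core, each chaos experiment is a loop: confirm a steady-state hypothesis, inject a fault the team did not choose, check the hypothesis again, and roll the fault back. The sketch below runs that loop against a toy in-memory service; `ReplicatedService`, the round-robin routing and the 99% success threshold are assumptions for illustration, and real experiments would target production infrastructure through a dedicated fault-injection tool:

```python
import random


class ReplicatedService:
    """Toy in-memory stand-in for a service with several replicas (hypothetical)."""

    def __init__(self, replicas: int, retry: bool) -> None:
        self.replicas = replicas
        self.healthy = set(range(replicas))
        self.retry = retry  # does the client fail over to another replica?

    def handle(self, request_id: int) -> bool:
        target = request_id % self.replicas  # naive round-robin routing
        if target in self.healthy:
            return True
        # Automated recovery path: retry on any remaining healthy replica
        return self.retry and bool(self.healthy)


def steady_state_ok(svc: ReplicatedService, probes: int = 100,
                    min_success: float = 0.99) -> bool:
    """Steady-state hypothesis: at least `min_success` of probe requests succeed."""
    successes = sum(svc.handle(i) for i in range(probes))
    return successes / probes >= min_success


def run_experiment(svc: ReplicatedService, rng: random.Random) -> bool:
    """One chaos experiment: check steady state, kill a randomly chosen replica,
    re-check the hypothesis, then roll the fault back whatever the outcome."""
    if not steady_state_ok(svc):
        raise RuntimeError("not in steady state; aborting before injecting a fault")
    victim = rng.choice(sorted(svc.healthy))  # target picked at random, not by the team
    svc.healthy.discard(victim)
    try:
        return steady_state_ok(svc)
    finally:
        svc.healthy.add(victim)  # always undo the injected fault


if __name__ == "__main__":
    rng = random.Random(42)
    print(run_experiment(ReplicatedService(3, retry=True), rng))   # True
    print(run_experiment(ReplicatedService(3, retry=False), rng))  # False
```

With `retry=False` the experiment fails, which is the point: it surfaces a missing automated recovery path that a scheduled, scripted DR test might never exercise.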
Building a DORA-Compliant DR Testing Programme
Step 1: Classify Functions by Criticality
Art. 11(3) requires that ICT business continuity policies account for the criticality of the function. The business impact analysis (BIA) determines which functions are critical and what their RTOs and RPOs are. The testing level must match the criticality:
- Critical functions (Art. 3(22)): Live recovery testing annually, simulation testing quarterly
- Important functions: Simulation testing semi-annually, tabletop exercises quarterly
- Standard functions: Tabletop exercises annually, documentation review as part of the annual framework review
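The tiered cadence above is simple enough to encode and check automatically. A minimal sketch, assuming each function is tagged with one of the three tiers and the date of its last test of each type is recorded somewhere queryable; the day counts are an approximate translation of "annually", "semi-annually" and "quarterly", and the names are hypothetical:

```python
from datetime import date, timedelta

# Cadence per tier from the list above, as maximum days between tests
# (approximations: annually = 365, semi-annually = 182, quarterly = 91).
CADENCE: dict[str, dict[str, int]] = {
    "critical": {"live_recovery": 365, "simulation": 91},
    "important": {"simulation": 182, "tabletop": 91},
    "standard": {"tabletop": 365},
}


def overdue_tests(tier: str, last_run: dict[str, date], today: date) -> list[str]:
    """Return the test types that are overdue (or have never been run) for one function."""
    overdue = []
    for test_type, interval_days in CADENCE[tier].items():
        last = last_run.get(test_type)
        if last is None or today - last > timedelta(days=interval_days):
            overdue.append(test_type)
    return overdue


if __name__ == "__main__":
    last = {"live_recovery": date(2025, 1, 15), "simulation": date(2025, 2, 1)}
    print(overdue_tests("critical", last, today=date(2025, 6, 30)))  # ['simulation']
```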
Step 2: Design Scenarios That Matter
The testing programme must cover scenarios aligned with the institution's actual risk profile, not just convenient scenarios. The EBA has signaled that supervisors will examine whether testing scenarios are realistic and cover the institution's material risks.
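One way to make that examination answerable is a coverage check between the risk register and the scenario catalogue. The sketch below assumes each scenario is tagged with the risk identifiers it exercises; the register entries and scenario names are invented for illustration:

```python
# Hypothetical inputs: a risk register (ID -> description, material?) and a
# scenario catalogue in which each scenario lists the risk IDs it exercises.
RISKS: dict[str, tuple[str, bool]] = {
    "R1": ("loss of primary data center", True),
    "R2": ("ransomware encrypting production and backups", True),
    "R3": ("outage of a critical ICT third-party provider", True),
    "R4": ("temporary loss of office access", False),
}
SCENARIOS: dict[str, set[str]] = {
    "data center fire": {"R1"},
    "cloud region outage": {"R3"},
}


def uncovered_material_risks(risks: dict[str, tuple[str, bool]],
                             scenarios: dict[str, set[str]]) -> list[str]:
    """Return IDs of material risks that no test scenario exercises."""
    covered = set().union(*scenarios.values())
    return sorted(rid for rid, (_, material) in risks.items()
                  if material and rid not in covered)


if __name__ == "__main__":
    for rid in uncovered_material_risks(RISKS, SCENARIOS):
        print(f"{rid} has no test scenario: {RISKS[rid][0]}")
```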
Step 3: Measure and Document
Every DR test must produce quantitative evidence:
| Measurement | Definition | DORA Article | Acceptance Criteria |
|---|---|---|---|
| Recovery Time Actual (RTA) | Time from scenario initiation to system operational | Art. 11 (RTO) | RTA <= declared RTO |
| Recovery Point Actual (RPA) | Data loss, measured as the time between the last recoverable write and the failure | Art. 11 (RPO) | RPA <= declared RPO |
| Functional Validation Score | % of functional tests passed on recovered system | Art. 24-25 (weakness identification) | 100% for critical functions |
| Participant Response Time | Time from notification to response | Art. 17 (incident management) | Within SLA for role |
| Issues Discovered | Count and severity of issues found | Art. 24 (remediation of issues) | All tracked to resolution |
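The acceptance criteria in the table above can be evaluated mechanically once the measurements exist. A minimal sketch with hypothetical class and field names; the non-critical functional threshold is an assumption, since the table only fixes 100% for critical functions:

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Issue:
    severity: str
    tracking_ref: Optional[str] = None  # e.g. a ticket ID; None means untracked


@dataclass
class DrTestResult:
    rta: timedelta
    rpa: timedelta
    functional_passed: int
    functional_total: int
    response_times: dict[str, timedelta]  # role -> notification-to-response time
    issues: list[Issue]


def check_acceptance(result: DrTestResult, rto: timedelta, rpo: timedelta,
                     role_slas: dict[str, timedelta], critical: bool,
                     noncritical_min_score: float = 0.95) -> dict[str, bool]:
    """Evaluate one test against the acceptance criteria; `noncritical_min_score`
    is an assumed threshold for functions that are not critical."""
    score = (result.functional_passed / result.functional_total
             if result.functional_total else 0.0)
    required = 1.0 if critical else noncritical_min_score
    return {
        "RTA <= RTO": result.rta <= rto,
        "RPA <= RPO": result.rpa <= rpo,
        "functional validation": score >= required,
        # A role with no defined SLA counts as a failure rather than a pass
        "response within SLA": all(role in role_slas and t <= role_slas[role]
                                   for role, t in result.response_times.items()),
        "issues tracked": all(i.tracking_ref for i in result.issues),
    }


if __name__ == "__main__":
    result = DrTestResult(
        rta=timedelta(hours=6), rpa=timedelta(minutes=20),
        functional_passed=119, functional_total=120,
        response_times={"on-call engineer": timedelta(minutes=10)},
        issues=[Issue("high", "DR-101"), Issue("low")],
    )
    checks = check_acceptance(result, rto=timedelta(hours=4), rpo=timedelta(hours=1),
                              role_slas={"on-call engineer": timedelta(minutes=15)},
                              critical=True)
    for name, passed in checks.items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
```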
Step 4: Close the Loop
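Art. 24 requires procedures to prioritise, classify and remedy all issues revealed by testing, together with internal validation that all identified weaknesses, deficiencies or gaps are fully addressed; and because Art. 5 gives the management body ultimate responsibility for ICT risk, the results must reach it. This is the most frequently neglected step: institutions run DR tests, produce results, and file them. They do not systematically track the issues discovered, assign corrective actions, verify those actions are completed, and retest to confirm the fix.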
The DR testing programme must integrate with the institution's deviation management process. Issues found during DR testing are deviations from expected resilience. They require the same corrective and preventive action (CAPA) lifecycle as any other finding: triage, root cause analysis, corrective action, verification, and closure.
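The lifecycle named above maps naturally onto a small state machine in which a finding cannot close without a passed retest. This is a sketch under the assumption that findings live in some tracking system; the class and stage names are hypothetical:

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    TRIAGE = 1
    ROOT_CAUSE_ANALYSIS = 2
    CORRECTIVE_ACTION = 3
    VERIFICATION = 4
    CLOSED = 5


class DrFinding:
    """Hypothetical CAPA record for an issue discovered during a DR test."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.TRIAGE
        self.history: list[Stage] = [Stage.TRIAGE]

    def advance(self, retest_passed: Optional[bool] = None) -> Stage:
        """Move to the next stage; verification needs an explicit retest result."""
        if self.stage is Stage.CLOSED:
            raise ValueError("finding already closed")
        if self.stage is Stage.VERIFICATION:
            if retest_passed is None:
                raise ValueError("verification requires a retest result")
            # A failed retest sends the finding back for another corrective action
            self.stage = Stage.CLOSED if retest_passed else Stage.CORRECTIVE_ACTION
        else:
            self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    finding = DrFinding("runbook references a decommissioned system")
    for _ in range(3):
        finding.advance()                 # triage -> RCA -> action -> verification
    finding.advance(retest_passed=False)  # fix did not hold: back to corrective action
    finding.advance()                     # new fix applied, verify again
    finding.advance(retest_passed=True)   # retest passed: closed
    print([stage.name for stage in finding.history])
```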
TLPT: The Apex of DORA Testing
For institutions designated by competent authorities, Art. 26 requires Threat-Led Penetration Testing (TLPT). TLPT goes beyond DR testing to simulate real attacker behavior against production systems, testing not just recovery but prevention and detection as well.
TLPT is not a replacement for DR testing — it is a complement. DR testing validates recovery capability. TLPT validates the entire resilience chain: can the institution detect an attack, contain it, recover from it, and report it within regulatory timelines?
The Cultural Shift
Moving beyond tabletop exercises requires a cultural shift. Engineering teams must accept that production systems will be deliberately disrupted for testing. Business stakeholders must accept that testing may cause brief service degradation. The management body must fund DR testing that produces real evidence, not just comfortable assurances.
The institutions that made this shift before DORA — those that embraced chaos engineering, live recovery testing, and continuous resilience validation — are now compliance-ready. The institutions that relied on tabletop exercises are discovering a gap between their declared RTOs and their actual recovery capability — a gap that supervisors can now measure and enforce.
Key Takeaways
- Tabletop exercises are necessary but insufficient for DORA Art. 24-25 compliance. They validate procedures, not actual recovery capability.
- The ECB stress test proved the gap: 31% of banks could not recover within declared RTOs. Tabletop-validated RTOs are unreliable.
- Simulation testing is the minimum for Art. 24 compliance. Live recovery testing is required for critical functions.
- Every DR test must produce quantitative evidence: RTA, RPA, functional validation scores, and issue logs.
- Issues from DR testing require CAPA treatment: tracked to resolution and retested, per Art. 24.
- Chaos engineering represents advanced Art. 25 compliance — continuous, automated, production-validated resilience testing.
Summary
Tabletop exercises are insufficient for DORA compliance. The testing programme required by Articles 24 and 25 calls for scenario-based tests involving real systems, not theoretical discussions. The ECB's 2024 stress test demonstrated the problem empirically: 31% of banks could not meet their declared RTOs when tested under real conditions. The gap between declared (tabletop-validated) recovery capability and actual capability was several hours, sometimes days. This guide proposes a five-level maturity model: documentation review, tabletop exercise, simulation testing, live recovery testing, and chaos engineering. Critical functions (Art. 3(22)) require, at a minimum, annual live recovery tests. Every test must produce quantitative evidence: measured recovery time (RTA vs RTO), measured recovery point (RPA vs RPO), functional validation results, and an issue log. Problems discovered must be treated as deviations with a full CAPA cycle (triage, root cause analysis, corrective action, verification, and closure), as required by Article 24.