Real-Time Payments Need Real-Time Resilience: DORA and the Instant Settlement Challenge

Ten Seconds, No Exceptions
The EU Instant Payments Regulation (IPR), Regulation (EU) 2024/886, adopted in March 2024, requires payment service providers (PSPs) that offer standard euro credit transfers to also offer SEPA Instant Credit Transfers. The timeline is aggressive: euro-area PSPs had to be able to receive instant payments by 9 January 2025 and to send them by 9 October 2025, with non-euro-area PSPs following in 2027. Funds must reach the payee's account within 10 seconds, 24 hours a day, 365 days a year.
This is a fundamentally different operational paradigm from traditional batch processing. SEPA Credit Transfers settle in one business day. SEPA Direct Debits settle in two. Batch processing runs overnight, with maintenance windows, reconciliation periods, and scheduled downtime. The entire infrastructure of European retail payments was built around the assumption that there are quiet periods — times when systems can be patched, databases can be reindexed, and backups can run without impacting live transactions.
Instant payments eliminate quiet periods. Every second of every day is peak time. And DORA requires that the systems supporting this 24/7/365 service be reliable (Art. 7), recoverable (Art. 11), tested (Art. 24), and documented (Art. 8).
The Resilience Mathematics of Instant Payments
Availability Requirements
Traditional payment systems operate during business hours, roughly 12 hours per day, 5 days per week, or about 3,120 service hours per year. Even at a 99.9% availability target, this translates to approximately 3 hours 7 minutes of acceptable downtime per year, and maintenance can be confined to predictable off-hours.
Instant payments operate 24/7/365. The same 99.9% availability allows 8 hours and 46 minutes of downtime per year — but now spread unpredictably across all hours, including nights, weekends, and holidays. The practical impact:
| Availability Target | Annual Downtime (24/7/365) | Acceptable for Instant Payments? |
|---|---|---|
| 99.0% | 87 hours 36 minutes | No — nearly 4 days of outage |
| 99.9% | 8 hours 46 minutes | Marginal — still significant for 10-second SLA |
| 99.95% | 4 hours 23 minutes | Minimum for competitive service |
| 99.99% | 52 minutes 34 seconds | Target for critical payment rails |
| 99.999% | 5 minutes 15 seconds | Aspirational — requires active-active multi-region |
For a service with a 10-second settlement SLA, even 99.99% availability allows nearly an hour per year in which instant payments fail. Art. 7's requirement that systems have "sufficient capacity" and be "technologically resilient" implicitly demands the upper end of this spectrum.
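
The downtime figures in the table follow directly from the availability percentage and the number of service hours in a year. A minimal sketch of that arithmetic, assuming a 365-day year (the function and constant names are illustrative, not from any standard):

```python
from datetime import timedelta

HOURS_PER_YEAR_24X7 = 24 * 365  # 8,760 hours, ignoring leap years


def annual_downtime(availability_pct: float,
                    service_hours: float = HOURS_PER_YEAR_24X7) -> timedelta:
    """Return the downtime allowed per year at a given availability target."""
    unavailable_fraction = 1 - availability_pct / 100
    return timedelta(hours=service_hours * unavailable_fraction)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        print(f"{target}% -> {annual_downtime(target)}")
```

Passing a smaller `service_hours` value models a service that only runs during a limited daily window.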
Recovery Objectives
DORA Art. 11 requires documented RTOs and RPOs. For instant payment systems:
- RTO must be near-zero for the core settlement engine. A 4-hour RTO — acceptable for batch payment systems — means 4 hours where customers cannot send or receive instant payments. For a service advertised as "always on," this is a service failure.
- RPO must be zero. A payment that was accepted, debited from the sender's account, but lost due to a system failure before crediting the receiver creates a financial discrepancy that must be resolved. Zero RPO requires synchronous replication across all components of the settlement chain.
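
The zero-RPO requirement can be illustrated with a toy synchronous write: a payment is acknowledged only once every replica has recorded it. This is a simplified in-memory sketch under stated assumptions. `Replica`, `accept_payment`, and the zone names are hypothetical, and a production system would more likely rely on a database's or consensus protocol's own synchronous replication.

```python
from dataclasses import dataclass, field


class ReplicaUnavailable(Exception):
    """Raised when a replica cannot record a write."""


@dataclass
class Replica:
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        if not self.healthy:
            raise ReplicaUnavailable(self.name)
        self.log.append(record)


def accept_payment(record: dict, replicas: list) -> bool:
    """Acknowledge only if every replica stored the record (synchronous write)."""
    written = []
    try:
        for replica in replicas:
            replica.append(record)
            written.append(replica)
    except ReplicaUnavailable:
        # Undo partial writes so no replica holds an unacknowledged payment.
        for replica in written:
            replica.log.remove(record)
        return False
    return True
```

Rejecting the payment when any replica is down preserves zero data loss at the cost of availability, which is one reason the replication topology and the RTO target have to be designed together.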
DORA Requirements Under Real-Time Pressure
Art. 7: Reliability at 24/7/365
Art. 7 requires reliable, capable, and technologically resilient systems. For instant payments:
Every component in the settlement chain must meet sub-second performance targets under all conditions — not just under normal load, but during peak periods (month-end salary payments, Black Friday, holiday seasons), during partial failure conditions (one availability zone down, one dependency degraded), and during concurrent operations (sanctions list updates, system patches, database maintenance).
This requires:
- Active-active architecture across multiple availability zones or regions (no primary/secondary failover — both sides process live traffic)
- Circuit breakers on every external dependency (sanctions screening services, fraud detection engines, correspondent bank interfaces)
- Graceful degradation strategies (if fraud detection is temporarily unavailable, is the payment queued, rejected, or processed with compensating controls?)
- Zero-downtime deployment (rolling updates, canary deployments, feature flags — no maintenance windows)
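
The circuit-breaker and graceful-degradation items above can be sketched together. The following is a minimal, single-threaded illustration, not a production implementation: the thresholds, the `score_payment` helper, and the "hold the payment" fallback are all assumptions chosen for the example, and the right fallback policy is a business and compliance decision.

```python
import time


class CircuitOpen(Exception):
    """Raised when a call is short-circuited instead of reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpen("dependency temporarily bypassed")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def score_payment(breaker, fraud_check, payment):
    """Return 'approved', 'rejected', or 'held' (hypothetical fallback policy)."""
    try:
        return "approved" if breaker.call(fraud_check, payment) else "rejected"
    except Exception:
        # Fraud engine failed or circuit is open: hold rather than settle unchecked.
        return "held"
```

Once the breaker is open, failing calls return immediately instead of consuming the settlement budget while waiting on a timeout.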
Art. 11: Business Continuity Without Downtime
Traditional BCP assumes a period of degraded or unavailable service during recovery. Instant payments challenge this assumption:
| BCP Element | Traditional Payments | Instant Payments |
|---|---|---|
| Maintenance windows | Scheduled (nights, weekends) | Not available — 24/7/365 |
| Failover time | Minutes to hours (acceptable) | Seconds (maximum) |
| Batch reconciliation | Overnight, corrects intra-day issues | Not applicable — real-time consistency required |
| DR activation | Manual, with scheduled testing | Automated, continuous active-active |
| Communication plan | Notify customers of scheduled downtime | Downtime is never scheduled — all outages are incidents |
Art. 11 continuity plans for instant payments must assume that any outage is an incident, that recovery is automated (a manual failover that takes longer than 10 seconds violates the settlement SLA), and that testing validates continuous operation, not just recovery from failure.
Art. 24-27: Testing Real-Time Systems
Art. 24-27 testing requirements take on new urgency for instant payment systems:
- Performance testing must validate sub-second response times under peak load, not just average load. A system that meets the 10-second SLA at average volumes but fails at 3x peak is not resilient.
- Failure testing must validate automatic failover without transaction loss. A chaos engineering approach — injecting failures into production components and verifying that transactions continue to settle within 10 seconds — is the gold standard.
- Integration testing must include external dependencies: the SWIFT network, correspondent banks, sanctions screening services, and the EBA RT1 or TIPS infrastructure.
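
The failure-testing idea can be shown in miniature with a simulated cluster: inject a fault mid-run and check that no payment is lost. This is only an in-memory sketch with hypothetical names (`Node`, `settle_with_failover`, `run_failure_test`). A real chaos test would target actual infrastructure and would also time each settlement against the 10-second limit.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def settle(self, payment_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{payment_id}@{self.name}"


def settle_with_failover(payment_id, nodes):
    """Try each node in turn; succeed if any live node settles the payment."""
    for node in nodes:
        try:
            return node.settle(payment_id)
        except ConnectionError:
            continue
    raise RuntimeError(f"no node could settle {payment_id}")


def run_failure_test(n_payments=1000, kill_at=500, seed=42):
    """Kill one random node mid-run and return the list of settled payments."""
    rng = random.Random(seed)
    nodes = [Node("zone-a"), Node("zone-b"), Node("zone-c")]
    settled = []
    for i in range(n_payments):
        if i == kill_at:
            rng.choice(nodes).alive = False  # injected fault
        settled.append(settle_with_failover(f"pay-{i}", nodes))
    return settled
```

The pass criterion is that every submitted payment appears exactly once in the settled list, regardless of which node was taken down.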
Art. 17-19: Incident Management for Always-On Systems
Incident classification under Art. 18 is reached sooner in practice for instant payment systems than for batch systems. A 5-minute outage in a batch payment system during overnight processing may affect no customers. A 5-minute outage in an instant payment system on a Saturday afternoon affects every payment in progress and every payment attempted during those 5 minutes.
The incident classification criteria — customers affected, service degradation duration, data loss — will be triggered more frequently for instant payment systems simply because the service is always running and always serving customers.
Third-Party Dependencies in Instant Payments
Instant payments increase third-party dependency risk because every external dependency on the settlement path must meet the same real-time SLA:
Payment infrastructure providers (EBA Clearing's RT1, the Eurosystem's TIPS) are the clearing and settlement mechanism. Their availability directly constrains the institution's ability to settle payments.
Sanctions screening services must deliver results in under 1 second. A sanctions check that takes 5 seconds consumes half the settlement budget. Institutions must either maintain in-house sanctions capabilities (avoiding the third-party dependency) or negotiate sub-second SLAs with screening providers.
Fraud detection services face the same latency constraint. Real-time fraud scoring must complete within the settlement window.
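
One way to reason about these latency constraints is an explicit budget per stage of the settlement chain. The sketch below assumes hypothetical stage names and allocations. Only the 10-second total and the sub-second target for sanctions screening come from the text above.

```python
SETTLEMENT_BUDGET_S = 10.0

# Hypothetical per-stage allocations (assumed for illustration only).
STAGE_BUDGETS_S = {
    "sanctions_screening": 1.0,
    "fraud_scoring": 1.0,
    "clearing_and_settlement": 5.0,
    "confirmation": 1.0,
}


def check_budget(stage_latencies_s: dict) -> list:
    """Return the stages that overran their allocation, plus 'TOTAL' if the sum did.

    Stages without an allocation get a budget of zero and are always flagged.
    """
    breaches = [
        stage
        for stage, latency in stage_latencies_s.items()
        if latency > STAGE_BUDGETS_S.get(stage, 0.0)
    ]
    if sum(stage_latencies_s.values()) > SETTLEMENT_BUDGET_S:
        breaches.append("TOTAL")
    return breaches
```

Keeping the allocations below the 10-second total leaves headroom for network transit and retries, and a 5-second sanctions check is flagged even when the payment as a whole still completes in time.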
Each of these dependencies must be managed under Art. 28 with SLAs, failover strategies, and regular resilience testing. The vendor risk scoring methodology must include latency performance and real-time availability as risk factors.
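
Because these dependencies sit in series on the settlement path, their availabilities multiply. A minimal sketch of that calculation, assuming independent failures and purely hypothetical availability figures (none of the numbers below come from a provider's SLA):

```python
import math


def composite_availability(availabilities_pct):
    """Availability (%) of a chain in which every component must be up."""
    return 100 * math.prod(a / 100 for a in availabilities_pct)


# Hypothetical figures for illustration only.
chain = {
    "core settlement engine": 99.99,
    "clearing infrastructure": 99.99,
    "sanctions screening": 99.99,
    "fraud detection": 99.99,
}

if __name__ == "__main__":
    print(f"end-to-end: {composite_availability(chain.values()):.4f}%")
```

Under these assumptions, four components at 99.99% each yield roughly 99.96% end to end, which is why each third-party SLA needs to be stricter than the target for the overall service.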
The Regulatory Intersection: IPR + DORA
The Instant Payments Regulation creates the mandate. DORA creates the resilience framework. Together, they require financial institutions to build and operate payment systems at a level of reliability historically associated with stock exchange matching engines or air traffic control systems.
The ECB and national competent authorities can be expected to treat instant payment resilience as a priority area, given the public visibility of instant payment failures and the systemic importance of payment infrastructure.
Key Takeaways
- Instant payments eliminate maintenance windows and require 24/7/365 availability at 99.99%+ levels. Traditional availability targets are insufficient.
- RTO must be near-zero, RPO must be zero for instant settlement engines. Active-active architecture with synchronous replication is required.
- Every component in the 10-second settlement chain must meet sub-second performance under all conditions, including peak load and partial failure.
- Art. 24-27 testing must validate real-time resilience: load testing at 3x peak, automated failover with zero transaction loss, dependency degradation handling.
- Third-party dependencies (sanctions screening, fraud detection, clearing infrastructure) must meet the same real-time SLA. Manage under Art. 28.
- Any outage is an incident. The Art. 18 classification criteria are met sooner for always-on services.
Summary
The EU Instant Payments Regulation requires payment service providers to offer SEPA instant credit transfers that settle in under 10 seconds, 24/7/365. This eliminates maintenance windows and demands availability of 99.99% or higher. Under DORA, Art. 7 requires reliable systems, which means an active-active architecture with no downtime. Art. 11 requires continuity plans with a near-zero RTO and a zero RPO for instant settlement engines. Art. 24-27 require tests that validate real-time resilience: load testing at 3x peak volume, automatic failover with no transaction loss, and graceful degradation of dependencies. Every component of the settlement chain (initiation, fraud, sanctions, validation, settlement, confirmation) must meet sub-second performance targets. Third-party dependencies (sanctions screening, fraud detection, clearing infrastructure) must meet the same real-time SLA and be managed under Art. 28. Any outage is an incident under Art. 17, because the service runs continuously and customers are affected during every second of unavailability.