Generative AI Risk Under DORA: Why LLMs Need Operational Resilience Governance

The LLM Wave Hits Banking
In the 18 months since generative AI became enterprise-ready, European financial institutions have moved from cautious experimentation to production deployment. The use cases are multiplying:
- Customer service chatbots that handle routine inquiries, reducing call center volume
- Document analysis that extracts structured data from contracts, regulatory filings, and correspondence
- Risk report generation that synthesizes data into management-ready narratives
- Code generation that accelerates software development for internal systems
- Compliance monitoring that scans communications for regulatory violations
- Fraud detection that identifies anomalous patterns in transaction narratives
Each of these deployments is an ICT system as defined by DORA. Each must comply with the full range of DORA requirements — from Art. 7 reliability to Art. 8 asset registration to Art. 9 security to Art. 28 third-party risk management. And each introduces a category of risk that traditional ICT governance was not designed to address.
The challenge is not that DORA is insufficient for AI governance. As the previous analysis of AI under DORA argued, DORA's framework maps naturally to AI systems. The challenge is that generative AI specifically — and large language models in particular — introduce risk characteristics that require enhanced governance within the existing DORA framework.
Generative AI Risks That DORA Must Govern
Hallucination: The Reliability Problem
LLMs generate plausible but incorrect outputs — a phenomenon called hallucination. A customer service chatbot that provides incorrect information about an account balance, a document analysis system that fabricates a contract clause, or a risk report generator that includes a statistic that does not exist — each is a reliability failure under Art. 7.
Art. 7 requires ICT systems that are "reliable." Non-deterministic outputs — where the same input can produce different outputs across invocations — challenge the traditional definition of reliability. A core banking system that sometimes adds correctly and sometimes does not would be immediately flagged as defective. An LLM that sometimes summarizes a document accurately and sometimes fabricates content presents the same reliability problem, but the failure mode is harder to detect.
| Generative AI Risk | DORA Article | ICT Risk Equivalent | Governance Requirement |
|---|---|---|---|
| Hallucination | Art. 7 — reliability | System producing incorrect outputs | Output validation, confidence scoring, human review |
| Prompt injection | Art. 9 — protection | Injection attack (SQL, XSS) | Input sanitization, guardrails, monitoring |
| Training data poisoning | Art. 9 — data protection | Data integrity compromise | Data provenance, validation, access control |
| Non-deterministic outputs | Art. 7 — reliable systems | Unpredictable system behavior | Deterministic configuration, output logging |
| Data leakage in prompts | Art. 9 — data protection | Unauthorized data transfer | PII detection, prompt filtering, data classification |
| Model drift | Art. 7 — maintained systems | Performance degradation | Continuous evaluation, retraining governance |
| Vendor lock-in (API-based LLMs) | Art. 28 — third-party risk | Critical vendor dependency | Exit strategy, multi-model architecture |
Prompt Injection: The Security Problem
Prompt injection is the LLM equivalent of SQL injection. An attacker crafts input that causes the LLM to ignore its instructions and execute unintended behavior — leaking system prompts, accessing data it should not, or producing outputs that bypass guardrails.
In a financial services context, prompt injection against a customer-facing chatbot could expose the institution's internal instructions, system architecture details, or other customers' data. Against a document analysis system, it could cause the system to misclassify a high-risk contract as low-risk.
Art. 9(4)(a) requires authentication and access control. Art. 9(4)(c) requires data protection. Prompt injection bypasses both — it is an access control failure at the application logic layer, not the network or authentication layer.
Third-Party Risk: The Concentration Problem
Most financial institutions deploying generative AI are not training their own models. They are consuming models through cloud provider APIs — OpenAI via Microsoft Azure, Anthropic via AWS Bedrock, Google's models via GCP Vertex AI. Each API-based LLM deployment is an ICT third-party relationship under Art. 28.
The concentration risk is significant. If a bank's customer service chatbot, document analysis system, compliance monitor, and code generation tools all use the same underlying LLM provider, a single provider outage impacts multiple critical and important functions simultaneously. The concentration risk analysis must include LLM provider dependencies.
In this example, Provider A supports three of five AI functions — a concentration that requires Art. 28 risk assessment and documented exit strategies.
The Triple Compliance Challenge: DORA + AI Act + GDPR
European financial institutions deploying generative AI face three overlapping regulatory frameworks:
| Requirement | DORA | EU AI Act | GDPR |
|---|---|---|---|
| Risk assessment | Art. 6 — ICT risk management | Art. 9 — risk management system for high-risk AI | Art. 35 — DPIA for high-risk processing |
| Transparency | Art. 14 — management body reporting | Art. 13 — transparency requirements | Art. 12-14 — information to data subjects |
| Testing | Art. 24-27 — resilience testing | Art. 9(7) — testing before deployment | Art. 25 — data protection by design |
| Third-party management | Art. 28-30 — ICT third-party risk | Art. 28 — responsibilities of providers/deployers | Art. 28 — processor agreements |
| Incident management | Art. 17-23 — ICT incident reporting | Art. 62 — serious incident reporting | Art. 33-34 — breach notification |
| Human oversight | Art. 5 — management body governance | Art. 14 — human oversight requirement | Art. 22 — automated decision-making rights |
The efficient approach is a unified governance framework rather than three parallel compliance programmes. DORA provides the operational resilience layer, the AI Act provides the AI-specific safety layer, and GDPR provides the data protection layer. Each addresses different aspects of the same deployment.
Where DORA Provides Coverage the AI Act Does Not
The AI Act focuses on the AI system's outputs and safety. DORA focuses on the infrastructure underneath — availability, recoverability, third-party dependencies, incident management. An AI chatbot that meets AI Act transparency requirements but runs on a non-resilient infrastructure without tested recovery plans and documented third-party risk assessment fails DORA.
DORA provides what the AI Act assumes: that the ICT infrastructure supporting AI is resilient, recoverable, and governed.
A DORA-Native GenAI Governance Model
Step 1: Asset Registration (Art. 8)
Every LLM deployment must appear in the ICT asset register with:
- Model identity: Model name, version, provider, deployment method (API/on-premise)
- Criticality classification: Derived from the BIA — what business function does it support, and what is the impact of failure?
- Data classification: What data does the model ingest (prompts), process, and output? Are customer PII, financial data, or regulated data involved?
- Dependencies: Cloud provider, API endpoints, data pipelines, monitoring systems
- Owner: Who is accountable for the model's performance, safety, and compliance?
Step 2: Risk Assessment (Art. 6)
LLM-specific risk assessment within the DORA framework:
Step 3: Enhanced Controls for LLM Deployments
Output validation: Every LLM output in a customer-facing or decision-influencing context must be validated. For numerical outputs (account balances, risk scores), validate against source data. For textual outputs (customer communications, regulatory reports), implement confidence scoring and human review workflows for low-confidence outputs.
Input guardrails: Implement prompt filtering that detects and blocks injection attempts, removes PII from prompts before sending to external APIs, and enforces topic boundaries (preventing the model from discussing topics outside its intended scope).
Monitoring and anomaly detection: LLM-specific monitoring beyond standard observability: prompt/response logging (with PII redaction), output quality metrics (hallucination rate, user correction rate), latency and error rate trends, cost monitoring (API-based LLMs charge per token).
Step 4: Third-Party Risk Management (Art. 28)
For API-based LLM deployments:
- Contractual provisions per Art. 30: SLAs covering availability, latency, data processing location, data retention, and model version change notification
- Exit strategy: How to migrate from one LLM provider to another if the relationship must end — including prompt engineering portability, fine-tuning data recovery, and service continuity during migration
- Concentration risk assessment: Map LLM provider dependencies across all AI use cases and assess the impact of single-provider failure
- Data residency: Ensure prompts containing regulated data are processed in compliant jurisdictions (EU data sovereignty for GDPR, specific member state requirements where applicable)
Step 5: Testing (Art. 24-27)
LLM testing extends the standard testing programme:
- Adversarial testing: Systematic prompt injection testing, jailbreak attempts, and guardrail bypass testing — analogous to penetration testing for traditional applications
- Output quality regression testing: Benchmark test sets that validate model accuracy over time, detecting drift before it impacts users
- Resilience testing: What happens when the LLM provider API is unavailable? Does the application gracefully degrade, or does it fail completely?
- Recovery testing: Can the institution restore LLM-dependent services within their RTOs? What is the recovery procedure for a model that has been compromised or is producing incorrect outputs?
Supervisory Outlook
The EBA, ESMA, and national competent authorities are closely monitoring AI adoption in financial services. While explicit guidance on generative AI under DORA is still developing, the direction is clear: AI systems are ICT systems, and DORA applies. The ECB's digital transformation agenda includes AI risk as an emerging supervisory priority.
Institutions that proactively integrate LLM governance into their DORA compliance programme — rather than waiting for explicit regulatory guidance — will be better positioned when supervisory expectations crystallize.
Key Takeaways
- Every LLM deployment is an ICT system under DORA — subject to Art. 7 reliability, Art. 8 registration, Art. 9 security, Art. 24 testing, and Art. 28 third-party risk.
- Hallucination is an Art. 7 reliability failure. Non-deterministic outputs require output validation, confidence scoring, and human-in-the-loop for high-stakes decisions.
- Prompt injection is an Art. 9 security vulnerability. Input sanitization and guardrails are mandatory for any LLM processing external input.
- API-based LLM providers create Art. 28 concentration risk. Map dependencies and require contractual provisions per Art. 30.
- Triple compliance (DORA + AI Act + GDPR) is the reality. Build a unified governance framework, not three parallel programmes.
- DORA provides the infrastructure resilience layer that the AI Act assumes. Both are needed for comprehensive AI governance.
Resume en francais
Les banques europeennes deploient des modeles de langage pour le service client, l'analyse documentaire, la generation de rapports et le developpement logiciel. Chaque deploiement LLM est un systeme TIC sous DORA, soumis a l'article 7 (fiabilite), l'article 8 (inventaire), l'article 9 (securite) et l'article 28 (risque tiers). L'IA generative introduit des risques specifiques : l'hallucination (sorties incorrectes plausibles) viole l'article 7 ; l'injection de prompt viole l'article 9 ; la dependance aux API de fournisseurs LLM cree un risque de concentration sous l'article 28. La triple conformite DORA + AI Act + RGPD est la realite pour l'IA en finance — une gouvernance unifiee est plus efficace que trois programmes paralleles. Ce guide propose un modele de gouvernance GenAI natif DORA en cinq etapes : enregistrement dans l'inventaire d'actifs, evaluation des risques specifiques LLM, controles renforces (validation des sorties, garde-fous d'entree, surveillance specifique), gestion des risques tiers avec strategies de sortie, et programme de tests incluant des tests adversariaux et des tests de regression qualite.