analysis

Generative AI Risk Under DORA: Why LLMs Need Operational Resilience Governance

DORA Atlas EditorialFebruary 28, 202612 min read

The LLM Wave Hits Banking

In the 18 months since generative AI became enterprise-ready, European financial institutions have moved from cautious experimentation to production deployment. The use cases are multiplying:

Customer service chatbots that handle routine inquiries, reducing call center volume
Document analysis that extracts structured data from contracts, regulatory filings, and correspondence
Risk report generation that synthesizes data into management-ready narratives
Code generation that accelerates software development for internal systems
Compliance monitoring that scans communications for regulatory violations
Fraud detection that identifies anomalous patterns in transaction narratives

Each of these deployments is an ICT system as defined by DORA. Each must comply with the full range of DORA requirements — from Art. 7 reliability to Art. 8 asset registration to Art. 9 security to Art. 28 third-party risk management. And each introduces a category of risk that traditional ICT governance was not designed to address.

The challenge is not that DORA is insufficient for AI governance. As the previous analysis of AI under DORA argued, DORA's framework maps naturally to AI systems. The challenge is that generative AI specifically — and large language models in particular — introduce risk characteristics that require enhanced governance within the existing DORA framework.

Generative AI Risks That DORA Must Govern

Hallucination: The Reliability Problem

LLMs generate plausible but incorrect outputs — a phenomenon called hallucination. A customer service chatbot that provides incorrect information about an account balance, a document analysis system that fabricates a contract clause, or a risk report generator that includes a statistic that does not exist — each is a reliability failure under Art. 7.

Art. 7 requires ICT systems that are "reliable." Non-deterministic outputs — where the same input can produce different outputs across invocations — challenge the traditional definition of reliability. A core banking system that sometimes adds correctly and sometimes does not would be immediately flagged as defective. An LLM that sometimes summarizes a document accurately and sometimes fabricates content presents the same reliability problem, but the failure mode is harder to detect.

Generative AI Risk	DORA Article	ICT Risk Equivalent	Governance Requirement
Hallucination	Art. 7 — reliability	System producing incorrect outputs	Output validation, confidence scoring, human review
Prompt injection	Art. 9 — protection	Injection attack (SQL, XSS)	Input sanitization, guardrails, monitoring
Training data poisoning	Art. 9 — data protection	Data integrity compromise	Data provenance, validation, access control
Non-deterministic outputs	Art. 7 — reliable systems	Unpredictable system behavior	Deterministic configuration, output logging
Data leakage in prompts	Art. 9 — data protection	Unauthorized data transfer	PII detection, prompt filtering, data classification
Model drift	Art. 7 — maintained systems	Performance degradation	Continuous evaluation, retraining governance
Vendor lock-in (API-based LLMs)	Art. 28 — third-party risk	Critical vendor dependency	Exit strategy, multi-model architecture

Prompt Injection: The Security Problem

Prompt injection is the LLM equivalent of SQL injection. An attacker crafts input that causes the LLM to ignore its instructions and execute unintended behavior — leaking system prompts, accessing data it should not, or producing outputs that bypass guardrails.

In a financial services context, prompt injection against a customer-facing chatbot could expose the institution's internal instructions, system architecture details, or other customers' data. Against a document analysis system, it could cause the system to misclassify a high-risk contract as low-risk.

Art. 9(4)(a) requires authentication and access control. Art. 9(4)(c) requires data protection. Prompt injection bypasses both — it is an access control failure at the application logic layer, not the network or authentication layer.

Third-Party Risk: The Concentration Problem

Most financial institutions deploying generative AI are not training their own models. They are consuming models through cloud provider APIs — OpenAI via Microsoft Azure, Anthropic via AWS Bedrock, Google's models via GCP Vertex AI. Each API-based LLM deployment is an ICT third-party relationship under Art. 28.

The concentration risk is significant. If a bank's customer service chatbot, document analysis system, compliance monitor, and code generation tools all use the same underlying LLM provider, a single provider outage impacts multiple critical and important functions simultaneously. The concentration risk analysis must include LLM provider dependencies.

In this example, Provider A supports three of five AI functions — a concentration that requires Art. 28 risk assessment and documented exit strategies.

European financial institutions deploying generative AI face three overlapping regulatory frameworks:

Requirement	DORA	EU AI Act	GDPR
Risk assessment	Art. 6 — ICT risk management	Art. 9 — risk management system for high-risk AI	Art. 35 — DPIA for high-risk processing
Transparency	Art. 14 — management body reporting	Art. 13 — transparency requirements	Art. 12-14 — information to data subjects
Testing	Art. 24-27 — resilience testing	Art. 9(7) — testing before deployment	Art. 25 — data protection by design
Third-party management	Art. 28-30 — ICT third-party risk	Art. 28 — responsibilities of providers/deployers	Art. 28 — processor agreements
Incident management	Art. 17-23 — ICT incident reporting	Art. 62 — serious incident reporting	Art. 33-34 — breach notification
Human oversight	Art. 5 — management body governance	Art. 14 — human oversight requirement	Art. 22 — automated decision-making rights

The efficient approach is a unified governance framework rather than three parallel compliance programmes. DORA provides the operational resilience layer, the AI Act provides the AI-specific safety layer, and GDPR provides the data protection layer. Each addresses different aspects of the same deployment.

Where DORA Provides Coverage the AI Act Does Not

The AI Act focuses on the AI system's outputs and safety. DORA focuses on the infrastructure underneath — availability, recoverability, third-party dependencies, incident management. An AI chatbot that meets AI Act transparency requirements but runs on a non-resilient infrastructure without tested recovery plans and documented third-party risk assessment fails DORA.

DORA provides what the AI Act assumes: that the ICT infrastructure supporting AI is resilient, recoverable, and governed.

A DORA-Native GenAI Governance Model

Step 1: Asset Registration (Art. 8)

Every LLM deployment must appear in the ICT asset register with:

Model identity: Model name, version, provider, deployment method (API/on-premise)
Criticality classification: Derived from the BIA — what business function does it support, and what is the impact of failure?
Data classification: What data does the model ingest (prompts), process, and output? Are customer PII, financial data, or regulated data involved?
Dependencies: Cloud provider, API endpoints, data pipelines, monitoring systems
Owner: Who is accountable for the model's performance, safety, and compliance?

Step 2: Risk Assessment (Art. 6)

LLM-specific risk assessment within the DORA framework:

Step 3: Enhanced Controls for LLM Deployments

Output validation: Every LLM output in a customer-facing or decision-influencing context must be validated. For numerical outputs (account balances, risk scores), validate against source data. For textual outputs (customer communications, regulatory reports), implement confidence scoring and human review workflows for low-confidence outputs.

Input guardrails: Implement prompt filtering that detects and blocks injection attempts, removes PII from prompts before sending to external APIs, and enforces topic boundaries (preventing the model from discussing topics outside its intended scope).

Monitoring and anomaly detection: LLM-specific monitoring beyond standard observability: prompt/response logging (with PII redaction), output quality metrics (hallucination rate, user correction rate), latency and error rate trends, cost monitoring (API-based LLMs charge per token).

Step 4: Third-Party Risk Management (Art. 28)

For API-based LLM deployments:

Contractual provisions per Art. 30: SLAs covering availability, latency, data processing location, data retention, and model version change notification
Exit strategy: How to migrate from one LLM provider to another if the relationship must end — including prompt engineering portability, fine-tuning data recovery, and service continuity during migration
Concentration risk assessment: Map LLM provider dependencies across all AI use cases and assess the impact of single-provider failure
Data residency: Ensure prompts containing regulated data are processed in compliant jurisdictions (EU data sovereignty for GDPR, specific member state requirements where applicable)

Step 5: Testing (Art. 24-27)

LLM testing extends the standard testing programme:

Adversarial testing: Systematic prompt injection testing, jailbreak attempts, and guardrail bypass testing — analogous to penetration testing for traditional applications
Output quality regression testing: Benchmark test sets that validate model accuracy over time, detecting drift before it impacts users
Resilience testing: What happens when the LLM provider API is unavailable? Does the application gracefully degrade, or does it fail completely?
Recovery testing: Can the institution restore LLM-dependent services within their RTOs? What is the recovery procedure for a model that has been compromised or is producing incorrect outputs?

Supervisory Outlook

The EBA, ESMA, and national competent authorities are closely monitoring AI adoption in financial services. While explicit guidance on generative AI under DORA is still developing, the direction is clear: AI systems are ICT systems, and DORA applies. The ECB's digital transformation agenda includes AI risk as an emerging supervisory priority.

Institutions that proactively integrate LLM governance into their DORA compliance programme — rather than waiting for explicit regulatory guidance — will be better positioned when supervisory expectations crystallize.

Key Takeaways

Every LLM deployment is an ICT system under DORA — subject to Art. 7 reliability, Art. 8 registration, Art. 9 security, Art. 24 testing, and Art. 28 third-party risk.
Hallucination is an Art. 7 reliability failure. Non-deterministic outputs require output validation, confidence scoring, and human-in-the-loop for high-stakes decisions.
Prompt injection is an Art. 9 security vulnerability. Input sanitization and guardrails are mandatory for any LLM processing external input.
API-based LLM providers create Art. 28 concentration risk. Map dependencies and require contractual provisions per Art. 30.
Triple compliance (DORA + AI Act + GDPR) is the reality. Build a unified governance framework, not three parallel programmes.
DORA provides the infrastructure resilience layer that the AI Act assumes. Both are needed for comprehensive AI governance.

Resume en francais

Les banques europeennes deploient des modeles de langage pour le service client, l'analyse documentaire, la generation de rapports et le developpement logiciel. Chaque deploiement LLM est un systeme TIC sous DORA, soumis a l'article 7 (fiabilite), l'article 8 (inventaire), l'article 9 (securite) et l'article 28 (risque tiers). L'IA generative introduit des risques specifiques : l'hallucination (sorties incorrectes plausibles) viole l'article 7 ; l'injection de prompt viole l'article 9 ; la dependance aux API de fournisseurs LLM cree un risque de concentration sous l'article 28. La triple conformite DORA + AI Act + RGPD est la realite pour l'IA en finance — une gouvernance unifiee est plus efficace que trois programmes paralleles. Ce guide propose un modele de gouvernance GenAI natif DORA en cinq etapes : enregistrement dans l'inventaire d'actifs, evaluation des risques specifiques LLM, controles renforces (validation des sorties, garde-fous d'entree, surveillance specifique), gestion des risques tiers avec strategies de sortie, et programme de tests incluant des tests adversariaux et des tests de regression qualite.