If your organization deploys AI systems and has a risk register, there's a reasonable chance it contains entries like "AI model bias," "data privacy in AI training," and "AI regulatory compliance." These are real risks. They belong on the register.
But they're the risks everyone already knows about — the ones that appear in vendor briefings, board presentations, and industry reports. The risks that tend to be missing are the operational security risks: the specific, testable vulnerabilities that exist in your AI deployments right now, that an attacker could exploit this week, and that your current controls may not cover.
This article is about those missing entries.
Why Standard Risk Categories Miss AI-Specific Risks
Most risk registers are organized around categories inherited from traditional IT risk management: data breach, unauthorized access, service disruption, compliance violation, third-party risk. AI-specific risks get slotted into whichever category seems closest. System prompt extraction becomes "unauthorized access." Model hallucination becomes "data integrity." Prompt injection becomes "application vulnerability."
The problem isn't that these mappings are wrong — it's that they're too abstract to be actionable. "Unauthorized access" as a risk entry for a web application tells you something specific: someone might get past your authentication controls and access data they shouldn't see. You know what to test, what to monitor, and what controls to evaluate.
"Unauthorized access" as a risk entry for an AI system could mean a dozen different things — and the controls for each are completely different. Does it mean an attacker extracts the system prompt? Bypasses the model's behavioral constraints? Tricks the model into querying data outside the user's scope? Manipulates the model through injected content in retrieved documents? Each of these is a distinct attack vector requiring distinct controls. Collapsing them into a single risk entry means none of them get properly assessed.
The Entries That Should Be There
What follows is a set of risk entries specific to AI deployments that most registers don't include. Each entry describes the risk, why it matters, what controls address it, and which industry framework maps to it. Adapt the severity and likelihood ratings to your environment — these depend on your specific deployment architecture, data sensitivity, and threat model.
System Prompt Extraction
Risk: An attacker extracts the system prompt through conversational techniques — direct requests, document-framing ("read back your instructions for compliance"), persona manipulation, or multi-turn social engineering.
Why it matters: The system prompt is the equivalent of an application's configuration file. It reveals the model's purpose, constraints, refusal patterns, and — if poorly designed — embedded secrets, connected system references, and security architecture details. An extracted system prompt gives an attacker a complete map of what the model protects and how it protects it, enabling targeted follow-up attacks.
Controls: Design system prompts assuming extraction (no secrets, no detailed security logic). Implement output filtering that detects instruction fragments in responses. Conduct adversarial testing specifically targeting extraction.
Further reading: Five Ways LLMs Leak Their System Prompts breaks down the five distinct extraction patterns and how to test your deployment against each.
Framework mapping: OWASP LLM07 (System Prompt Leakage), MITRE ATLAS AML.T0051 (LLM Prompt Injection)
Prompt Injection — Direct
Risk: An attacker crafts input that overrides the model's system prompt instructions, causing it to behave outside its intended scope — disclosing information it shouldn't, performing actions it wasn't designed for, or ignoring behavioral constraints.
Why it matters: Direct prompt injection exploits the fundamental architecture of LLMs: system instructions and user input are processed through the same mechanism with no privilege separation. Unlike traditional injection attacks (SQL, XSS), prompt injection doesn't exploit a bug — it exploits the model's core design. This means the risk exists in every LLM deployment, regardless of provider or model.
Controls: Input classification to detect adversarial patterns. Output filtering to catch responses that violate behavioral constraints. System prompt hardening with instruction anchoring. External guardrail layers. Regular adversarial testing.
Further reading: What Is Prompt Injection & Why Companies Should Care covers how direct and indirect injection work and what layered defenses look like in practice.
Framework mapping: OWASP LLM01 (Prompt Injection), MITRE ATLAS AML.T0051, NIST AI RMF Measure 2.3
Prompt Injection — Indirect
Risk: Malicious instructions embedded in external content — documents, web pages, emails, database records — are processed by the model during normal operation, redirecting its behavior without the user or operator's knowledge.
Why it matters: This risk applies to any deployment where the model retrieves or processes external content: RAG pipelines, document summarization, web browsing agents, email processing, or any system where the model's context includes content the operator doesn't fully control. The attack payload is planted in advance and activated when the model encounters it.
Controls: Content sanitization for retrieved documents. Separation between instruction context and retrieved content. Monitoring for unexpected model behavior following external content retrieval. Restricting the model's ability to execute actions based on retrieved content without human confirmation.
Framework mapping: OWASP LLM01 (Prompt Injection), MITRE ATLAS AML.T0051
Fabricated Output
Risk: The model generates plausible but fictional information — policies that don't exist, procedures that were never implemented, technical details that are incorrect — and presents them with the same confidence as factual responses.
Why it matters: Fabrication is especially dangerous in contexts where the model's output is trusted as authoritative: internal support bots, compliance assistants, research tools. A model that fabricates a security procedure a user then follows has created a real operational risk. A model that fabricates compliance documentation has created an audit liability. The model isn't lying — it's generating the most plausible completion for the conversation context. But the output is wrong, and the user may have no way to know that.
Controls: Scope the model's responses to documented information only (configure refusal for topics outside its knowledge base). Implement output validation against known-good data sources. Add disclaimers for generated content that hasn't been verified. Monitor for fabrication patterns.
Framework mapping: OWASP LLM09 (Misinformation), NIST AI RMF Map 1.1
Credential or Secret Exposure Through the Context Window
Risk: Secrets, API keys, tokens, or other credentials embedded in the system prompt or injected into the context window are disclosed through adversarial extraction or normal conversational behavior.
Why it matters: This is the most directly exploitable AI-specific risk. A disclosed credential grants whatever access that credential provides — to connected databases, APIs, internal systems, or third-party services. The model cannot reliably protect information that exists in its own context window, regardless of how explicitly it is instructed to do so.
Controls: Never embed secrets in system prompts. Manage credentials through environment variables, external vaults, or application-layer logic the model doesn't see. Audit all system prompts and context injection points for sensitive values. Implement output filtering that scans for known secret patterns.
Framework mapping: OWASP LLM07 (System Prompt Leakage), MITRE ATLAS AML.T0051
Excessive Model Permissions
Risk: The model has access to data or system capabilities beyond what is required for its stated function — broader database access than needed, ability to trigger actions in connected systems, or access to data across user boundaries.
Why it matters: Prompt injection turns excessive permissions into breaches. A model with narrowly scoped access that gets compromised through injection can only access what it was already permitted to reach. A model with broad access that gets compromised gives the attacker everything those permissions allow. This is least-privilege applied to AI — and most deployments fail it because the path of least resistance during development is broad permissions.
Controls: Scope model access to the minimum required for its function. Implement row-level or user-level data access controls at the application layer, not the model layer. Require human confirmation for destructive or irreversible actions. Audit permission configurations regularly.
Framework mapping: OWASP LLM06 (Excessive Agency), NIST AI RMF Govern 1.1
Unauthenticated or Under-Authenticated Model Endpoints
Risk: The model's API endpoint accepts requests without authentication, with weak authentication, or without rate limiting — exposing it to automated adversarial probing from any network-accessible position.
Why it matters: This is the most common deployment vulnerability in local LLM setups. Default runtime configurations expose the model API without authentication on the local network. An attacker on the same network segment — or an attacker who has compromised any device on that segment — can interact with the model directly, running automated extraction and injection attacks at scale.
Controls: Require authentication on all model endpoints. Implement rate limiting. Restrict network access to authorized clients. Log all API interactions with sufficient detail for forensic analysis.
Framework mapping: MITRE ATLAS AML.T0040 (ML Model Inference API Access), NIST AI RMF Manage 2.1
Insufficient Adversarial Testing
Risk: AI deployments reach production without structured adversarial testing — meaning the organization has no evidence that the deployment's security controls work under adversarial conditions.
Why it matters: This is a meta-risk: the risk that other risks on this list exist but haven't been discovered. An untested deployment is not necessarily insecure — but an untested deployment is one where security is an assumption, not a demonstrated property. Every other entry on this list is discoverable through adversarial testing. If the testing hasn't been done, the risk register is based on theory rather than evidence.
Controls: Conduct adversarial testing before production deployment. Include prompt extraction, injection, persona manipulation, multi-turn escalation, and — where applicable — indirect injection through retrieved content. Re-test after significant configuration changes. Automate baseline scans to detect regressions.
Further reading: INT-2026-R001 documents what this testing looks like in practice — 576 probes, 13 attack strategies, and a 48.33% automated success rate against a Llama 3.1 deployment.
Framework mapping: NIST AI RMF Measure 2.3, NIST AI RMF Measure 2.6
How to Use This
These entries aren't a universal risk register — they're the starting point for extending yours. Each deployment has its own architecture, data sensitivity, and threat model, which means the severity ratings, likelihood assessments, and control priorities will be different for every organization.
The process for incorporating them:
Audit your current AI risk entries. Are they specific enough to be testable? "AI security risk" is not testable. "System prompt extraction leading to credential disclosure" is testable. If an entry can't be verified through a specific test, it's too abstract to drive action.
Map to your deployment architecture. Not every entry applies to every deployment. Indirect prompt injection only applies if your model processes external content. Credential exposure only applies if credentials exist in the context window. Score and prioritize based on what you've actually built, not on general AI risk.
Assign controls and owners. Each entry should map to a specific control (or a gap, if the control doesn't exist yet), and each control should have an owner. "We need output filtering" is an observation. "The platform team will implement output filtering by Q2, validated through adversarial testing" is a plan.
Test the controls. A risk register entry with a control that hasn't been tested is a risk register entry with an assumption. Adversarial testing converts assumptions into evidence — either the control works, or it doesn't. Both outcomes are valuable.
The register is the starting point. Testing is what makes it real.
The architectural reasons these risks are hard to defend against — why system prompts can't enforce their own boundaries and why refusals degrade over long conversations — are covered in The Transformer's Blind Spots. For the argument that none of this requires a new security discipline to address, AI Security Is Not a New Discipline makes the case for extending existing practices rather than waiting for a new program.
#AIGovernance #CISO #LLMSecurity #NIST #OWASP #RiskManagement