January 3, 2026Intrenex8 min read

AI Security Is Not a New Discipline

Organizations are treating AI security as something entirely new — a problem that requires new teams, new frameworks, and new thinking from scratch. It doesn't. The principles are the same. The attack surface is different.

AI safetyLLM securityCISOcybersecurityrisk management
Share

There's a narrative forming around AI security that goes something like this: AI systems are fundamentally different from traditional software, existing security practices don't apply, organizations need entirely new teams and frameworks and budgets to address AI-specific risks, and until all of that is in place, the problem is too novel to tackle.

This narrative is wrong in a way that's actively harmful — because it gives organizations permission to do nothing while they wait for a new discipline to mature. The discipline already exists. It's called security engineering. The principles haven't changed. What's changed is the attack surface.

The Principles You Already Have

If you run a security program at any scale, you already know how to think about the problems AI introduces. The vocabulary is different. The underlying logic is the same.

Least privilege. An application should only have access to the data and systems it needs to perform its function. This applies identically to an LLM deployment. If your model is configured as a customer support bot, it shouldn't have query access to your entire customer database — it should have access to the authenticated user's records and nothing else. The principle isn't new. The implementation is different because the model accepts natural language input that can be crafted to request data outside its intended scope.

Defense in depth. No single control should be the only thing standing between an attacker and a breach. You wouldn't protect a web application with just input validation and no WAF, no authentication, no logging. An LLM deployment that relies entirely on system prompt instructions for security has exactly one layer of defense — and it's a layer that can be bypassed through conversation. The principle is the same: stack controls so that failure of any single layer doesn't result in compromise.

Assume breach. Design systems with the assumption that some component will eventually be compromised. For traditional infrastructure, this means network segmentation, detection capabilities, and incident response plans. For an LLM deployment, this means designing the system prompt with the assumption it will be extracted, managing secrets outside the model's context window, and monitoring model outputs for indicators of adversarial interaction. The attacker model is different. The defensive posture is identical.

Trust boundaries. Every system has boundaries where trust changes — where data crosses from a controlled environment to an uncontrolled one, or where user input enters a processing pipeline. In an LLM deployment, the trust boundary is the context window: the point where user messages and system instructions are combined and processed by the same mechanism with no privilege separation. This is conceptually identical to the trust boundary at a web application's input layer. The defense is also conceptually identical: validate inputs, filter outputs, and don't trust the processing layer to enforce its own constraints.

Monitoring and detection. You monitor network traffic for anomalous patterns. You monitor application logs for injection attempts. Monitoring LLM interactions follows the same logic: look for patterns that indicate adversarial probing — repeated requests for system information, persona manipulation attempts, multi-turn escalation toward sensitive topics. The signals are different. The practice of watching for them isn't.

What's Actually New

Saying the principles are the same is not saying the implementation is trivial. There are genuine differences in how these principles apply to AI systems, and ignoring those differences is as dangerous as treating everything as novel.

The input is unstructured. Traditional applications accept structured input — form fields, API parameters, database queries — that can be validated against schemas. An LLM accepts natural language, which means the boundary between legitimate use and adversarial input is ambiguous in ways that structured input validation doesn't address. You can't write a regex to catch social engineering.

The processing is opaque. When a traditional application processes a request, you can trace the execution path — which functions were called, which database queries were executed, what logic determined the response. When an LLM processes a request, the reasoning happens inside a neural network that doesn't expose its decision path. You see the input and the output. The processing in between is a statistical operation across billions of parameters. This makes debugging security failures harder and makes output filtering more important.

Instructions and data share the same channel. In a traditional application, code and user data are separate. SQL injection exists precisely because that separation can be violated — but the separation is the default, and injection is the exception. In an LLM, there is no separation. System instructions and user messages are processed through the same attention mechanism. Prompt injection isn't a bug. It's a consequence of the architecture. The "injection" is the default state.

Defenses are probabilistic. A firewall rule either blocks traffic or it doesn't. A system prompt instruction might be followed or it might not, depending on the conversation context, the phrasing of the request, and the accumulated conversational history. Security controls that depend on the model's compliance are inherently probabilistic — which means they need to be backed by deterministic external controls to be reliable.

These differences are real. They require new tools, new testing methodologies, and new monitoring approaches. But they don't require new principles. They require applying existing principles to a new kind of system.

Where to Start

If you're a security leader looking at your organization's AI deployments and wondering where to begin, start with what you already know.

Inventory. You maintain an inventory of your applications, infrastructure, and data flows. Add AI deployments to that inventory. Where are models running? What data do they access? What input channels do they accept? Who can interact with them? This is the same asset management discipline you already practice.

Threat model. You threat-model your applications before deployment. Apply the same process to AI systems. What are the trust boundaries? What's the most sensitive data the model can access? What would the impact be if the model's instructions were extracted or its behavior was redirected? What are the most likely attack vectors? If you've ever run a threat modeling exercise, you can run one for an AI deployment.

Test. You penetration-test your web applications and infrastructure. Adversarial testing for AI systems follows the same logic: probe the system the way an attacker would, document what breaks, and fix it before production. The specific techniques are different — prompt injection instead of SQL injection, social engineering through conversation instead of phishing — but the practice of testing adversarially is something your organization already understands.

Monitor. You monitor your production systems for indicators of compromise. Extend that monitoring to AI interactions. Multi-turn escalation patterns, repeated system prompt probing, persona manipulation attempts — these are the AI equivalents of the network anomalies your SOC already watches for.

Govern. You have policies for acceptable use, data handling, and incident response. Extend them to cover AI-specific scenarios. What happens when a model leaks sensitive information? What's the process for evaluating the security of a new AI deployment before it goes to production? Who is accountable for the security posture of AI systems?

None of this requires a dedicated AI security team. It requires your existing security team to understand AI-specific attack surfaces and extend their current practices to cover them. The expertise gap is real but narrow — it's learning a new attack surface, not learning a new discipline.

The Danger of Waiting

The worst outcome of the "AI security is a new discipline" narrative is paralysis. Organizations conclude that they can't address AI security until they hire an AI security team, build an AI security program, and adopt AI-specific security frameworks.

Meanwhile, their AI deployments are in production. The models are running. The API endpoints are exposed. The system prompts contain secrets they shouldn't. The attack surface exists today, and it doesn't wait for the security program to catch up.

The alternative: treat AI security as an extension of what you already do. Start with the principles you already have. Apply them to the systems you're already running. Test the deployments you've already made. You'll close more gaps in a week of extending existing practices than in six months of building a new program from scratch.

The discipline isn't new. The urgency is.


For a concrete example of what adversarial testing looks like in practice, INT-2026-R001 documents a structured assessment of a Llama 3.1 deployment — the methodology, the findings, and what each gap reveals about defense posture. For teams extending their risk management practices to cover AI-specific threats, What Your AI Risk Register Is Missing provides the specific entries most registers omit, mapped to OWASP, MITRE ATLAS, and NIST AI RMF.

#CyberSecurity #AISafety #LLMSecurity #CISO #AIGovernance

Interested in the methodology?

Explore the lab environment and tools used to conduct these adversarial simulations.

Explore the Lab