Blog

Hidden in the Pipeline: Why Agentic AI Needs a New Security System

May 28, 2026
Mudit Sinha

Prompt injection occurs when an attacker manipulates an LLM's inputs to make it execute unauthorized commands or bypass intended behaviors. While jailbreaking focuses on tricking a model into ignoring its safety alignment—often driven by a user trying to force a chatbot to generate harmful or restricted content—prompt injection involves inserting malicious instructions into the data stream, typically motivated by a desire to hijack control over the application's processing flow.

AI security has spent the last few years treating prompt injection as a prompt problem. Find the malicious instruction. Detect the jailbreak. Block the suspicious phrase. Add a guardrail around the model. That approach made sense when AI systems behaved like chatbots. A user typed a prompt, the model responded, and the main security boundary was the text entering or leaving the model.

But agentic AI has changed that boundary. Modern AI systems read files, retrieve documents, parse structured data, inspect software artifacts, summarize reports, call tools, and make decisions based on context assembled from many sources. In these systems, the “prompt” is no longer just what a user types. It is the final output of a pipeline. And that pipeline can be weaponized.

Recent controlled simulations show a troubling pattern: modern prompt guardrails can perform well when malicious content is visible as ordinary text, yet fail when the same intent is carried through structured inputs and only becomes model-facing after a transformation step. The issue is not that one detector misses one attack. The deeper issue is that many AI defenses still inspect the wrong boundary. The dangerous content may not look like a prompt when it enters the system. It may look like metadata, a report field, structured input, or a normal artifact moving through a trusted workflow.

That is why agentic AI needs a new security system — one that secures the path to the prompt, not just the prompt itself.

The Prompt is No Longer the perimeter

Most prompt-injection defenses are built around visible text. They inspect strings. They look for instruction overrides. They classify suspicious prompts. They search for jailbreak phrasing, role manipulation, or malicious commands. These controls are useful, but incomplete.

In an agentic workflow, the attacker does not need to attack only the visible prompt. They can attack the system’s input path: a document being summarized, a file being analyzed, a retrieved passage, a tool response, a metadata field, or a structured object that later becomes model-facing context.

This is where indirect prompt injection becomes more dangerous than a chatbot jailbreak. The attacker is no longer only trying to trick the model. The attacker is trying to trick the pipeline that builds the model’s context to hijack the model's reasoning into doing something it was not intended to.

What we simulated against modern guardrails

Controlled simulations tested this attack pattern against representative defenses used in modern AI applications: Llama-family prompt-injection classifiers, NVIDIA-style guardrail architectures, schema gates, regex filters, boundary-prompt defenses, and fine-tuned text classifiers.

These are not weak defenses. Llama Prompt Guard 2 is publicly positioned as a model for detecting prompt injection and jailbreak attacks. NVIDIA NeMo Guardrails is publicly positioned as a framework for orchestrating AI guardrails for agentic applications, including jailbreak prevention, RAG grounding, content safety, and policy controls.

The result was consistent. When malicious content was visible as ordinary text, these defenses performed well. That is exactly what they are designed to do.

But when the same malicious intent moved through a structured input path and only became model-facing after transformation, the guardrails failed in the way that matters most: they did not see the attack at the point where they were inspecting.

This is the critical distinction. The issue is not that popular guardrails are useless. The issue is that they are largely designed to operate at the input and output layers and around visible text inspection. In agentic workflows, the dangerous instruction may not be visible as a prompt. It may be carried through a file, metadata field, tool output, retrieved artifact, or structured object that later becomes part of the model’s context. None of these controls can reliably stop what they cannot observe.

That is why this is not a “better classifier” problem. It is a pipeline security problem.

Why agentic workflows make this worse

In a chatbot, a successful prompt injection may cause a bad answer. In an agentic system, a successful prompt injection can influence action. An agent may retrieve more data, call a tool, update memory, write to a ticketing system, approve a workflow, generate code, summarize a security finding, or pass information to another agent. That changes the blast radius.

Agentic systems often operate through chained transformations:

  • A file becomes parsed content.
  • Parsed content becomes a report.
  • A report when loaded by the model becomes a model context.
  • Model context becomes reasoning.
  • Reasoning becomes a tool call.
  • A tool call becomes an action.

Every one of those transformations can become a security boundary. If defenders inspect only the user prompt, or only obvious text, they are securing only a small part of the actual attack surface. The real perimeter is the full path through which untrusted data becomes model-facing context.

Where this fits in Lineaje’s AI Kill Chain

Lineaje’s AI Kill Chain gives this problem the right framing. The AI Kill Chain is a 10-stage framework for identifying, preventing, and mitigating AI-driven and agentic threats. The stage that matters most here is Stage 3: Instruction & Input Weaponization.

Stage 3 is where adversarial content enters the AI system and becomes capable of influencing the model, which is what leads to Model Reasoning Hijack - AI Kill Chain's Stage 4. In a simple chatbot, that may be a direct malicious prompt. In an agentic application, it may be a document, a tool response, a retrieved passage, a software artifact, a configuration object, a report field, or structured input that later becomes part of the model’s context. This is where hidden intent must be stopped.

If weaponized input is not detected or constrained at Stage 3, it can move into later stages of the AI Kill Chain. The model may reason over it. The agent may pass it to tools. The system may store it in memory. Another agent may inherit it. A human may receive a compromised summary and trust it. That is how a hidden input problem becomes an agentic workflow problem.

Stage 3 defense means securing the path to the prompt

The old security question was: Can we detect a malicious prompt? The new security question is: Can we detect when untrusted data is becoming an instruction? That shift is critical.

A stronger Stage 3 defense needs several layers:

  • Input provenance: Know where content came from: user input, third-party data, retrieved documents, tools, memory, APIs, generated reports, or reconstructed artifacts.
  • Typed validation: Validate structured inputs according to what they claim to be, not only by scanning for suspicious words.
  • Transformation-aware inspection: Inspect content before and after parsing, extraction, reconstruction, summarization, retrieval, and tool handoff.
  • Instruction/data separation: Prevent untrusted data from being silently promoted into trusted instruction context.
  • Policy enforcement before reasoning: Block, downgrade, redact, or sandbox suspicious content before it enters the model’s reasoning environment.
  • Agentic blast-radius control: Prevent a malicious signal from automatically influencing tool calls, memory writes, approvals, workflow state, or downstream agents.

This is the difference between prompt security and pipeline security. Prompt security asks whether the model can resist a bad instruction. Pipeline security asks whether the bad instruction should have reached the model at all—and when it does, can the impact be contained or minimized?

Why Lineaje is Positioned to solve this

Lineaje already approaches security as a supply chain problem: provenance, trust, integrity, policy, visibility, and lifecycle governance. That mindset is exactly what agentic AI now needs. For AI systems, the supply chain is not just software packages and dependencies. It includes prompts, files, retrieved context, tool outputs, model-generated artifacts, memory, APIs, structured data, and every transformation that moves information closer to the model.

Lineaje UnifAI is positioned as an autonomous AI policy orchestrator for secure-by-design agentic AI applications, focused on centralized governance for security, engineering, and GRC teams. Stage 3 of the AI Kill Chain is where that AI supply chain becomes dangerous. It is where passive-looking content can become an active influence.

Lineaje can help organizations break the chain by treating instruction-bearing input as a governed artifact, not just a string to classify. That means mapping where AI systems receive data, identifying which transformations can create model-facing context, enforcing policy before reasoning, and preventing untrusted content from gaining authority inside an agentic workflow. If a prompt injection were to make it past stage 4, Lineaje has additional policies that enforce guardrails in the areas of AI Threats and Exploits, Data Security and Identity and Access Control for tool invocations, data handling and threat mitigation.

The Bottom Line

The next major prompt injection may not look like a prompt. It may look like metadata, a report field, structured data, or a normal artifact moving through a trusted workflow.

That is why the industry needs to stop thinking of prompt injection as only a language problem. In agentic AI, prompt injection is an input-governance problem, a pipeline-integrity problem, and a supply-chain security problem.

The answer is not to abandon guardrails. The answer is to stop treating them as the whole defense. Agentic AI needs a new security system: one that secures the path to the prompt, governs how untrusted data becomes context, and stops instruction weaponization at Stage 3 of the AI Kill Chain.

Because in the agentic workflow age, the prompt is no longer the perimeter. The pipeline is.

More on the blog