Securing AI Agents on AWS: Designing Autonomous Systems You Can Actually Trust

Generative AI & LLMOps

AI agents represent the next evolution of APIs, but they also bring new security challenges and attack vectors. Examine real-world adversarial threats and learn defensive strategies in this blog.

AWS re:Invent 2025 - Red Team vs Blue Team: Securing AI Agents (DEV317)

AI agents are the next frontier in intelligent automation, but they also represent a new frontier in attack surface complexity. Traditional distributed systems have static trust boundaries. Agents redraw those boundaries on the fly. With autonomy comes opacity, and with opacity comes risk. The question most technology leaders should ask isn’t "Can we build these systems?" but rather, "Can we secure them enough to trust them?"

Break down a typical AWS agentic architecture and it looks suspiciously like every other distributed system you’ve ever secured; only now your “business logic” thinks for itself.

Amazon Bedrock has made agentic systems accessible: 

  • Amazon Bedrock Agents orchestrate reasoning and tool use
  • Amazon Bedrock Guardrails enforce safety policies
  • AWS Step Functions brings determinism
  • Amazon Bedrock AgentCore provides observability

While individual services matter, real safety emerges from how they’re designed together: security is built through architecture and treated as a discipline that shapes every layer of the system.

The Emerging Architecture of Agentic Systems on AWS

A production-ready AI agent on AWS generally resembles a high-trust distributed microservice spanning multiple planes:

  • Cognitive plane: LLM reasoning and contextualization (Amazon Bedrock models).
  • Action plane: Tool integrations (AWS Lambda, AWS Step Functions, or Amazon Bedrock Tools).
  • Memory plane: State and retrieval (Amazon S3, Amazon OpenSearch, Amazon DynamoDB, Pinecone).
  • Control plane: Orchestration and observability (Amazon Bedrock AgentCore, AWS IAM, Amazon CloudWatch, Amazon GuardDuty).

Each plane demands independent controls and unified telemetry. System safety depends not on blocking intelligence but on binding it to context, policy, and accountability.

The Four Domains of Agentic Risk

1. Reasoning Contamination (Prompt Injection)

Malicious or confusing instructions embedded in context data can redirect LLM behavior.

Mitigation requires multilayered validation, starting at content ingestion rather than at model inference. Prompt injection is the SQL injection of the LLM world. You can sanitize inputs, but reasoning itself is the parser. Attackers can hide malicious instructions in text, PDFs, or even Jira comments. Once indexed, those instructions become part of the model’s memory.

How to defend (AWS-style):

  • Use Amazon Bedrock Guardrails for intent classification and malicious content filtering.
  • Add regex and AST validation layers to AWS Lambda functions to sanitize all user input.
  • Chain Meta Llama Guard or NVIDIA NeMo Guardrails models inline before invoking your main Amazon Bedrock endpoint. This gives your system a multi-layer defense while preserving vendor and model neutrality.
  • Isolate Retrieval-Augmented Generation (RAG) ingestion via Amazon S3 Object Lambda transforms to strip unwanted instructions on ingestion.

Rule of thumb: Treat every document like executable code — validate before trusting.
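As an illustration of the regex validation layer mentioned above, here is a minimal sketch of an ingestion-time check. The patterns, function names, and thresholds are illustrative assumptions; a real deployment would tune them and pair them with Amazon Bedrock Guardrails rather than rely on regex alone:

```python
import re

# Hypothetical injection markers for illustration only; production systems
# should maintain and tune a much richer pattern set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

def flag_suspect_content(text: str) -> list[str]:
    """Return the injection-like patterns found in a document chunk."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]

def validate_before_indexing(text: str) -> bool:
    """Gate RAG ingestion: index only content with no injection markers."""
    return not flag_suspect_content(text)
```

A check like this would sit in the AWS Lambda ingestion path, so suspect documents are quarantined before they ever reach the model’s memory.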

2. Action Overreach (Tool Poisoning)

Every function exposed to an agent doubles as an API endpoint open to abuse, so never hand a reasoning system unconstrained keys: an agent with wildcard credentials can do more damage than a naive intern on day one. Every tool an agent can call is, in practice, an API surface driven by arbitrary inputs, including hostile or misaligned ones, and when that surface is backed by over‑privileged credentials, misuse quickly escalates into real-world damage.

How to defend:

  • Scoped AWS IAM roles for each tool; never share credentials.
  • Use an agentic gateway (Amazon API Gateway + AWS Lambda) to filter parameter formats before hitting APIs.
  • Schema validation: Define explicit function schemas in Amazon Bedrock AgentCore (no “free-text commands”).
  • Log and audit tool activity via AWS CloudTrail and Amazon Detective to detect unusual spikes in automation.
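The schema-validation bullet can be sketched in a few lines. The tool name, fields, and helper below are hypothetical, standing in for the explicit function schemas you would define in Amazon Bedrock AgentCore:

```python
# Hypothetical tool registry for illustration; in practice these schemas
# live in Amazon Bedrock AgentCore function definitions.
TOOL_SCHEMAS = {
    "lookup_order": {"order_id": str, "max_results": int},
}

def validate_tool_call(tool: str, params: dict) -> dict:
    """Reject any call whose parameters don't match the declared schema."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        raise ValueError(f"unknown tool: {tool}")
    if set(params) != set(schema):
        raise ValueError(f"unexpected or missing parameters: {sorted(params)}")
    for name, expected in schema.items():
        if not isinstance(params[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    return params  # only now safe to forward to the backing API
```

The point of the design is that free-text never reaches the backing API: anything outside the declared shape is rejected before the agent’s credentials are ever exercised.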

3. Workflow Escalation (Agent-to-Agent Lateral Movement)

Agents calling agents sounds powerful until one starts impersonating another. Agent-to-agent calls are dangerous because each agent becomes both an attack surface and a high‑trust authority for the others; one compromised or misaligned agent can rapidly escalate into system‑wide data leakage and unsafe actions.

Hardening strategy:

  • Build deterministic chaining with AWS Step Functions by defining explicit sequences rather than open-ended calls.
  • Issue short-lived AWS STS tokens between services for identity-bound invocation.
  • Limit event routing with Amazon EventBridge filtering rules so one agent can’t whisper to another without your consent.

Remember: Inter-agent communication should look like a transaction log, not a chat room.
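The EventBridge filtering idea can be illustrated with a simplified allow-list matcher. Real Amazon EventBridge event patterns support richer operators than this, and the sources and detail types below are invented for the example:

```python
# Simplified sketch of EventBridge-style pattern matching: a field matches
# when its value appears in the pattern's allow-list. The route below is
# an illustrative assumption, not a real event pattern.
ALLOWED_ROUTES = {
    "source": ["agent.triage"],
    "detail-type": ["RefillCheckRequested"],
}

def event_matches(pattern: dict, event: dict) -> bool:
    """True only if every filtered field is on the allow-list."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())
```

With rules like this in place, an agent can only emit events that you have explicitly routed; anything else is dropped instead of whispered to a peer.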

4. Knowledge Base Corruption (Supply Chain Poisoning)

RAG pipelines have become the new software supply chain. Every time your agent indexes new documents, it’s editing its brain. Poison just one file, and you’ve corrupted its reasoning logic. Research shows that corrupting even a small number of source documents can introduce backdoors. Anthropic’s study found that “malicious actors can inject specific text into these posts to make a model learn undesirable or dangerous behaviors.”

Mitigation pipeline:

  1. An Amazon S3 upload event triggers an AWS Lambda ingestion validator. The Lambda can perform custom checks and validations.
  2. Scan with Amazon GuardDuty Malware Protection, which checks uploaded files for known malware.
  3. Validate through Amazon Bedrock Guardrails (toxicity and prompt injection checks). These guardrails provide an industry-standard suite of checks.
  4. Hash content with SHA-256 and record the digest to Amazon DynamoDB. This aids in the traceability of the content.
  5. Index only after cryptographic verification, ingesting only content that has not been modified since being scanned.

This makes ingestion auditable and reversible, like version control for cognition. 
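Steps 4 and 5 of the pipeline can be sketched with standard-library hashing; the in-memory ledger below is an assumption for illustration, standing in for the Amazon DynamoDB hash table:

```python
import hashlib

# Stand-in for the Amazon DynamoDB table that records scan-time digests.
ledger: dict[str, str] = {}

def content_hash(data: bytes) -> str:
    """SHA-256 digest of the document bytes, recorded at scan time."""
    return hashlib.sha256(data).hexdigest()

def record_after_scan(doc_id: str, data: bytes) -> None:
    """Called once the document has passed malware and guardrail checks."""
    ledger[doc_id] = content_hash(data)

def safe_to_index(doc_id: str, data: bytes) -> bool:
    """Index only if the bytes are unchanged since they were scanned."""
    return ledger.get(doc_id) == content_hash(data)
```

Any document modified between scanning and indexing, or never scanned at all, fails verification and stays out of the knowledge base.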

How Autonomous Is Your Workflow?

While autonomous agents are the shiny new toy, not every workflow needs to be autonomous. Most need to be controlled, observable, and interruptible. While some workflows are dynamic (think phone trees), many, if not most, follow a fixed set of steps. 

For example:

  1. Look up a patient ID.
  2. Get patient medications.
  3. Check for refills. 

This is not a workflow that needs autonomy; in fact, autonomy would be actively harmful here. Imagine the look-up-patient-id agent prioritizes returning some ID over returning the correct ID: it might decide that a “close” match on a patient’s name is acceptable and return the wrong patient’s ID. When in doubt, go deterministic: define exactly how your agent flows from insight to action. In these cases, AWS Step Functions may be a far more appropriate solution. Ask yourself whether you’d prefer a pharmacist who methodically performs the required steps (check patient ID, check for medication interactions, verify dosage) or one who takes a “flexible” or “creative” approach before dispensing your pills.
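A deterministic version of the three-step workflow above might be expressed as an Amazon States Language definition. The state names and function ARNs below are placeholders for illustration, not real deployments:

```python
import json

# Minimal Amazon States Language sketch of the fixed pharmacy workflow.
# Resource ARNs are placeholders, not real functions.
state_machine = {
    "StartAt": "LookupPatientId",
    "States": {
        "LookupPatientId": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:lookup-patient-id",
            "Next": "GetPatientMedications",
        },
        "GetPatientMedications": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:get-medications",
            "Next": "CheckForRefills",
        },
        "CheckForRefills": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:check-refills",
            "End": True,
        },
    },
}

# The JSON definition you would pass to AWS Step Functions.
definition = json.dumps(state_machine, indent=2)
```

Every execution follows exactly these three states in this order; there is no path where a model improvises a different sequence.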

The AWS Security Fabric for Autonomous Systems

AWS now offers a complete ecosystem for secure autonomy, but the onus is on architects to use it coherently.

| Security Goal | AWS Capability | Architectural Guidance |
| --- | --- | --- |
| Intent validation | Amazon Bedrock Guardrails | Preprocess and classify all context inputs before inference. |
| Deterministic flow | AWS Step Functions | Encode reasoning-to-action sequences explicitly. |
| Least privilege | AWS IAM Roles / AWS STS | Limit scope per tool, per agent, per invocation. |
| Threat detection | Amazon GuardDuty, Amazon Detective, AWS Security Hub | Correlate LLM and system-level anomalies. |
| Data authenticity | Amazon S3 Object Lambda + hashing + Amazon DynamoDB | Verify every document and prevent silent poisoning. |
| Observability | Amazon Bedrock AgentCore + Amazon CloudWatch + AWS X-Ray | Trace reasoning and tool activity end-to-end. |

When combined, these controls transform autonomous systems from opaque black boxes into auditable components of your application ecosystem.

Observability is the New Governance

Agentic security doesn’t end with prevention; it depends on visibility. Amazon Bedrock AgentCore Observability now enables real-time introspection of LLM calls, action sequences, token usage, and latencies — bridging the gap between inference and intent.

Combine that telemetry with AWS Security Hub, Amazon CloudWatch metrics, and AWS X-Ray traces to create your organization's first true feedback loop for cognitive systems.
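One hedged sketch of what that telemetry might look like: a structured log line per tool invocation, ready to ship to Amazon CloudWatch Logs. The field names are illustrative assumptions, not an AWS-defined schema:

```python
import json
import time

def tool_call_record(agent: str, tool: str, params: dict, latency_ms: float) -> str:
    """Emit one structured log line per tool invocation so agent activity
    is queryable and auditable. Field names are illustrative only."""
    return json.dumps({
        "timestamp": time.time(),
        "agent": agent,
        "tool": tool,
        "params": params,
        "latency_ms": latency_ms,
    })
```

Structured records like this are what make CloudWatch metric filters and anomaly alarms possible; free-form log strings are where agent misbehavior goes to hide.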

This is how agent governance becomes measurable. You move from “We hope it’s safe” to “We can prove it’s safe.”

Strategic Imperatives for Leaders

Developers are excited about agentic systems, but their enthusiasm needs to be tempered with security realities to avoid potentially disastrous outcomes.

  1. Architect for failure. When Werner Vogels said “everything fails,” he didn’t mean just infrastructure. Assume every input is adversarial and every integration is abusable. Pretend every input is a telemarketer trying to scam your grandmother.
  2. Enforce least privilege at reasoning granularity. AWS IAM now extends past infrastructure into thought. Every LLM calling an agent and every agent calling a tool is exercising privilege; give them the least freedom possible.
  3. Codify determinism. Replace autonomy sprawl with explicit state machines. Start with the assumption that your workflow is deterministic and only allow autonomy where it is required.
  4. Instrument everything. We used to say, “If you don’t measure quality, how do you know if you have any?” Visibility is the prerequisite to trust.
  5. Invest in red/blue agent testing. Treat adversarial reasoning as a security discipline on par with pen testing. Ask yourself: if we let agents loose in a test environment, what could go wrong?

Take these steps and your organization can safely embrace agentic systems.

Conclusion

Security stabilizes innovation: it lets teams move faster and with greater confidence.

The organizations that master secure autonomy will iterate faster precisely because they can delegate more safely. AWS has given builders the primitives, such as Amazon Bedrock Guardrails, AWS Step Functions, Amazon Bedrock AgentCore, AWS IAM, and Amazon GuardDuty, to make that possible.

It’s now up to us to use them as engineering artifacts, not afterthoughts. AI agents will become teammates, not tools. But like every teammate, they need guardrails, accountability, and context. Build them that way, and they’ll earn your trust.

How Caylent Can Help

If you’re looking to modernize your cloud security strategy for an agentic, AI-driven future, Caylent can help. As an AWS Security Consulting Partner of the Year, we work with organizations to design and secure architectures that balance autonomy with trust—embedding guardrails, observability, and least-privilege controls from day one. Whether you’re securing AI agents, modernizing legacy cloud environments, or rethinking governance at scale on AWS, our experts are ready to help you move faster with confidence. Reach out to Caylent to see how we can strengthen your security foundation while accelerating innovation.

Brian Tarbox

Brian is an AWS Community Hero, Alexa Champion, holds ten US patents and a number of certifications, and ran the Boston AWS User Group for five years. He’s also part of the New Voices mentorship program, where Heroes teach traditionally underrepresented engineers how to give presentations. He is a private pilot, a rescue scuba diver, and earned his master’s in cognitive psychology working with bottlenose dolphins.
