Getting Started with Agentic AI on AWS

Whether you're new to AI agents or looking to optimize your existing solutions, this blog provides valuable insights into everything from Retrieval-Augmented Generation (RAG) and knowledge bases to multi-agent orchestration and practical use cases, helping you make informed decisions about implementing AI agents in your organization.

This blog explores the various components and architectures that have emerged in this rapidly evolving domain, including Retrieval-Augmented Generation (RAG), knowledge bases, and tools. It delves into the intricacies of agent systems, discussing both single-agent and multi-agent approaches, and examines the delicate balance between comprehensive information gathering and token efficiency. The post also covers key features such as prompt routing, caching, and guardrails, providing insights into their implementation and benefits. By analyzing these developments, we aim to provide a comprehensive overview of the current state of LLM interactions and their potential future directions.

The various options are discussed in turn, and code samples for each are presented in the Appendix.

How has agentic infrastructure evolved?

The evolution of Large Language Model (LLM) interactions has seen rapid advancements in recent years. Initially, developers made direct calls to LLMs, providing prompts and receiving generated responses. This approach, while powerful, was limited by the model's training data cutoff. The introduction of Retrieval-Augmented Generation (RAG) marked a significant leap, allowing LLMs to access external, up-to-date information to enhance their responses. As RAG gained traction, the process was further streamlined with the development of managed knowledge bases and tools, automating the retrieval and integration of relevant information. The latest advancements have introduced sophisticated capabilities like Multi-Agent Orchestration, where multiple specialized AI agents collaborate to tackle complex tasks, and Inline Agents, which offer dynamic, runtime configuration of AI assistants. These developments have dramatically expanded the scope and flexibility of LLM applications, enabling more intelligent, context-aware, and adaptable AI systems.

As with most designs and architectures in our field, agents must balance two competing pressures. The first is the desire to gather the most complete information possible, so as to generate the most accurate and helpful results. The second is the desire to avoid token explosion (and its attendant higher cost and higher chance of hallucination) caused by providing too much information to the model. The evolution of agent and multi-agent approaches is a series of attempts to balance these design pressures while avoiding excess complexity in the actual software.

Depending on the business problem to solve, the agent system might be a single agent or multi-agent (sometimes abbreviated MA). In Multi-Agent Systems, a group of agents, each specialized in a specific task, works together to complete a full workflow.

Multi-Agent Orchestration, for example, can have multiple levels: an Orchestrator Agent selecting from multiple Supervisor Agents, each of which selects one or more worker Agents, each of which might use one or more knowledge bases and/or tools. This level of complexity can make observability and evaluation difficult.

What is an AI agent?

An AI agent is essentially a technology system that can decide, act, and learn without constant human interaction; i.e., it is semi- or fully autonomous. The system is composed of both ML/AI models and traditional software components. Typically, AI agents are used to complete specific tasks or workflows and/or to perform analysis and drive decisions that achieve business goals. AI agents are typically programmed around a specific objective or set of objectives.

Most Caylent customers (although not all) will likely build agents leveraging Amazon Bedrock Agents. An Amazon Bedrock Agent consists of several key components that work together to enable complex, multi-step task execution and intelligent interactions. The main components of an agent are:

Foundation Model (FM)

The agent uses a selected foundation model as its core reasoning engine. This FM is responsible for:

  • Interpreting user requests
  • Breaking down tasks into logical steps
  • Generating responses and follow-up actions

The FM is the component that most people associate with any GenAI system, even though it is actually only one component of such a system.

Prompts

Developers provide instructions that define the agent's purpose and guide its behavior. These instructions act as a prompt to the FM, describing what the agent is designed to do and how it should interact. These instructions can be either a System Prompt or a User Prompt. System prompts define the general behavior of the system, and are often lengthy and highly detailed, containing comprehensive instructions on how the AI should process information and generate responses. They may include rules for citation, formatting, and even personality traits. System Prompts are often not visible to the end user but can be modified when creating a client agent.

User prompts are typically shorter and more straightforward, ranging from simple questions to more complex requests for analysis or content creation. They can vary in complexity but are generally more focused on a specific task or topic.
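As a concrete illustration, here is a minimal sketch of how the two prompt types are supplied separately using the boto3 Converse API (the model ID and prompt text are placeholders):

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")

    # System prompt: detailed rules governing the agent's overall behavior.
    system_prompt = [{"text": "You are a support assistant. Cite sources. Never give legal advice."}]

    # User prompt: short and focused on a single task.
    messages = [{"role": "user", "content": [{"text": "Summarize our return policy."}]}]

    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
        system=system_prompt,
        messages=messages,
    )
    print(response["output"]["message"]["content"][0]["text"])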

Action Groups / Tools

An Action Group is a component that defines specific tasks an agent can perform to assist users. It serves as a bridge between the agent's natural language understanding capabilities and the execution of concrete actions or API calls. Action Groups are composed of a set of actions, typically defined using an OpenAPI schema, and an action executor, usually implemented as an AWS Lambda function.

See Action Groups / Tools for more details

Knowledge Bases

Knowledge bases provide additional context and information to supplement the agent's responses. They allow the agent to:

  • Access and query relevant data sources
  • Perform Retrieval Augmented Generation (RAG) to enhance accuracy
  • Augment responses with domain-specific knowledge

See Knowledge Base for more details

Memory

Agents have both short-term and long-term memory capabilities:

  • Short-term memory retains detailed information relevant to the current conversation
  • Long-term memory stores important facts and summaries from previous interactions

Prompt Templates

Customizable prompt templates allow developers to fine-tune the agent's behavior at different stages of its operation, including:

  • Pre-processing
  • Orchestration
  • Knowledge base response generation
  • Post-processing

By combining these components, Amazon Bedrock Agents can orchestrate complex workflows, interact with enterprise systems, and provide intelligent, context-aware responses to user queries.

Advanced applications for agents

Knowledge Base

An Amazon Bedrock Knowledge Base is a fully managed capability that enables the implementation of Retrieval Augmented Generation (RAG) workflows for generative AI applications. It serves as a crucial component in enhancing the responses of foundation models (FMs) by providing contextual information from an organization's private data sources. Knowledge Bases are essentially vector databases created from source documents, allowing specialized company-specific data to be made available to the Large Language Model (LLM).

To create a Knowledge Base in Amazon Bedrock, users first need to prepare their data source, which can be unstructured (such as documents in an S3 bucket) or structured (like databases in Amazon Redshift or AWS Glue Data Catalog). When setting up the Knowledge Base, users select an embedding model to convert their data into vector embeddings and choose a vector store to index these embeddings. Amazon Bedrock can automatically create and manage a vector store in Amazon OpenSearch Serverless, simplifying the setup process. The following databases may serve as the vector store for a knowledge base: Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL, MongoDB Atlas, Pinecone and Redis Enterprise Cloud.

Once created, a Knowledge Base can be utilized through various operations provided by Amazon Bedrock. The Retrieve operation allows users to query the Knowledge Base and retrieve relevant information, while the RetrieveAndGenerate operation goes a step further by using the retrieved data to generate appropriate responses. For structured data sources, Amazon Bedrock Knowledge Bases can convert natural language queries into SQL queries, enabling seamless integration with existing data warehouses. This capability extends the reach of generative AI applications to include critical enterprise data without the need for extensive data migration or preprocessing.
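A minimal sketch of both operations (the knowledge base ID and model ARN are placeholders for resources created beforehand):

    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")

    # Retrieve: return the raw chunks that match the query.
    chunks = agent_runtime.retrieve(
        knowledgeBaseId="KB12345678",  # placeholder
        retrievalQuery={"text": "What is our parental leave policy?"},
    )
    for result in chunks["retrievalResults"]:
        print(result["content"]["text"])

    # RetrieveAndGenerate: retrieve, then let the model answer from the chunks.
    answer = agent_runtime.retrieve_and_generate(
        input={"text": "What is our parental leave policy?"},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB12345678",  # placeholder
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            },
        },
    )
    print(answer["output"]["text"])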

The use of Knowledge Bases in Amazon Bedrock offers several advantages for building generative AI applications. It provides a fully managed RAG solution, automating the end-to-end workflow from data ingestion to retrieval and prompt augmentation. This approach not only improves the accuracy and relevance of AI-generated responses but also enhances transparency through source attribution, helping to minimize hallucinations. Furthermore, Knowledge Bases support multimodal data processing, allowing applications to analyze and leverage insights from both textual and visual data, thereby expanding the scope and capabilities of AI-powered solutions. 

See Knowledge Base Definition for a sample of creating a Knowledge Base.

Action Groups / Tools

Tools, in the context of AI and AI-agentic systems, are software components that allow the AI system to perform certain defined, deterministic tasks such as interacting with external systems or running scripts. They serve as extensions to the model's capabilities, allowing it to perform tasks such as querying databases, making API calls, or accessing real-time information. Developers provide JSON schemas describing each tool's functionality and input requirements. A tool is essentially the OpenAPI definition of a function call.

When creating an Action Group, developers specify the parameters that the agent needs to elicit from users to carry out actions. For example, an Action Group for a hotel booking system might include functions like "CreateBooking," "GetBooking," and "CancelBooking," each with its own set of required parameters such as check-in date, number of nights, or booking reference. The Bedrock agent uses this configuration to determine what information it needs to gather from the user through conversation. Once the necessary details are collected, the agent invokes the associated Lambda function, which contains the business logic to execute the action, such as interacting with backend systems or external APIs. This modular approach allows for flexible and extensible agent capabilities, enabling developers to create sophisticated AI assistants that can perform a wide range of tasks based on natural language inputs. 

See Action Group Definition for an example of defining an Action Group.

Prompt Flow

Prompt Flow is a graphical user interface for designing the orchestration of agents (similar to the Step Functions design tool). A prompt flow consists of a name and description, a set of permissions, a collection of nodes, and the connections between those nodes.

The following node types are available:

  • Input Node: Serves as the entry point for the flow, receiving initial data.
  • Output Node: Acts as the exit point, returning the final result of the flow.
  • Prompt Node: Defines a prompt to use in the flow, either from Prompt Management or inline.
  • Knowledge Base Node: Queries a knowledge base to retrieve relevant information.
  • Agent Node: Utilizes an AI agent to perform complex tasks.
  • Lambda Function Node: Executes custom logic or interacts with external systems.
  • S3 Storage Node: Interacts with Amazon S3 for data storage or retrieval.
  • Condition Node: Directs flow based on specified conditions.
  • Iterator Node: Applies subsequent nodes iteratively to array elements.
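As a sketch of how a flow can also be assembled programmatically, the following defines a minimal Input → Prompt → Output flow with boto3 (the role ARN and model ID are placeholders, and the definition shapes follow the create_flow request structure, which may evolve):

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    # A minimal three-node flow: Input -> Prompt -> Output.
    flow = bedrock_agent.create_flow(
        name="summarize-flow",
        executionRoleArn="arn:aws:iam::123456789012:role/BedrockFlowRole",  # placeholder
        definition={
            "nodes": [
                {
                    "name": "FlowInput",
                    "type": "Input",
                    "configuration": {"input": {}},
                    "outputs": [{"name": "document", "type": "String"}],
                },
                {
                    "name": "Summarize",
                    "type": "Prompt",
                    "configuration": {"prompt": {"sourceConfiguration": {"inline": {
                        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
                        "templateType": "TEXT",
                        "templateConfiguration": {"text": {
                            "text": "Summarize the following: {{input}}",
                            "inputVariables": [{"name": "input"}],
                        }},
                    }}}},
                    "inputs": [{"name": "input", "type": "String", "expression": "$.data"}],
                    "outputs": [{"name": "modelCompletion", "type": "String"}],
                },
                {
                    "name": "FlowOutput",
                    "type": "Output",
                    "configuration": {"output": {}},
                    "inputs": [{"name": "document", "type": "String", "expression": "$.data"}],
                },
            ],
            "connections": [
                {"name": "InToPrompt", "source": "FlowInput", "target": "Summarize",
                 "type": "Data",
                 "configuration": {"data": {"sourceOutput": "document", "targetInput": "input"}}},
                {"name": "PromptToOut", "source": "Summarize", "target": "FlowOutput",
                 "type": "Data",
                 "configuration": {"data": {"sourceOutput": "modelCompletion", "targetInput": "document"}}},
            ],
        },
    )
    print(flow["id"])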

InlineAgent

Inline Agents are a capability to dynamically specify a set of knowledge bases and/or tools with which to respond to a request. Rather than hand-coding a workflow or creating a Prompt Flow, one can let an Inline Agent determine what to do.

The InvokeInlineAgent API in Amazon Bedrock determines which knowledge bases and tools to use through a dynamic and intelligent selection process based on the user's input and the agent's configuration. This process allows for flexible and context-aware responses. Here's how it works:

Dynamic Selection

1. Analysis of User Input: The inline agent analyzes the user's query to understand the context and requirements.

2. Configuration Evaluation: It evaluates the provided configuration in the API call, which includes:

  •  Action groups
  •  Knowledge bases
  •  Instructions

3. Relevance Matching: The agent matches the query against the available resources to determine which are most relevant.

Selection Criteria

  • Action Groups: The agent selects appropriate action groups based on the tasks required to fulfill the user's request.
  • Knowledge Bases: It chooses relevant knowledge bases that contain information pertinent to the query.
  • Instructions: The agent follows the provided instructions to guide its decision-making process.

Example Scenario

When a user asks about a specific topic:

  1. The agent analyzes the query.
  2. It selects the most appropriate action group (e.g., ClaimManagementActionGroup for a claim-related query).
  3. It chooses the relevant knowledge base (e.g., claims documentation).
  4. The agent configures itself on the fly with the selected tools and knowledge.

This dynamic approach allows the inline agent to:

  • Provide focused and relevant responses
  • Adapt to different types of queries within a single conversation
  • Efficiently use resources by only accessing necessary information

By intelligently selecting the right combination of knowledge bases and tools for each query, the InvokeInlineAgent call ensures optimized performance and accuracy in its responses.

Multi-Agent Orchestration

Multi-agent orchestration is an advanced approach to building complex AI systems that leverages multiple specialized agents working together to solve intricate problems and execute multi-step tasks. This collaborative framework enhances the capabilities of individual AI agents by combining their strengths and expertise.

Key Components

Supervisor Agent 

A central agent that coordinates the overall workflow by:

  • Breaking down complex tasks into manageable subtasks
  • Delegating work to specialized agents
  • Consolidating outputs from various agents

Specialist Agents 

Multiple AI agents with specific areas of expertise, designed to handle particular aspects of a given problem.

Inter-Agent Communication

A standardized protocol allowing agents to exchange information and coordinate their actions efficiently.

Benefits

  • Enhanced Problem-Solving: Tackles complex, multi-step tasks more effectively than single-agent systems
  • Improved Accuracy: Combines specialized knowledge from multiple agents
  • Increased Efficiency: Enables parallel processing of subtasks
  • Scalability: Allows for the addition of new specialized agents as needed

Related Features

Prompt Routing

Amazon Bedrock Intelligent Prompt Routing is a feature that optimizes the use of foundation models (FMs) within a model family to enhance response quality while managing costs. This capability, currently in preview, offers a single serverless endpoint for efficiently routing requests between different foundation models. The name is somewhat misleading: it sounds similar to Prompt Flow but is, by contrast, strictly a cost-optimization feature.

Key Features

  • Dynamic Model Selection: The system predicts the performance of each model for every incoming request, choosing the one that's likely to provide the best response at the lowest cost.
  • Model Family Support: During the preview phase, users can select from preconfigured routers for the Anthropic and Meta model families.
  • Cost Optimization: Intelligent Prompt Routing can reduce costs by up to 30% without compromising on accuracy.
  • Performance Improvement: By leveraging multiple models' strengths, it can enhance overall performance for various tasks.

How It Works

  • Model Family Selection: Users choose the model family they want to use (e.g., Anthropic's Claude or Meta's Llama).
  • Request Analysis: For each incoming request, the system predicts the performance of specified models within the chosen family.
  • Optimal Model Selection: Amazon Bedrock dynamically selects the model predicted to offer the best combination of response quality and cost.
  • Request Processing: The chosen model processes the request and returns the response.
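In practice, using a prompt router looks like any other Converse call; the only difference is that the modelId is the ARN of a router rather than a single model. A minimal sketch (the router ARN below is illustrative; during the preview, available routers can be listed with the bedrock client's list_prompt_routers call):

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")

    # The modelId is the ARN of a preconfigured prompt router rather than
    # a single foundation model (illustrative ARN).
    response = bedrock_runtime.converse(
        modelId="arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/anthropic.claude:1",
        messages=[{"role": "user", "content": [{"text": "What is RAG?"}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])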

Prompt Caching

As with all caching systems, this feature is based on the notion that some requests will be popular and made multiple times. Since LLM requests can be expensive and slow, caching is a method to reduce both cost and latency.
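Amazon Bedrock's prompt caching (in preview at the time of writing) lets you mark a stable prefix, such as a long system prompt, as cacheable across requests. A minimal sketch, assuming the preview cachePoint block shape, which may change:

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")

    # Placeholder for a long, stable preamble (e.g., policy or reference text).
    LONG_REFERENCE_TEXT = "<several thousand tokens of reference material>"

    # The cachePoint block marks everything before it as cacheable, so
    # subsequent requests can reuse the processed prefix.
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder model ID
        system=[
            {"text": LONG_REFERENCE_TEXT},
            {"cachePoint": {"type": "default"}},
        ],
        messages=[{"role": "user", "content": [{"text": "Summarize section 2."}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])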

Guardrails and Eval/Feedback

Guardrails can be used in multiple ways to help safeguard generative AI applications. For example:

  • A chatbot application can use guardrails to help filter harmful user inputs and toxic model responses.
  • A banking application can use guardrails to help block user queries or model responses associated with seeking or providing investment advice.
  • A call center application to summarize conversation transcripts between users and agents can use guardrails to redact users’ personally identifiable information (PII) to protect user privacy.

Amazon Bedrock Guardrails supports the following policies:

  • Content filters – Adjust filter strengths to help block input prompts or model responses containing harmful content. Filtering is done based on detection of certain predefined harmful content categories - Hate, Insults, Sexual, Violence, Misconduct and Prompt Attack.
  • Denied topics – Define a set of topics that are undesirable in the context of your application. The filter will help block them if detected in user queries or model responses.
  • Word filters – Configure filters to help block undesirable words, phrases, and profanity (exact match). Such words can include offensive terms, competitor names, etc.
  • Sensitive information filters – Configure filters to help block or mask sensitive information, such as personally identifiable information (PII) or custom regexes, in user inputs and model responses. Blocking or masking is based on probabilistic detection of sensitive information in standard formats, such as Social Security numbers, dates of birth, and addresses. Regular-expression-based detection of custom identifier patterns can also be configured.
  • Contextual grounding check – Help detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.
  • Image content filter – Help detect and filter inappropriate or toxic image content. Users can set filters for specific categories and set filter strength.
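Guardrails can be attached to model invocations and agents, or evaluated directly with the ApplyGuardrail API. A minimal sketch of checking a user input against an existing guardrail (the guardrail ID and version are placeholders; see the Appendix for creating the guardrail itself):

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")

    # Evaluate a user input against a guardrail before sending it to a model.
    result = bedrock_runtime.apply_guardrail(
        guardrailIdentifier="gr-example123",  # placeholder
        guardrailVersion="1",                 # placeholder
        source="INPUT",
        content=[{"text": {"text": "Which stocks should I buy this week?"}}],
    )
    if result["action"] == "GUARDRAIL_INTERVENED":
        print("Blocked:", result["outputs"])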

Feedback

In the context of Amazon Bedrock Agents, several feedback mechanisms can be employed to enhance the agent's performance and accuracy. These mechanisms allow for continuous improvement and adaptation of the agent's responses based on various inputs and evaluations.

Prompt Modification

One of the primary feedback mechanisms in Bedrock Agents is prompt modification. This technique involves adjusting the base prompt templates to fine-tune the agent's behavior and responses.

Base Prompt Templates

Bedrock Agents come with four default base prompt templates:

  1. Pre-processing
  2. Orchestration
  3. Knowledge base response generation
  4. Post-processing (disabled by default)

By modifying these templates, developers can enhance the agent's accuracy and tailor its behavior to specific use cases.
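A sketch of overriding one of these templates on an existing agent via promptOverrideConfiguration (the agent ID, role ARN, and template text are placeholders; in practice you would start from the agent's default template and edit it rather than write one from scratch):

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    # Placeholder: an edited copy of the default orchestration template.
    CUSTOM_ORCHESTRATION_TEMPLATE = "<edited orchestration prompt template>"

    bedrock_agent.update_agent(
        agentId="AGENT123",  # placeholder
        agentName="support-agent",
        agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",  # placeholder
        foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",
        promptOverrideConfiguration={
            "promptConfigurations": [
                {
                    "promptType": "ORCHESTRATION",
                    "promptCreationMode": "OVERRIDDEN",
                    "promptState": "ENABLED",
                    "basePromptTemplate": CUSTOM_ORCHESTRATION_TEMPLATE,
                    "inferenceConfiguration": {"temperature": 0.0, "topP": 1.0, "maximumLength": 2048},
                }
            ]
        },
    )
    bedrock_agent.prepare_agent(agentId="AGENT123")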

Use Case Discussion

Every use case is different, but we can generalize a few types. One way of thinking about use cases is to ask: does the use case require a step-by-step workflow?

An example of such a workflow might be:

  1. Look up my medical record id
  2. Get a list of my current prescriptions
  3. Refill one of the prescriptions

In this example there are several distinct steps, each of which requires a specialist agent/tool to complete and which need to be completed in a specific order.

Another use case might be a town information retrieval system where a person could ask for building code information, town committee meeting minutes, or the hours for the town dump. In this use case there might be three knowledge bases, and, given appropriate KB descriptions, an Inline Agent could determine which KB to query.

A combined use case might be the above information retrieval scenario followed by a request to apply for a building permit. This case might involve MAO, with one of the agents being an Inline Agent and another agent having a Tool to interface with the building-permit application API.

Another combined use case might be a user asking what was the most expensive AWS service they were using followed by a request for recommendations to reduce that cost. This might be an MAO calling a tool to get customer specific pricing information followed by an InlineAgent knowledge base search for service specific recommendations.

At a certain level there will need to be business logic someplace in the application. That logic can live in raw Python code, in the descriptions of your agents, knowledge bases, and tools, and/or in your design of an Orchestration/Supervisor/Agent hierarchy. It could even live in the creation of a number of distinct applications or APIs, i.e. the concept of a library of functions that a user or programmer stitches together.

A corollary of this point is that just as an organization needs to have clean data before implementing GenAI, it also needs to determine what its business logic is or should be.

How Caylent Can Help

Do you want to evaluate Agentic AI use cases for your organization? Caylent's experts can help you navigate the complexities of AI implementation, from selecting the right models and deploying scalable architecture to building custom solutions with Amazon Bedrock and AWS's AI suite. Contact us today to explore how you can deploy AI systems that deliver real business value with innovative new capabilities while maintaining cost efficiency.

Appendix A Code Samples

Simple Invocation
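A minimal sketch of a direct agent invocation (the agent and alias IDs are placeholders for resources created beforehand):

    import uuid
    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")

    response = agent_runtime.invoke_agent(
        agentId="AGENT123",        # placeholder
        agentAliasId="ALIAS123",   # placeholder
        sessionId=str(uuid.uuid4()),
        inputText="What are your support hours?",
    )

    # The response is an event stream; concatenate the completion chunks.
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode("utf-8")
    print(completion)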

Invoke an Agent With a Knowledge Base
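A sketch of associating an existing knowledge base with an agent and then invoking it; all IDs are placeholders. The knowledge base description matters, since the agent uses it to decide when to query the KB:

    import uuid
    import boto3

    bedrock_agent = boto3.client("bedrock-agent")
    agent_runtime = boto3.client("bedrock-agent-runtime")

    # Associate an existing knowledge base with the DRAFT version of the agent.
    bedrock_agent.associate_agent_knowledge_base(
        agentId="AGENT123",        # placeholder
        agentVersion="DRAFT",
        knowledgeBaseId="KB12345678",  # placeholder
        description="Company HR policies; use for any benefits or policy question.",
    )
    bedrock_agent.prepare_agent(agentId="AGENT123")

    # Invocation is unchanged; the agent now decides when to query the KB.
    response = agent_runtime.invoke_agent(
        agentId="AGENT123",
        agentAliasId="ALIAS123",   # placeholder
        sessionId=str(uuid.uuid4()),
        inputText="How many vacation days do new hires get?",
    )
    print("".join(
        e["chunk"]["bytes"].decode("utf-8") for e in response["completion"] if "chunk" in e
    ))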

Invoking the Inline Agent API - with external definition of tools

There are two Python files. The first is the main program, which invokes the InlineAgent API. The second defines the action groups and tools supplied to the API.
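The following is a sketch written against the InvokeInlineAgent request shape; all IDs, ARNs, file names, and function names are placeholders.

inline_tools.py:

    # Hypothetical module defining the action groups and knowledge bases
    # passed to InvokeInlineAgent (IDs and ARNs are placeholders).

    ACTION_GROUPS = [
        {
            "actionGroupName": "ClaimManagementActionGroup",
            "actionGroupExecutor": {
                "lambda": "arn:aws:lambda:us-east-1:123456789012:function:claims-handler"
            },
            "functionSchema": {
                "functions": [
                    {
                        "name": "get_claim_status",
                        "description": "Look up the status of an insurance claim.",
                        "parameters": {
                            "claim_id": {
                                "type": "string",
                                "description": "The claim identifier",
                                "required": True,
                            }
                        },
                    }
                ]
            },
        }
    ]

    KNOWLEDGE_BASES = [
        {
            "knowledgeBaseId": "KB12345678",  # placeholder
            "description": "Claims documentation: coverage rules and filing procedures.",
        }
    ]

main.py:

    # Invokes the inline agent with the externally defined tools.
    import uuid
    import boto3

    from inline_tools import ACTION_GROUPS, KNOWLEDGE_BASES

    agent_runtime = boto3.client("bedrock-agent-runtime")

    response = agent_runtime.invoke_inline_agent(
        foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder
        instruction="You are a claims assistant. Use the claim tools and documentation to answer.",
        sessionId=str(uuid.uuid4()),
        inputText="What is the status of claim 98765?",
        actionGroups=ACTION_GROUPS,
        knowledgeBases=KNOWLEDGE_BASES,
    )

    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode("utf-8")
    print(completion)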

Multi-Agent Orchestration

Agent Definition file
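A sketch of the definition file, assuming the multi-agent collaboration API shapes released in preview (the agentCollaboration setting and AssociateAgentCollaborator call); all IDs, names, and ARNs are placeholders:

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    # Create the supervisor agent that coordinates the workflow.
    supervisor = bedrock_agent.create_agent(
        agentName="claims-supervisor",
        foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",
        agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",  # placeholder
        instruction="Route each request to the right specialist and consolidate their answers.",
        agentCollaboration="SUPERVISOR",
    )
    supervisor_id = supervisor["agent"]["agentId"]

    # Each collaborator is an alias of an already-prepared specialist agent.
    bedrock_agent.associate_agent_collaborator(
        agentId=supervisor_id,
        agentVersion="DRAFT",
        agentDescriptor={"aliasArn": "arn:aws:bedrock:us-east-1:123456789012:agent-alias/SPEC1/ALIAS1"},
        collaboratorName="claims-specialist",
        collaborationInstruction="Handles claim lookups, filings, and status questions.",
        relayConversationHistory="TO_COLLABORATOR",
    )

    bedrock_agent.prepare_agent(agentId=supervisor_id)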

Main Program
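Invoking the supervisor is identical to invoking any other agent; delegation to collaborators happens behind the scenes. The supervisor agent and alias IDs below are placeholders:

    import uuid
    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")

    response = agent_runtime.invoke_agent(
        agentId="SUPERVISOR1",     # placeholder supervisor agent ID
        agentAliasId="SUPALIAS1",  # placeholder alias ID
        sessionId=str(uuid.uuid4()),
        inputText="File a new claim for water damage and tell me what documents I need.",
    )
    print("".join(
        e["chunk"]["bytes"].decode("utf-8") for e in response["completion"] if "chunk" in e
    ))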

Orchestration Done Completely Manually

This code shows that one can perform agent orchestration completely manually, using simple Python, without any specialized agents or frameworks. While this code appears quite simple, it is also very fragile. Any change in the orchestration requires changes to the Python code; any additional steps or conditional logic require a rewrite. This is an example of easy demo-ware that likely does not scale.
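A deliberately naive sketch: each "agent" is just a direct model call, and the workflow order is hard-coded (the model ID is a placeholder):

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")
    MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # placeholder

    def ask(system_text, user_text):
        """One 'agent' = one direct model call with a role-specific system prompt."""
        response = bedrock_runtime.converse(
            modelId=MODEL_ID,
            system=[{"text": system_text}],
            messages=[{"role": "user", "content": [{"text": user_text}]}],
        )
        return response["output"]["message"]["content"][0]["text"]

    # Step 1: "research agent" gathers facts.
    facts = ask("You are a researcher. List key facts only.",
                "Key facts about Amazon S3 storage classes")

    # Step 2: "writer agent" drafts from those facts.
    draft = ask("You are a technical writer.",
                f"Write one paragraph from these facts:\n{facts}")

    # Step 3: "reviewer agent" critiques the draft.
    review = ask("You are an editor. Point out errors.", draft)

    print(draft, "\n--- review ---\n", review)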

Implementing a Feedback Mechanism

Here is the main program:
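This is a sketch of one possible feedback loop; the agent IDs are placeholders, and the helper functions come from the hypothetical utilities file shown below:

    import uuid
    import boto3

    from feedback_utils import read_completion, store_feedback  # hypothetical helpers

    agent_runtime = boto3.client("bedrock-agent-runtime")
    session_id = str(uuid.uuid4())

    while True:
        question = input("You: ")
        if question.lower() in ("quit", "exit"):
            break
        response = agent_runtime.invoke_agent(
            agentId="AGENT123",        # placeholder
            agentAliasId="ALIAS123",   # placeholder
            sessionId=session_id,
            inputText=question,
        )
        answer = read_completion(response)
        print("Agent:", answer)
        rating = input("Helpful? (y/n): ")
        store_feedback(session_id, question, answer, rating == "y")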

Utility functions
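A sketch of the helpers used above: one flattens the agent's streaming response, one persists feedback for later evaluation and prompt tuning (the DynamoDB table name is a placeholder):

    # feedback_utils.py (hypothetical)
    import time
    import uuid
    import boto3

    dynamodb = boto3.resource("dynamodb")
    feedback_table = dynamodb.Table("agent-feedback")  # placeholder table name

    def read_completion(response):
        """Concatenate the chunks of an invoke_agent event stream."""
        return "".join(
            event["chunk"]["bytes"].decode("utf-8")
            for event in response["completion"]
            if "chunk" in event
        )

    def store_feedback(session_id, question, answer, helpful):
        """Record one rated interaction for later analysis."""
        feedback_table.put_item(
            Item={
                "id": str(uuid.uuid4()),
                "sessionId": session_id,
                "question": question,
                "answer": answer,
                "helpful": helpful,
                "timestamp": int(time.time()),
            }
        )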

Creating a Robust Guardrail
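A sketch of a guardrail combining several of the policy types described above (names, topics, and messages are illustrative):

    import boto3

    bedrock = boto3.client("bedrock")

    guardrail = bedrock.create_guardrail(
        name="banking-assistant-guardrail",
        description="Blocks investment advice and filters harmful content and PII.",
        topicPolicyConfig={
            "topicsConfig": [
                {
                    "name": "InvestmentAdvice",
                    "definition": "Guidance on buying, selling, or allocating financial assets.",
                    "examples": ["Which stocks should I buy?"],
                    "type": "DENY",
                }
            ]
        },
        contentPolicyConfig={
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                # Prompt-attack filtering applies to inputs only.
                {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
            ]
        },
        sensitiveInformationPolicyConfig={
            "piiEntitiesConfig": [
                {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
                {"type": "EMAIL", "action": "ANONYMIZE"},
            ]
        },
        blockedInputMessaging="Sorry, I can't help with that request.",
        blockedOutputsMessaging="Sorry, I can't provide that information.",
    )
    print(guardrail["guardrailId"], guardrail["version"])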

Knowledge Base Definition
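A sketch of creating a vector knowledge base backed by an existing OpenSearch Serverless collection, then attaching an S3 data source and ingesting it (all ARNs, names, and index fields are placeholders):

    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    kb = bedrock_agent.create_knowledge_base(
        name="town-documents",
        roleArn="arn:aws:iam::123456789012:role/BedrockKBRole",  # placeholder
        knowledgeBaseConfiguration={
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
            },
        },
        storageConfiguration={
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123",  # placeholder
                "vectorIndexName": "town-docs-index",
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    )
    kb_id = kb["knowledgeBase"]["knowledgeBaseId"]

    # Attach an S3 bucket of documents as the data source, then start ingestion.
    ds = bedrock_agent.create_data_source(
        knowledgeBaseId=kb_id,
        name="town-docs-s3",
        dataSourceConfiguration={
            "type": "S3",
            "s3Configuration": {"bucketArn": "arn:aws:s3:::town-documents-bucket"},  # placeholder
        },
    )
    bedrock_agent.start_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=ds["dataSource"]["dataSourceId"],
    )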

Action Group Definition
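A sketch of defining an Action Group from an OpenAPI schema, following the hotel-booking example discussed earlier (the agent ID, Lambda ARN, paths, and parameters are illustrative):

    import json
    import boto3

    bedrock_agent = boto3.client("bedrock-agent")

    # An OpenAPI schema describing one of the booking actions.
    openapi_schema = {
        "openapi": "3.0.0",
        "info": {"title": "Hotel Bookings", "version": "1.0.0"},
        "paths": {
            "/bookings": {
                "post": {
                    "operationId": "CreateBooking",
                    "description": "Create a hotel booking.",
                    "parameters": [
                        {"name": "check_in_date", "in": "query", "required": True,
                         "schema": {"type": "string"}},
                        {"name": "num_nights", "in": "query", "required": True,
                         "schema": {"type": "integer"}},
                    ],
                    "responses": {"200": {"description": "Booking created"}},
                }
            }
        },
    }

    bedrock_agent.create_agent_action_group(
        agentId="AGENT123",  # placeholder
        agentVersion="DRAFT",
        actionGroupName="BookingActions",
        actionGroupExecutor={
            "lambda": "arn:aws:lambda:us-east-1:123456789012:function:booking-handler"  # placeholder
        },
        apiSchema={"payload": json.dumps(openapi_schema)},
    )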

Brian Tarbox

Brian is an AWS Community Hero, Alexa Champion, runs the Boston AWS User Group, has ten US patents and a bunch of certifications. He's also part of the New Voices mentorship program where Heros teach traditionally underrepresented engineers how to give presentations. He is a private pilot, a rescue scuba diver and got his Masters in Cognitive Psychology working with bottlenosed dolphins.
