In our earlier article we explored the concept of agents and agentic workflows, and the advantages of designing your AI application as an agentic workflow. We discussed planning considerations and available tools, with a focus on what AWS provides, both via Bedrock and SageMaker. Now it's time to get building.
This article will outline the steps involved in building an agentic workflow on AWS. We'll focus on leveraging Amazon Bedrock's multi-agent collaboration capability, which allows a "supervisor" agent to orchestrate a team of specialized "collaborator" agents. And once you have your first agentic workflow, we'll discuss where to go next.
Let's get started. Here's what a typical process looks like:
1. Create Agents (Supervisor and Specialists)
The initial phase involves creating the individual AI agents that will form your system. This includes the central orchestrator (the supervisor agent) and the task-specific specialist agents (collaborators).
- Process: For each agent, you'll navigate to the Amazon Bedrock console, select "Agents" from the navigation pane, and then choose "Create Agent."
- Agent Instructions: This is a critical step. You provide clear, natural language instructions that define each agent's purpose, capabilities, persona, limitations, and expected behavior. For example, a supervisor agent's instructions would focus on task decomposition ("Break down user requests into logical steps"), delegation logic ("Route sub-tasks to the appropriate specialist agent"), and result synthesis ("Compile responses from specialists into a coherent final answer"). A specialist agent, like a "ProductRecommendationAgent," would have instructions detailing its focus ("Provide product recommendations based on user query and available inventory data") and the tools it can use. These instructions guide the foundation model (FM) powering the agent.
- Foundation Model Selection: Choose an appropriate FM for each agent. Amazon Bedrock offers a wide selection. For agents requiring complex reasoning and orchestration (like supervisors), highly capable models such as Amazon Nova Pro or Anthropic's Claude 3.5 Sonnet are good choices. For more narrowly focused specialist agents, you might select a model optimized for cost and performance on specific tasks, like Amazon Nova Lite.
- Permissions (IAM Role): Assign an AWS Identity and Access Management (IAM) role to each agent. This role grants the necessary permissions for the agent to interact with other AWS services, such as AWS Lambda for executing Action Groups, Amazon S3 for accessing Knowledge Base data sources, and Amazon Bedrock itself for invoking the chosen FM for inference. Bedrock can help create a new service role with the necessary base permissions, or you can use an existing, appropriately configured role.
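The console flow above maps to the `CreateAgent` API. Here is a rough sketch using boto3's `bedrock-agent` client; the agent name, role ARN, model ID, and instructions in the usage comment are illustrative placeholders, not values from this article:

```python
def create_specialist_agent(client, name, model_id, role_arn, instruction):
    """Create a Bedrock agent; `client` is a boto3 `bedrock-agent` client.

    Returns the new agent's ID.
    """
    response = client.create_agent(
        agentName=name,                 # must be unique in the account/region
        foundationModel=model_id,       # e.g. an Amazon Nova or Anthropic Claude model ID
        agentResourceRoleArn=role_arn,  # the IAM service role described above
        instruction=instruction,        # natural-language instructions for the agent
    )
    return response["agent"]["agentId"]

# Usage (assumes valid AWS credentials and an existing IAM role):
# import boto3
# client = boto3.client("bedrock-agent")
# agent_id = create_specialist_agent(
#     client,
#     name="ProductRecommendationAgent",
#     model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
#     role_arn="arn:aws:iam::123456789012:role/BedrockAgentRole",
#     instruction="Provide product recommendations based on user query "
#                 "and available inventory data.",
# )
```

Passing the client in as a parameter keeps the sketch easy to test and swap between accounts or regions.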
2. Connect Data Stores and Tools (Action Groups & Knowledge Bases for Specialist Agents)
Specialist agents often need to interact with external systems or access specific knowledge to perform their tasks. This is enabled via Action Groups and Knowledge Bases.
Action Groups: These define the tools or APIs the agent can use to perform actions and interact with external systems (e.g., databases, CRM systems, booking APIs, or other enterprise applications).
- API Schema: You must provide an OpenAPI schema (in JSON or YAML format). This schema describes the available API operations, their input parameters, expected output responses, and a description of what each API does. This schema acts as a contract, informing the agent about the tools it has at its disposal and how to use them.
- AWS Lambda Functions for Fulfillment: For each API operation defined in the schema, you associate an AWS Lambda function. This Lambda function contains the actual business logic to execute the action. When the agent decides to use an action, it invokes the corresponding Lambda function, passing the necessary parameters as defined in the OpenAPI schema. The Lambda function then executes the task (e.g., queries a database, calls an external API) and returns a response, which the agent uses to proceed.
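To make the schema-plus-Lambda contract concrete, here is a minimal sketch: a hypothetical `/recommendations` operation registered via the `CreateAgentActionGroup` API, and a matching fulfillment handler. The operation, the in-memory inventory, and all names are invented for illustration, and the event/response shapes follow the Bedrock agent-to-Lambda contract as we understand it:

```python
import json

# Minimal OpenAPI 3 schema for one hypothetical operation. The descriptions are
# not decoration: the agent's FM reads them to decide when and how to call the tool.
SCHEMA = {
    "openapi": "3.0.0",
    "info": {"title": "Product Recommendation API", "version": "1.0.0"},
    "paths": {
        "/recommendations": {
            "get": {
                "operationId": "getRecommendations",
                "description": "Return product recommendations for a customer query.",
                "parameters": [{
                    "name": "query", "in": "query", "required": True,
                    "schema": {"type": "string"},
                    "description": "The customer's request in natural language.",
                }],
                "responses": {"200": {"description": "Recommended product names."}},
            }
        }
    },
}

def create_action_group(client, agent_id, lambda_arn):
    """Register the schema and its fulfillment Lambda on the agent's working draft."""
    return client.create_agent_action_group(
        agentId=agent_id,
        agentVersion="DRAFT",  # action groups are edited on the draft version
        actionGroupName="product-recommendations",
        actionGroupExecutor={"lambda": lambda_arn},
        apiSchema={"payload": json.dumps(SCHEMA)},
    )

# Hypothetical inventory; a real handler would query a database or external API.
INVENTORY = {"hiking": ["trail boots", "trekking poles"], "running": ["road shoes"]}

def lambda_handler(event, context):
    """Fulfillment Lambda the agent invokes for each action call."""
    api_path = event.get("apiPath", "")
    # Parameters arrive as a list of {name, type, value} dicts.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    if api_path == "/recommendations":
        query = params.get("query", "").lower()
        body = [item for key, items in INVENTORY.items() if key in query
                for item in items]
        status = 200
    else:
        body, status = {"error": f"unknown path {api_path}"}, 404
    # Response envelope the agent expects back from the Lambda.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod", "GET"),
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```

Note how the handler never sees the user's raw conversation, only the structured parameters the agent extracted, which is what makes the OpenAPI schema the effective contract between the two.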
Knowledge Bases: These enable agents to perform Retrieval Augmented Generation (RAG) by connecting them to relevant, often proprietary, data sources, grounding their responses in factual information.
- Create Knowledge Base: In Amazon Bedrock, create a Knowledge Base. This involves specifying your data source (e.g., documents in an Amazon S3 bucket, web pages, or data from other services like Confluence or SharePoint via connectors).
- Choose an Embedding Model: Select an embedding model (e.g., Amazon Titan Text Embeddings, Cohere Embed) to convert your textual data into vector embeddings.
- Configure Vector Store: Choose a vector database to store and index these embeddings. Options include Amazon OpenSearch Serverless (which Bedrock can create and manage for you), Amazon Aurora PostgreSQL-Compatible Edition with the pgvector extension, Pinecone, or Redis Enterprise Cloud.
- Data Ingestion and Syncing: Bedrock handles the process of fetching data from your source, splitting it into manageable chunks (if needed), generating embeddings for these chunks using the selected model, and indexing them in the chosen vector store. It can also perform periodic syncs to keep the knowledge base updated.
- Associate with Agent: Link the configured Knowledge Base to the specialist agent. The agent can then query this knowledge base to retrieve relevant information, which is used to augment the FM's prompt, leading to more accurate and contextually grounded responses.
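Programmatically, the association and sync steps correspond to the `AssociateAgentKnowledgeBase` and `StartIngestionJob` APIs on the `bedrock-agent` client. A sketch, with all IDs as placeholders:

```python
def attach_knowledge_base(client, agent_id, kb_id, description):
    """Link an existing Knowledge Base to an agent's working draft.

    The description matters: the agent's FM uses it to decide when this
    knowledge base is relevant to a query.
    """
    return client.associate_agent_knowledge_base(
        agentId=agent_id,
        agentVersion="DRAFT",
        knowledgeBaseId=kb_id,
        description=description,
    )

def sync_knowledge_base(client, kb_id, data_source_id):
    """Kick off an ingestion job to (re)chunk, embed, and index the data source."""
    return client.start_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=data_source_id,
    )
```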
3. Create a Supervisor Agent and Configure Collaboration
The supervisor agent is the central coordinator of the multi-agent system.
Role: It typically interacts directly with the user or the calling application. Its primary responsibilities are to understand the overall request, decompose it into sub-tasks if necessary, delegate these sub-tasks to the appropriate specialist agents, and synthesize their outputs into a final response or plan.
Collaboration Configuration: After creating the supervisor agent (similar to specialist agents, but with orchestration-focused instructions), navigate to its multi-agent collaboration settings within the Amazon Bedrock console. Here, you will:
- Enable the multi-agent collaboration feature.
- Associate the previously created specialist agents as "collaborator agents" to this supervisor.
- Define the collaboration mode. Amazon Bedrock offers options like "Supervisor Mode" (where the supervisor explicitly manages task breakdown and delegation for all requests) or "Supervisor with Routing Mode" (where the supervisor analyzes the request and can route simpler queries directly to a relevant specialist agent, bypassing full orchestration for efficiency, while still orchestrating more complex multi-step requests).
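In code, the settings above map to creating the supervisor with `agentCollaboration` set to `SUPERVISOR` (or `SUPERVISOR_ROUTER` for routing mode) and registering each specialist via the `AssociateAgentCollaborator` API. A rough sketch; the names, instructions, and alias ARN in any real call would be your own:

```python
def create_supervisor(client, name, model_id, role_arn, instruction, routing=False):
    """Create a supervisor agent with multi-agent collaboration enabled."""
    response = client.create_agent(
        agentName=name,
        foundationModel=model_id,
        agentResourceRoleArn=role_arn,
        instruction=instruction,
        # SUPERVISOR: always orchestrates; SUPERVISOR_ROUTER: may route simple
        # requests directly to a single collaborator.
        agentCollaboration="SUPERVISOR_ROUTER" if routing else "SUPERVISOR",
    )
    return response["agent"]["agentId"]

def add_collaborator(client, supervisor_id, collaborator_alias_arn, name, instruction):
    """Register a specialist agent (via one of its alias ARNs) as a collaborator."""
    return client.associate_agent_collaborator(
        agentId=supervisor_id,
        agentVersion="DRAFT",
        agentDescriptor={"aliasArn": collaborator_alias_arn},
        collaboratorName=name,
        collaborationInstruction=instruction,  # tells the supervisor when to delegate here
    )
```

Note that collaborators are referenced by alias ARN, so each specialist must already have a prepared version and alias before it can be attached.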
Finally, when moving towards more structured deployments, consider Bedrock's features for lifecycle management. You can create immutable, numbered versions of your agent. An alias (e.g., prod, dev) then points to a specific version, which enables strategies like blue/green deployments. Before your changes take effect or a new version can be created, the agent's working draft must be prepared (validated and built) using the PrepareAgent API call or the equivalent console action. This MLOps-like approach, often combined with Infrastructure as Code (IaC) tools like AWS CloudFormation or the AWS CDK, provides a robust framework for managing and deploying agentic workflows.
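The prepare/version/alias flow can be sketched in a few lines of boto3; the alias name below is only an example, and in Bedrock creating an alias from the prepared draft is what snapshots it as a new numbered version:

```python
def release(client, agent_id, alias_name="prod"):
    """Prepare the working draft, then create an alias pointing at a new version.

    `client` is a boto3 `bedrock-agent` client. Returns the alias ID.
    """
    client.prepare_agent(agentId=agent_id)  # validate/build the working draft
    response = client.create_agent_alias(
        agentId=agent_id,
        agentAliasName=alias_name,  # e.g. "prod" or "dev"
    )
    return response["agentAlias"]["agentAliasId"]
```

In an IaC setup, the same prepare-then-alias sequence is what your CloudFormation or CDK deployment performs on each release.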
Where to go with agentic workflows?
Understanding the mechanics of planning, tooling, and building agentic workflows is the first step. The next, and arguably more significant, phase is strategically applying these powerful capabilities to drive tangible business value. Agentic AI opens a vast world of possibilities for application development, but realizing that potential requires careful alignment with organizational goals and an awareness of potential challenges.
Aligning workflow design with goals
While AWS tools like Amazon Bedrock and Amazon SageMaker significantly streamline the development process for agentic AI, the design of workflows that truly drive value remains a nuanced challenge. It's not enough to simply connect models; the architecture must be purposeful and grounded in clear business objectives. Challenges like ensuring reliability and consistency from probabilistic FMs, managing inter-agent dependencies, and maintaining data quality for RAG systems require careful planning and robust design.
Agentic workflows can automate intricate processes, enhance decision-making with AI-driven insights, create highly personalized user experiences, and unlock new operational efficiencies. However, making the most of these technologies requires careful strategic planning. This involves:
- Identifying High-Impact Use Cases: Pinpoint areas within the business where agentic automation can deliver the most significant improvements or create novel capabilities. For example, an agent that can access real-time inventory data via an Action Group and customer purchase history via a Knowledge Base to provide personalized upselling recommendations in an e-commerce application directly impacts revenue. Another example might be an agent automating complex IT troubleshooting by interacting with monitoring systems and knowledge articles to reduce resolution times.
- Defining Clear Objectives: Establish specific, measurable business outcomes that the agentic workflow is intended to achieve (e.g., reduce customer service handling time by X%, increase sales conversion by Y%, improve accuracy of financial forecasts by Z%).
- Adhering to Best Practices: Designing agentic workflows in accordance with frameworks like the AWS Well-Architected Framework, particularly its pillars of Operational Excellence, Security, Reliability, Performance Efficiency, and Cost Optimization, ensures that the solutions are not only effective but also efficient, secure, and sustainable.
- Iterative Refinement: Treat the development of agentic workflows as an ongoing process. Continuously evaluate their performance against your goals using defined KPIs and user feedback. Use insights, especially those from Bedrock's trace capabilities, to refine agent instructions, prompts, tool integrations, and underlying model choices to maximize their impact.
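Traces are requested at invocation time. The sketch below uses the `bedrock-agent-runtime` client and separates trace events from answer chunks in the response stream; the agent, alias, and session IDs in a real call are yours to supply:

```python
def invoke_with_trace(runtime_client, agent_id, alias_id, session_id, text):
    """Invoke an agent with tracing enabled; return (answer, traces) for analysis.

    `runtime_client` is a boto3 `bedrock-agent-runtime` client.
    """
    response = runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        inputText=text,
        enableTrace=True,  # emit step-by-step orchestration traces in the stream
    )
    answer_parts, traces = [], []
    for event in response["completion"]:  # event stream of chunks and traces
        if "chunk" in event:
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            traces.append(event["trace"])  # reasoning steps, tool calls, KB lookups
    return "".join(answer_parts), traces
```

Logging the collected traces alongside your KPIs is a practical way to see which instructions, tools, or retrievals are behind a good or bad answer.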
The true power of agentic workflows on AWS is unlocked when they are deeply integrated with your organization's unique data, systems, and processes. This allows agents to operate with relevant context and execute tasks directly within your existing enterprise environment, transforming them from generic AI tools into highly valuable, specialized assistants.
Caylent Value Proposition
Navigating the complexities of agentic AI, from initial strategy through robust implementation and ongoing optimization, can be a significant undertaking. The challenges of designing effective multi-agent collaboration, ensuring data security and governance, managing costs, and debugging intricate interactions require specialized expertise. Caylent's teams of experts are dedicated to helping organizations like yours plan and implement effective agentic AI pipelines on AWS.
As an AWS Premier Services Partner with deep expertise in generative AI, Caylent can effectively guide you through the agentic development process, leveraging the full potential of AI agents. We help ensure that your agentic workflows are not just technologically impressive but are also strategically aligned with your organization’s core objectives, delivering the maximum possible value. Our approach focuses on building solutions that are scalable, secure, and cost-effective, enabling you to confidently adopt these transformative technologies.
You can explore our generative AI offerings further at Caylent Generative AI on AWS and learn about our strategic approach through initiatives like the AWS Generative AI Strategy Catalyst. Caylent’s guidance ensures that your investment in agentic AI translates into meaningful business outcomes, helping you overcome the inherent challenges and harness the full power of these advanced systems.