Data Preparation
For the dataset, I use the Topical-Chat dataset, which contains conversations between pairs of agents. Specifically, I used the “train.json” file and kept only the conversation ids and the conversations.
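As a rough sketch of this step, the snippet below reduces the raw file to conversation ids and their turns. It assumes each record in “train.json” is keyed by conversation id and holds its turns in a `content` list with `agent` and `message` fields; check these field names against your copy of the dataset. The function names are illustrative.

```python
import json


def keep_conversations(raw: dict) -> dict:
    """Reduce each record to its conversation id and its list of turns.

    Assumes each value looks like {"content": [{"agent": ..., "message": ...}, ...]}.
    Every other field is dropped.
    """
    slim = {}
    for conversation_id, record in raw.items():
        slim[conversation_id] = [
            {"agent": turn.get("agent"), "message": turn.get("message")}
            for turn in record.get("content", [])
        ]
    return slim


def load_conversations(path: str = "train.json") -> dict:
    """Read the raw file from disk and strip it down to ids and turns."""
    with open(path, encoding="utf-8") as f:
        return keep_conversations(json.load(f))
```

The reduced dictionary can then be written back to disk with `json.dump` so the conversation lookup tool has a small file to read from.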
User Intent Identification using Claude Function Calling
Three main components are required here:
- Functions (tools) that can be called to fulfill a request
- Definitions of the tools in the form of XML tags
- Function calling logic customized to work with Claude on Bedrock
Tools
First, we need to create the functions Claude needs to address user inquiries. The functions `get_summary_instruction()`, `get_sentiment_instruction()`, and `get_general_instruction()` route a user query to the instruction tailored for it. Note the XML tags, which this function calling approach with Claude requires. Please take a moment to familiarize yourself with the provided code. Each function has a description that helps the LLM decide which one to use based on the user question.
In this example I specify three instructions. The first covers questions about summarizing the conversation, the second covers questions about the sentiment of the conversation, and the last covers any other question. In other words, the chatbot is optimized to answer questions about conversation summary and sentiment, while it can still be used for any other question. Below you can review the detailed instruction for each scenario. Add any specific guidance (e.g. formatting, a step-by-step guide, few-shot examples) here so the LLM picks it up when answering the user question.
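A minimal sketch of the three instruction functions might look like the following. The function names come from the text above, but the instruction wording, the `<instruction>` tag, and the sentiment categories are placeholders I am assuming for illustration, not the exact ones used in this post.

```python
def get_summary_instruction() -> str:
    """Return the instruction to use when the user asks for a summary of a conversation."""
    return (
        "<instruction>Summarize the topics discussed in the conversation "
        "and present them as a numbered list.</instruction>"
    )


def get_sentiment_instruction() -> str:
    """Return the instruction to use when the user asks about the sentiment of a conversation."""
    # The three categories below are placeholders, not the post's actual categories.
    return (
        "<instruction>Classify the overall sentiment of the conversation as "
        "one of: positive, negative, neutral.</instruction>"
    )


def get_general_instruction() -> str:
    """Return the instruction to use for any question that is not about summary or sentiment."""
    return (
        "<instruction>Answer the question using only the content of the "
        "conversation.</instruction>"
    )
```

The docstrings double as the tool descriptions, so they should state plainly which kind of question each function serves.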
The last tool retrieves the conversation history based on the conversation id passed in the user question. In particular, it reads the conversation history from a JSON file on disk and returns it along with the question and the selected instruction.
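The retrieval tool could be sketched roughly as below. The function name `fetch_conversation`, the default file name, and the XML tags wrapped around each part are assumptions for illustration; the file is assumed to map each conversation id to its list of turns.

```python
import json


def fetch_conversation(question: str, instruction: str, conversation_id: str,
                       path: str = "conversations.json") -> str:
    """Return the conversation history together with the question and instruction.

    Assumes the JSON file at `path` maps conversation ids to lists of turns.
    """
    with open(path, encoding="utf-8") as f:
        conversations = json.load(f)

    history = conversations.get(conversation_id)
    if history is None:
        # Give the LLM an explicit signal instead of raising, so it can report the problem.
        return f"<error>No conversation found with id {conversation_id}</error>"

    return (
        f"<conversation>{json.dumps(history)}</conversation>\n"
        f"{instruction}\n"
        f"<question>{question}</question>"
    )
```

Returning the question and instruction alongside the history means the final LLM call receives everything it needs in one string.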
The final step is to list all the tools available to the LLM.
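One way to build that list is to render each tool as an XML description and join the results. The tag layout below (`tool_description`, `tool_name`, `parameters`) follows the XML style commonly used for prompt-based function calling with Claude, but treat the exact tag names, the helper `describe_tool`, and the parameter names as assumptions.

```python
def describe_tool(name: str, description: str, parameters: list) -> str:
    """Render one tool as an XML description the LLM can read."""
    params_xml = "\n".join(
        "<parameter>\n"
        f"<name>{p['name']}</name>\n"
        f"<type>{p['type']}</type>\n"
        f"<description>{p['description']}</description>\n"
        "</parameter>"
        for p in parameters
    )
    return (
        "<tool_description>\n"
        f"<tool_name>{name}</tool_name>\n"
        f"<description>{description}</description>\n"
        f"<parameters>\n{params_xml}\n</parameters>\n"
        "</tool_description>"
    )


# Two of the tools, as an example; parameter names are hypothetical.
list_of_tools = "\n".join([
    describe_tool(
        "get_summary_instruction",
        "Returns the instruction for questions about summarizing a conversation.",
        [],
    ),
    describe_tool(
        "fetch_conversation",
        "Returns the conversation history for a given conversation id.",
        [{"name": "conversation_id", "type": "string",
          "description": "Id of the conversation to retrieve"}],
    ),
])
```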
Function Calling
With the tools in place, this section outlines the prompt template used for function calling. The template has four sections: the system role, the function call template that Claude requires, the full list of available tools, and instructions on the order of calls. The last section is crucial: it directs Claude to first identify the relevant instruction from the available options and only then proceed with additional function calls. This ensures the prompt is augmented with the necessary instruction before the final response is generated.
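The four sections can be assembled along these lines. This is a sketch under assumptions: the wording of each section and the `function_calls`/`invoke` tag layout are illustrative, and `build_function_calling_prompt` is a hypothetical helper, not the template used in this post.

```python
def build_function_calling_prompt(tools_xml: str) -> str:
    """Assemble the four-part prompt: role, call format, tool list, call order."""
    role = (
        "You are a helpful assistant that answers questions about stored conversations."
    )
    call_format = (
        "You may call a function by writing a block like this:\n"
        "<function_calls>\n<invoke>\n<tool_name>$TOOL_NAME</tool_name>\n"
        "<parameters>\n<$PARAMETER_NAME>$PARAMETER_VALUE</$PARAMETER_NAME>\n"
        "</parameters>\n</invoke>\n</function_calls>"
    )
    tools = f"Here are the tools available:\n<tools>\n{tools_xml}\n</tools>"
    ordering = (
        "First call the instruction function that best matches the question. "
        "Then make any further function calls, passing along the instruction you received."
    )
    # Sections are separated by blank lines so each one reads as its own block.
    return "\n\n".join([role, call_format, tools, ordering])
```

Keeping the ordering rule as the final section places it closest to the user question, which is appended after this prompt.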
Solution in action!
Now we have everything needed to test our solution. I ask three questions: one about the conversation summary, one about the conversation sentiment, and one general question that is about neither. Let's try them and follow the end-to-end process.
Question about summary of the conversation:
Here I ask Claude the following question: “Can you summarize the conversion with id t_bde29ce2-4153-4056-9eb7-f4ad710505fe?”
And here are the run logs:
We can confirm that the final response correctly picked up our instruction to summarize the topics discussed in the conversation and to present them in a numbered list.
Question about sentiment of the conversation:
Here I ask Claude the following question: “What is the sentiment of the conversation with id t_bde29ce2-4153-4056-9eb7-f4ad710505fe?”
And here are the truncated run logs:
As you can see, the response followed our instruction to report the sentiment of the conversation using the predefined categories.
General question about the conversation:
Here I ask Claude a question that is about neither the sentiment nor the summary of the conversation: “how many turns of conversation exists in conversation with id t_bde29ce2-4153-4056-9eb7-f4ad710505fe?” This showcases the scenario where the LLM correctly uses the general instruction, because the question concerns neither “conversation summary” nor “conversation sentiment”.
And here are the run logs:
User Intent Identification using Bedrock Agent
Now let's see how we can achieve the same thing using a Bedrock agent. Three main components are required here:
- Tools available to our agent to fulfill a request
- APIs that our agent can invoke, defined in a JSON or YAML file following the OpenAPI schema
- An action group in the form of an AWS Lambda function, which the agent uses to invoke the APIs based on its reasoning
Agent Tools
Let’s take a quick look at the tools I defined for our agent. Note the function docstrings, where I state the goal of each tool; this helps our agent decide which one to use to solve a problem.
OpenAPI Schema
Now we define the APIs the agent can invoke in a JSON file that follows the OpenAPI schema. You will need to upload this file to an S3 bucket and point to it when defining the agent. As before, we have three APIs that return the instruction specific to the user question: summary, sentiment, or any other question. The last API passes the user question, the instruction selected by the LLM, and the conversation history to the LLM to produce the final response. Note that for each path you need to provide the description, the input parameters, and the output. The schema for the APIs is listed below.
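To make the shape of such a schema concrete, here is a sketch that builds it as a Python dictionary and serializes it to JSON. The paths '/get_sentiment_instruction' and '/gen_conversation' and the operation ids `querySentimentInstruction` and `genConversation` appear in this post; the other paths, operation ids, parameter names, title, and descriptions are assumptions for illustration.

```python
import json


def instruction_path(operation_id: str, description: str) -> dict:
    """Build a parameterless GET path that returns an instruction string."""
    return {
        "get": {
            "description": description,
            "operationId": operation_id,
            "responses": {
                "200": {
                    "description": "Instruction text for the LLM",
                    "content": {"application/json": {"schema": {"type": "string"}}},
                }
            },
        }
    }


def query_parameter(name: str, description: str) -> dict:
    """Build a required string query parameter."""
    return {"name": name, "in": "query", "required": True,
            "description": description, "schema": {"type": "string"}}


schema = {
    "openapi": "3.0.0",
    "info": {"title": "Conversation intent APIs", "version": "1.0.0"},  # placeholder title
    "paths": {
        "/get_summary_instruction": instruction_path(
            "querySummaryInstruction", "Instruction for questions about conversation summary."),
        "/get_sentiment_instruction": instruction_path(
            "querySentimentInstruction", "Instruction for questions about conversation sentiment."),
        "/get_general_instruction": instruction_path(
            "queryGeneralInstruction", "Instruction for any other question."),
        "/gen_conversation": {
            "get": {
                "description": "Fetch the conversation and produce the final response.",
                "operationId": "genConversation",
                "parameters": [
                    query_parameter("question", "The user question"),
                    query_parameter("instruction", "Instruction selected by the LLM"),
                    query_parameter("conversation_id", "Id of the conversation"),
                ],
                "responses": {
                    "200": {
                        "description": "Final answer",
                        "content": {"application/json": {"schema": {"type": "string"}}},
                    }
                },
            }
        },
    },
}

schema_json = json.dumps(schema, indent=2)  # this string is what gets uploaded to S3
```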
Action group
Finally, I implemented the Lambda function that receives the event from the agent and invokes the relevant tool for the API path we defined earlier.
Ideally, the LLM first calls the instruction APIs to choose the right instruction, then invokes the “get_conversation” API to fetch the conversation history for the given conversation id, and finally creates the response. To make sure the agent follows this exact order, you need to provide instructions when defining the agent. Here are the instructions I provide to the agent:
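A stripped-down handler could look like the sketch below. The event fields (`actionGroup`, `apiPath`, `httpMethod`, `parameters`) and the response envelope reflect my understanding of the contract between Bedrock agents and Lambda for OpenAPI-based action groups; verify them against the AWS documentation. The tool bodies are stubs and the parameter names are hypothetical.

```python
import json


def get_sentiment_instruction() -> str:
    """Stub tool: instruction for sentiment questions (placeholder wording)."""
    return "Classify the sentiment of the conversation."


def gen_conversation(question: str, instruction: str, conversation_id: str) -> str:
    """Stub tool: the real one would load the history and call the LLM."""
    return f"[{conversation_id}] {instruction} {question}"


# Map each API path to a callable that takes the parsed parameters.
ROUTES = {
    "/get_sentiment_instruction": lambda params: get_sentiment_instruction(),
    "/gen_conversation": lambda params: gen_conversation(
        params["question"], params["instruction"], params["conversation_id"]),
}


def lambda_handler(event, context):
    api_path = event["apiPath"]
    # The agent sends parameters as a list of {"name", "type", "value"} items.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    route = ROUTES.get(api_path)
    if route is None:
        status, body = 404, {"error": f"Unknown API path {api_path}"}
    else:
        status, body = 200, {"result": route(params)}

    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": api_path,
            "httpMethod": event["httpMethod"],
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```

Using a route table keeps the handler short and makes adding the remaining instruction APIs a one-line change each.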
Agent in action
Let’s test our agent with the same questions we used to test our function calling logic. I will showcase one question here, but you can test the other two questions to verify that the agent behaves as desired.
User Question
Here I ask my agent the following question: “what is the sentiment of the conversation with id t_bde29ce2-4153-4056-9eb7-f4ad710505fe?”
First event
In the first call, our agent correctly invoked the '/get_sentiment_instruction' API from the available options, since the user question is about the sentiment of the conversation.
Second event
In the second call, the agent successfully identified the '/gen_conversation' API after collecting all the required information and populating the parameters (question, instruction, and conversation id). After the API is invoked, the conversation history is retrieved along with the other information required to invoke the Claude endpoint.
Final Call
In the final step, the agent invoked the Claude model to create the final answer, as presented in the screenshot below. On the AWS Console you can also view the traces of the steps the agent takes and review the prompt it uses as well as its reasoning to fulfill the user’s request. The agent successfully identified the two tools, Agent-AWS-API::genConversation and Agent-AWS-API::querySentimentInstruction, needed to answer the question.
Conclusion
This post presented a creative method for intent identification within chatbot applications, using a Bedrock agent and Anthropic Claude's function calling capabilities on Bedrock. One of the main differences we observed between the two methods is that agents abstract away a lot of work for us, such as creating the prompt, parsing the output, adding the tools to the agent’s toolbox, and, more importantly, iteratively invoking the LLM. With the function calling method, you have to implement that logic yourself. Moreover, the ability to define tools as APIs based on the OpenAPI schema, in JSON or YAML files, makes the process more structured and a better option for production.
We demonstrated a way to take the complexity of authoring the best prompt for the task away from the user and move it inside the application logic. This can potentially lead to dynamic and responsive chatbots that excel at understanding and fulfilling user intents, ultimately enhancing the user experience within conversational interfaces. You can find the code for this post here.