Create RAG AI on AWS with OpenSearch

Artificial Intelligence & MLOps

Learn how to build a RAG-based GenAI bot on AWS using OpenSearch Serverless, through our step-by-step example.

Retrieval Augmented Generation (RAG) is a great solution when you want your responses to include supplemental data that wasn’t part of the original LLM training data set or when you want to include data that is rapidly changing. One of the more common RAG use cases is supplying internal corporate knowledge bases to the LLM as context so that it can generate more targeted, customized responses.

If you are unfamiliar with GenAI, please view our earlier blog post that covers some of the basic terminology associated with the rapidly evolving field. 

In this post, we will walk through an example showing how to build a RAG-based GenAI bot using OpenSearch Serverless as the vector store. Starting at the beginning, we will see how to index data into OpenSearch, how to query that data from OpenSearch, and how to pass the question and the retrieved data into an LLM for a plain text response.

Step 0 - Create an Amazon OpenSearch Serverless (AOSS) collection

For this example we will use SAM/CloudFormation. Below you will find a very basic template to create the OpenSearch Serverless collection. To create the collection, you need to create three policies: one for data access, one for encryption, and one for network access. For simplicity, we are allowing our SSO user to access the collection from the internet and are using an AWS-supplied key for encryption.

Be sure to update the template below to match your naming convention and to give roles in your specific account access.
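As a rough illustration of the pieces involved, here is a minimal sketch that assembles a CloudFormation template in Python and prints it as JSON (CloudFormation accepts JSON as well as YAML). This is not the original template from this post. The function name, logical resource IDs, policy name suffixes, and the example role ARN are all hypothetical placeholders, and the broad `aoss:*` permission is an assumption chosen for brevity rather than a recommendation.

```python
import json


def build_aoss_template(collection_name: str, principal_arns: list[str]) -> dict:
    """Build a CloudFormation template (as a dict) for a vector search collection.

    Hypothetical helper: names and permissions are illustrative assumptions.
    """
    resource = f"collection/{collection_name}"

    # Encryption policy: use an AWS-owned key instead of a customer-managed one.
    encryption_policy = {
        "Rules": [{"ResourceType": "collection", "Resource": [resource]}],
        "AWSOwnedKey": True,
    }
    # Network policy: allow access from the public internet (simple, not strict).
    network_policy = [{
        "Rules": [
            {"ResourceType": "collection", "Resource": [resource]},
            {"ResourceType": "dashboard", "Resource": [resource]},
        ],
        "AllowFromPublic": True,
    }]
    # Data access policy: which IAM principals may use the collection and indexes.
    data_policy = [{
        "Rules": [
            {"ResourceType": "collection", "Resource": [resource],
             "Permission": ["aoss:*"]},
            {"ResourceType": "index", "Resource": [f"index/{collection_name}/*"],
             "Permission": ["aoss:*"]},
        ],
        "Principal": principal_arns,
    }]

    def policy(resource_type: str, suffix: str, kind: str, document) -> dict:
        # The policy document is passed to CloudFormation as a JSON string.
        return {
            "Type": resource_type,
            "Properties": {
                "Name": f"{collection_name}-{suffix}",
                "Type": kind,
                "Policy": json.dumps(document),
            },
        }

    security = "AWS::OpenSearchServerless::SecurityPolicy"
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "EncryptionPolicy": policy(security, "enc", "encryption", encryption_policy),
            "NetworkPolicy": policy(security, "net", "network", network_policy),
            "DataAccessPolicy": policy(
                "AWS::OpenSearchServerless::AccessPolicy", "data", "data", data_policy
            ),
            "Collection": {
                "Type": "AWS::OpenSearchServerless::Collection",
                # The encryption policy must exist before the collection is created.
                "DependsOn": ["EncryptionPolicy"],
                "Properties": {"Name": collection_name, "Type": "VECTORSEARCH"},
            },
        },
        "Outputs": {
            "CollectionEndpoint": {
                "Value": {"Fn::GetAtt": ["Collection", "CollectionEndpoint"]}
            }
        },
    }


if __name__ == "__main__":
    # Placeholder role ARN; replace with the roles in your own account.
    example = build_aoss_template(
        "rag-demo", ["arn:aws:iam::123456789012:role/ExampleSsoRole"]
    )
    print(json.dumps(example, indent=2))
```

The printed JSON can be saved to a file and deployed like any other CloudFormation template. For anything beyond a demo, you would likely want a narrower network policy and a tighter set of permissions than this sketch uses.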

Step 1 - Connect to OpenSearch

Before we can do anything else, we need to connect to the collection using our AWS credentials. One simple method is to copy the temporary credentials from AWS SSO into the console where you will run this script, and then use boto3 to create the authentication. If you aren’t using AWS, this step may vary.
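A connection along these lines can be sketched with boto3 and the opensearch-py client. This is a minimal sketch, not the original script: the helper names `endpoint_host` and `connect` are hypothetical, the 300-second timeout is an arbitrary choice, and the third-party imports sit inside `connect` only so that the small host-parsing helper can be used on its own.

```python
from urllib.parse import urlparse


def endpoint_host(endpoint: str) -> str:
    """Return the bare host name of a collection endpoint.

    The client expects a host without the scheme, while the AWS console and
    CloudFormation outputs show the endpoint as a full https:// URL.
    """
    if "://" not in endpoint:
        endpoint = f"https://{endpoint}"
    return urlparse(endpoint).hostname


def connect(endpoint: str, region: str):
    """Create an OpenSearch client signed with the current AWS credentials."""
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    # Picks up credentials from the environment, e.g. the temporary
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN from SSO.
    credentials = boto3.Session().get_credentials()

    # "aoss" is the SigV4 service name for OpenSearch Serverless.
    auth = AWSV4SignerAuth(credentials, region, "aoss")

    return OpenSearch(
        hosts=[{"host": endpoint_host(endpoint), "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        timeout=300,
    )
```

Calling `connect("https://<collection-id>.<region>.aoss.amazonaws.com", "<region>")` with your own collection endpoint and region would return a client that the later steps can reuse.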

Step 2 - Create the index

Once we have the connection, we can create our vector index.

There are a few settings in this index, explained briefly below:

  • Type - knn_vector is the type of index that allows you to perform k-nearest neighbor (k-NN) searches on your data. This is required for vector searches.
  • Dimension - This needs to match the output size of your embedding model. As of this writing, the OpenAI embedding defaults to text-embedding-ada-002, which produces vectors with 1536 dimensions. More dimensions give you more points to search on, but they also directly increase the amount of data you are storing and the time it takes to query.
  • Engine - The approximate k-NN library to use for indexing and search. Facebook AI Similarity Search (FAISS) is the engine we are using here. As of this writing, Amazon OpenSearch Serverless collections only support the FAISS engine combined with the Hierarchical Navigable Small World (HNSW) method described below (see the limitations section of the Amazon docs). Other engines available in OpenSearch include Non-Metric Space Library (nmslib) and Apache Lucene.
  • Name - This is the identifier for the nearest neighbor method that we are using. Hierarchical Navigable Small World (HNSW), as mentioned above, is the only one supported by Amazon OpenSearch Serverless today. Other options in OpenSearch include Inverted File (IVF) and Inverted File with Product Quantization (IVFPQ). HNSW is generally the faster algorithm, but that speed comes at the cost of higher memory consumption. To learn more, you can take a look at this blog post by AWS on choosing the right algorithm.
  • Space type - The function used to calculate the distance or similarity between vectors. Here, we use l2 (Euclidean distance). Other options, which get deeper into the math of vectors and their relation to each other, include innerproduct, cosinesimil, l1, and linf.

Depending on which options you choose above, additional settings may be available to help tune your index for performance, resource consumption, or the particular type of data you are storing.
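Put together, the settings described above could look like the following index definition. This is a sketch under stated assumptions: the field name `vector_field` and the helper names are placeholders rather than anything this post mandates, and `client` is assumed to be an opensearch-py client like the one from Step 1.

```python
def build_index_body(dimension: int = 1536, vector_field: str = "vector_field") -> dict:
    """Return the index settings and mappings for a k-NN vector index."""
    return {
        # Enable k-NN search on this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",      # required for vector searches
                    "dimension": dimension,    # must match the embedding size
                    "method": {
                        "name": "hnsw",        # nearest neighbor method
                        "engine": "faiss",     # approximate k-NN library
                        "space_type": "l2",    # Euclidean distance
                    },
                }
            }
        },
    }


def create_index(client, index_name: str, dimension: int = 1536):
    """Create the vector index using an existing OpenSearch client."""
    return client.indices.create(index=index_name, body=build_index_body(dimension))
```

Keeping the body in its own function makes it easy to inspect or print the mapping before sending it, which helps when the dimension has to change to match a different embedding model.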

Step 3 - Index documents

For a good set of sample documents, we’ll use a large batch of publicly accessible OSHA documents copied into an S3 bucket. For your real-world use case, you might use internal company data, knowledge bases, PDF reports, etc. The possibilities here are endless. In this example, we’ll use LangChain’s S3FileLoader to help break up and index the documents, but LangChain also has document loaders for 100+ different sources that can be used to replicate a similar process.

One of the first pieces that we need to configure is the embeddings. Here we are using OpenAI embeddings.

We also need to connect to our OpenSearch collection index that we created using the LangChain OpenSearchVectorSearch. You will notice that we specify the embeddings we are using in this connection so that as we upload documents they will be indexed using those embeddings.
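A minimal sketch of these two pieces might look like the following. Several things here are assumptions: the import paths match recent LangChain packaging (`langchain_openai`, `langchain_community`), while older releases expose the same classes under `langchain.embeddings` and `langchain.vectorstores`; `auth` is assumed to be the SigV4 auth object created in Step 1; and the helper names are hypothetical.

```python
def normalize_endpoint(endpoint: str) -> str:
    """Return the collection endpoint as a full https:// URL without a trailing slash."""
    endpoint = endpoint.strip().rstrip("/")
    if endpoint.startswith("https://"):
        return endpoint
    return f"https://{endpoint}"


def build_vector_store(endpoint: str, index_name: str, auth):
    """Connect LangChain to the index, pairing it with the embedding model."""
    from langchain_community.vectorstores import OpenSearchVectorSearch
    from langchain_openai import OpenAIEmbeddings
    from opensearchpy import RequestsHttpConnection

    # Reads the API key from the OPENAI_API_KEY environment variable.
    embeddings = OpenAIEmbeddings()

    return OpenSearchVectorSearch(
        opensearch_url=normalize_endpoint(endpoint),
        index_name=index_name,
        embedding_function=embeddings,  # used for both indexing and querying
        # Remaining keyword arguments are passed through to the OpenSearch client.
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        timeout=300,
    )
```

Because the embedding function is attached to the vector store object, the same object can be reused later for queries, which keeps the indexing and search sides on the same embedding model.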

Next, I listed all of the document keys in my S3 bucket dynamically rather than hard-coding them.
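One way to do this is with boto3's `list_objects_v2` paginator, sketched below. The function name `list_keys` and the idea of skipping keys that end in a slash (folder placeholders) are illustrative choices, not necessarily what the original script did.

```python
def list_keys(s3_client, bucket: str, prefix: str = "") -> list[str]:
    """Return every object key in the bucket under the given prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    # Each page holds up to 1,000 objects; the paginator follows the
    # continuation tokens for us.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("/"):  # skip folder placeholder objects
                keys.append(key)
    return keys


# Example usage (bucket name is a placeholder):
#   import boto3
#   keys = list_keys(boto3.client("s3"), "my-osha-documents")
```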

Then, with each S3 key, I proceeded to use the LangChain S3FileLoader to load the file, split it into chunks, run it through the embeddings, and then load it into the vector store. It looks something like this.
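The load, split, embed, and upload loop can be sketched roughly as follows. Treat the details as assumptions: the chunk size and overlap are arbitrary starting values, the import paths match recent LangChain packaging, `S3FileLoader` relies on the `unstructured` package being installed, and the `batched` helper is a hypothetical addition that keeps each upload within the vector store's `bulk_size` limit.

```python
def batched(items: list, size: int):
    """Yield successive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_s3_object(vector_store, bucket: str, key: str,
                    chunk_size: int = 1000, chunk_overlap: int = 100,
                    bulk_size: int = 500) -> int:
    """Load one S3 object, split it into chunks, and upload them.

    Returns the number of chunks indexed.
    """
    from langchain_community.document_loaders import S3FileLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Load the file; the loader records which S3 object the text came from
    # in each document's metadata.
    documents = S3FileLoader(bucket, key).load()

    # Split into overlapping chunks so each embedding covers a focused passage.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(documents)

    # The vector store embeds each chunk and writes text, vector, and metadata.
    for batch in batched(chunks, bulk_size):
        vector_store.add_documents(
            batch, vector_field="vector_field", bulk_size=bulk_size
        )
    return len(chunks)
```

Calling `index_s3_object(vector_store, bucket, key)` once per S3 key covers the whole bucket. Smaller chunks tend to give more precise matches, while larger chunks carry more context per match, so the chunk size is worth tuning against your own documents.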

This results in all of our chunks of documents getting uploaded with the chunk of text, the embeddings for determining similarity, and metadata about which document the text was taken from.

At this point, the environment is set up and we can perform queries on it as many times as we want.

Step 4 - Query documents with similarity search

This step is triggered when someone submits a question to your system. The question is converted into a vector by the same embeddings you used to load your documents. That vector is then used to search your vector store for similar chunks of text. LangChain abstracts some of this for us and provides us with a function.

First things first, make sure you have a connection to your collection and index using the same embeddings as we did in step 3. This connection can be the exact same as the one in step 3.

Once you have that connection, you can pass your question in. You also indicate which field your vectors are stored in, which field contains your text, and which field holds your metadata. The last two values make up your return data.
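With LangChain's OpenSearch vector store, that call could be sketched like this. The field names shown are LangChain's defaults and are assumed to match how the documents were indexed; the wrapper name `find_similar_chunks` and the choice of `k=4` are illustrative.

```python
def find_similar_chunks(vector_store, question: str, k: int = 4):
    """Return the k chunks whose embeddings are closest to the question."""
    return vector_store.similarity_search(
        question,
        k=k,
        vector_field="vector_field",  # field holding the embeddings
        text_field="text",            # field holding the chunk text
        metadata_field="metadata",    # field holding the source metadata
    )
```

Each result is a LangChain document object: `page_content` holds the chunk text and `metadata` holds the details about where that text came from.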

Step 5 - Send data to the LLM

Finally, now that we have our question and additional context from our vector store, we can package up and send all of this to the LLM for a response.

The LLM will return a plain text response. We can couple that response with the metadata we pulled from our vector store to not only provide our user with an answer but also link to documents where additional data can be found.
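A minimal sketch of this final step is shown below. The prompt wording, the helper names, the use of `ChatOpenAI` from `langchain_openai`, and the assumption that each chunk's metadata carries a `source` entry are all illustrative choices rather than the exact approach used for this post.

```python
def build_prompt(question: str, docs) -> str:
    """Combine the retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def collect_sources(docs) -> list[str]:
    """Return the distinct source documents, in the order they were retrieved."""
    sources = []
    for doc in docs:
        source = doc.metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return sources


def answer_question(question: str, docs) -> dict:
    """Send the question plus context to the LLM and attach the sources."""
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(temperature=0)  # low temperature for factual answers
    response = llm.invoke(build_prompt(question, docs))
    return {"answer": response.content, "sources": collect_sources(docs)}
```

Returning the sources alongside the answer lets the calling application render links back to the original documents, so users can verify the response or read further.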


Is your company trying to figure out where to go with generative AI? Consider finding a partner who can help you get there. At Caylent, we have a full suite of generative AI offerings. Starting with our Generative AI Strategy Catalyst, we can start the ideation process and guide you through the art of the possible for your business. Using these new ideas, we can implement our Generative AI Knowledge Base Catalyst to build a quick, out-of-the-box solution integrated with your company's data to enable powerful search capabilities using natural language queries. Finally, Caylent’s Generative AI Flight Plan Catalyst will help you build an AI roadmap for your company and demonstrate how generative AI will play a part. As part of these Catalysts, our teams will help you understand your custom roadmap for generative AI and how Caylent can help lead the way.

Clayton Davis

Clayton Davis is the Director of the Cloud Native Applications practice at Caylent. His passion is partnering with potential clients and helping them realize how cloud-native technologies can help their businesses deliver more value to their customers. His background spans the landscape of AWS and IT, having spent most of the last decade consulting clients across a plethora of industries. His technical background includes application development, large scale migrations, DevOps, networking and technical product management. Clayton currently lives in Milwaukee, WI and as such enjoys craft beer, cheese, and sausage.
