Deploying GenAI Applications on AWS: Best Practices

Artificial Intelligence & MLOps

Gain a strategic overview of all the factors you need to consider when planning for GenAI development in your organization.

In our previous overview of LLMOps architecture and lifecycle, we shared a target-state, fully mature vision for managing solutions that incorporate Large Language Models (LLMs). Since the concepts, tools, and skill sets required to run GenAI workloads at that level of maturity can be new to teams, the vision may seem daunting at first. An incremental approach to change management allows organizations to pace adoption based on need and value.

This article outlines, at a high level, a few key strategic progressions in the journey to operationalize GenAI on AWS. Future articles will go deeper into the technology in specific areas.

We recommend using the AWS Well-Architected Framework generally, and its Machine Learning Lens specifically, as guideposts to best practices, with an emphasis on security. Most business use cases will take advantage of enterprise data using retrieval-augmented generation (RAG); if your use case doesn't involve RAG, you can skip over our data recommendations.

A number of the recommendations that follow depend on agility. Given the evolving nature of GenAI models, tools, patterns, and best practices, architectural modularity reduces the risk of lock-in from early design decisions. Choosing platforms like Amazon Bedrock or Amazon SageMaker that support multiple models, or using LangChain or another flexible framework to abstract solution components, are two ways to ensure that lessons learned in testing don't send projects all the way back to the drawing board.
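
To make that concrete, here is a minimal Python sketch of such an abstraction layer over Amazon Bedrock's multi-model API. The payload shapes follow each provider's documented Bedrock request formats as of this writing; treat the model IDs, field names, and prompt as illustrative assumptions rather than a definitive implementation.

```python
# A minimal model-abstraction sketch: per-model adapters keep the rest of
# the application independent of any one provider's request format.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Each adapter pairs a request builder with a response parser.
ADAPTERS = {
    "anthropic.claude-v2:1": (
        lambda prompt: {"prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                        "max_tokens_to_sample": 500},
        lambda body: body["completion"],
    ),
    "amazon.titan-text-express-v1": (
        lambda prompt: {"inputText": prompt},
        lambda body: body["results"][0]["outputText"],
    ),
}

def generate(model_id: str, prompt: str) -> str:
    """Invoke any registered model through a single interface."""
    build_request, parse_response = ADAPTERS[model_id]
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(build_request(prompt)),
    )
    return parse_response(json.loads(response["body"].read()))

# Swapping models is now a configuration change, not a rewrite:
print(generate("anthropic.claude-v2:1", "Summarize our returns policy."))
```

Frameworks like LangChain provide the same decoupling off the shelf; the point is that calling code never embeds a single provider's payload format.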

Whether your team is embarking on its first GenAI experiments or preparing to release into a highly regulated production environment at scale, we have recommendations based on best practices and our hands-on experience over the past year and more.


Experimentation

Draft Policies

Define initial technical policies for the use of GenAI so that your teams know what data can be used, which models or services are acceptable, and which privacy settings you require. On AWS, services like Amazon Bedrock explicitly protect against your data being used to train future models, while other services take varying approaches. Highly sensitive data can be used with GenAI, but those use cases should start with our Highly Regulated Businesses recommendations below for a broader and stricter governance and security posture.

Set Business Goals

Even experiments should have a directional business goal to guide them so that you can measure progress, success, or lessons learned. Specific KPIs can be defined later. Open-ended, pure R&D is more appropriate for advanced teams that have already demonstrated success; we cover it later in this article.

Start With the Data You Have

Don't wait for perfect data to get started. Many of the traditional data quality hurdles no longer apply with GenAI. Since you aren't typically asking an LLM to calculate the precise average of a column, having half the values null or zero doesn't matter. Poor structure isn't nearly as big an issue either, nor is using a field for multiple purposes, as models can usually be prompted to work that out. If your first experiment lacks even adequate data, move quickly to the next priority.

Start With a Best-in-Class LLM

Avoid premature cost optimization by choosing the best model available to validate functionality. As of this writing, our teams are having good success with Claude 2.1 on Amazon Bedrock; the model's improvements have reduced prompt engineering effort and raised the quality of outcomes.

Be Cost Conscious

Begin measuring cost and projecting value from day one. Forecasting the unit cost of each experiment will determine whether it should proceed toward production or stay in the rapid prototyping phase. This can be "back of napkin" math based on volume and velocity assumptions, since the goal is to catch large financial misalignments before an experiment gains visibility and generates excitement. Tag resources granularly enough to trace spend back to individual experiments so that AWS Cost Explorer can validate your projections. This can also help guide you on whether to expose your stakeholders to the best-in-class (and more expensive) LLM early in the process or find the most cost-effective model that might meet their needs.
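
For example, a "back of napkin" forecast can be as simple as the sketch below. Every number here is an assumption for illustration; substitute your own volumes and the current Bedrock pricing for your chosen model.

```python
# Back-of-napkin unit economics for one experiment. All numbers are
# illustrative assumptions, not current pricing.
requests_per_day = 2_000          # assumed volume
input_tokens_per_request = 1_500  # prompt plus RAG context
output_tokens_per_request = 400

price_per_1k_input = 0.008        # USD, placeholder rate
price_per_1k_output = 0.024       # USD, placeholder rate

cost_per_request = (
    input_tokens_per_request / 1_000 * price_per_1k_input
    + output_tokens_per_request / 1_000 * price_per_1k_output
)
monthly_cost = cost_per_request * requests_per_day * 30

print(f"~${cost_per_request:.4f} per request, ~${monthly_cost:,.0f}/month")
# If each request is worth more to the business than cost_per_request,
# the experiment has a plausible path to positive ROI.
```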

Embrace Rapid Prototyping

Plan for and embrace obstacles as learning opportunities. For example, poor initial responses from an LLM can often be solved by rapid iterations of prompt engineering. Build some slack into your schedules to account for uncertainty.

Track Experiments

GenAI outputs can be non-deterministic, so expect a fair amount of experimentation and adjustment. Experimenters should track model, parameter, data, and prompt changes to allow backtracking and cross-pollination of variables across experiments. Dedicated tooling isn't necessary; Excel or Git are perfectly fine at this stage! A prompt catalog tool may be useful for larger teams to coordinate, and introducing a benchmarking workflow for LLMs can also help at this phase by incorporating human or automated evaluation of experiments.
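
If a spreadsheet feels too loose, a Git-tracked JSONL file is nearly as simple. The fields and file name below are illustrative, not a prescribed schema.

```python
# A minimal experiment log appended to a Git-tracked JSONL file.
import datetime
import json

def log_experiment(path, *, model_id, parameters, prompt, data_version, notes=""):
    record = {
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "model_id": model_id,         # e.g. "anthropic.claude-v2:1"
        "parameters": parameters,     # temperature, max tokens, etc.
        "prompt": prompt,             # full prompt template used
        "data_version": data_version, # which data snapshot was used
        "notes": notes,               # human observations on quality
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_experiment(
    "experiments.jsonl",
    model_id="anthropic.claude-v2:1",
    parameters={"temperature": 0.2, "max_tokens_to_sample": 500},
    prompt="Summarize the attached policy in plain language: {document}",
    data_version="policies-2024-01-15",
    notes="Less hallucination at temperature 0.2; try 0.0 next.",
)
```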


Initial Internal Use Cases

Internal audiences are typically the first place where GenAI initiatives have real business impact. While these users can be more understanding of experimentation and iteration than external customers, workloads are starting to scale and need to be operated well.

Monitor Cost

Put a budget and corresponding alarms in place to avoid surprises in your monthly bill. Tag experiments individually to track their performance. Ideally, define and monitor business KPIs that will allow you to measure ROI.
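
As a sketch of what that looks like on AWS, the snippet below creates a monthly budget with an 80% alert via the AWS Budgets API. The account ID, amounts, tag filter, and email address are placeholders; verify the parameters against the current boto3 documentation before relying on them.

```python
# A hedged sketch: create a monthly cost budget for tagged GenAI resources
# and alert the team at 80% of the limit. All identifiers are placeholders.
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "genai-pilot-monthly",
        "BudgetType": "COST",
        "TimeUnit": "MONTHLY",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        # Scope the budget to resources tagged for this experiment.
        "CostFilters": {"TagKeyValue": ["user:project$genai-pilot"]},
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [
            {"SubscriptionType": "EMAIL", "Address": "genai-team@example.com"}
        ],
    }],
)
```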

Monitor Usage

Capture as much data as practical (system prompt, user prompt, model, configuration, input data, outputs, and user sentiment) to facilitate reproducibility. Regularly review sentiment and feed what you learn into an enhancement backlog.
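
A structured, one-record-per-invocation log makes that data queryable later. The field names below are illustrative; ship the records wherever your logs already go, such as CloudWatch Logs.

```python
# Illustrative per-invocation usage record; field names are assumptions.
import datetime
import json
import logging
import uuid

logger = logging.getLogger("genai.usage")

def log_invocation(system_prompt, user_prompt, model_id, config,
                   context_doc_ids, output, sentiment=None):
    logger.info(json.dumps({
        "invocation_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "model_id": model_id,
        "system_prompt": system_prompt,
        "user_prompt": user_prompt,
        "config": config,                    # temperature, max tokens, etc.
        "context_doc_ids": context_doc_ids,  # RAG documents retrieved
        "output": output,
        "user_sentiment": sentiment,         # joined in later from feedback
    }))
```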

Expand Governance

The initial policies and guidance should expand to cover acceptable use of GenAI, including prompt etiquette and human validation of outputs prior to use. While internal systems should follow the usage monitoring guidelines above, it may also be useful to adopt third-party governance tools that provide guardrails around any public models being used.

Evaluate Data Needs

Is data limiting the use case's performance? Evaluate the ROI of improving data quality, recency, breadth, and so on, and make a business case for data investment.

Initial External Use Cases

External users raise the stakes in a number of dimensions, and first impressions really do matter. Ensure a great first impression and a sustainable business through a few best practices.

Monitor Cost Closely

Cost consciousness is a recurring theme. As your solutions are exposed to potentially much larger audiences, the chances of surprise overruns increase. Define and closely monitor your business KPIs so that you can calculate ROI continuously. Avoid consumption that isn't correlated with revenue unless it's tightly constrained, and think carefully before exposing GenAI-powered features to unauthenticated users.

Automate Alignment Testing

Is your GenAI solution responding appropriately every time? Automated alignment testing can ensure that end users' experience matches your intent, whether through simple sentiment analysis with a service like Amazon Comprehend, a purpose-built capability like Guardrails for Amazon Bedrock that filters input prompts and responses, and/or a complementary custom AI model that evaluates responses for accuracy and company tone.
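
At the simple end of that spectrum, a sentiment check with Amazon Comprehend takes only a few lines. The 0.5 threshold below is an arbitrary assumption to tune for your use case.

```python
# A first-line alignment check: flag strongly negative responses before
# they reach users. The threshold is an assumption, not a recommendation.
import boto3

comprehend = boto3.client("comprehend")

def passes_sentiment_check(response_text: str) -> bool:
    result = comprehend.detect_sentiment(Text=response_text, LanguageCode="en")
    return result["SentimentScore"]["Negative"] < 0.5

draft = "We're sorry, but that item is out of stock until next week."
if not passes_sentiment_check(draft):
    # Route to a fallback response or human review instead of the user.
    print("Response flagged for review")
```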

Capture Sentiment

User satisfaction is a critical signal for improving your GenAI solution. A simple thumbs-up/down is a low-friction way for users to help you meet their expectations. These sentiments can then be used to measure solution performance and serve as an objective guide for A/B testing.

Enhance Usage Monitoring

Capture the results of alignment testing and sentiment feedback in your monitoring. Add detailed technical performance tracking and connect AI usage with desired business outcomes directly as often as possible.

Systematically Track Change

Awareness of the exact configuration that was running at a point in time is indispensable in tracking down issue reports as well as analyzing technical or business performance changes. Ensure any test or production releases have traceability across all factors involved.

Externalize Governance

Explain your use of GenAI to your end users. Some use cases may warrant highlighting AI-generated content specifically, while others may be served by a broad policy statement.


Highly Regulated Businesses

The guidance so far relates to companies that are working with relatively unconstrained data. If your organization is entrusted with highly regulated data such as healthcare (HIPAA/HITRUST), payment (PCI), personally identifiable information (PII), or similar, your freedom to experiment is predicated on your team's ability to work within policy guidance.

Start With Great Governance

Nail down governance up front, in detail. Consider ISO/IEC 42001 as a framework for your policies. Are transparency, explainability, traceability, or bias detection required? Some of this effort will involve relatively simple extensions of existing data handling policies, but GenAI's non-determinism brings new challenges to system validation that may require new techniques and processes. Wherever possible, prefer automated guardrails to approval gates.

Invest Early in Change Tracking

Change tracking maturity will need to meet a higher standard from the beginning of your GenAI journey, including early experimentation. Invest up front in prompt, system parameter, code, and configuration versioning as well as operational monitoring for traceability. Technical updates should always trace back to business requirements and justification.

Automate Guardrails

Guardrails for Amazon Bedrock or a similar capability should be used to automate input prompt safety and response filtering. Testing should include a "red team" approach that actively tries to break whatever guardrail solution is chosen.
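
As a sketch, once a guardrail has been created and versioned in Amazon Bedrock, it can be attached to invocations by identifier. The guardrail ID, version, and prompt below are placeholders; check the current Bedrock runtime documentation for the exact parameters your SDK version supports.

```python
# A hedged sketch: attach a pre-configured guardrail to a Bedrock
# invocation. Identifiers are placeholders for your own guardrail.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

user_input = "Tell me about your refund policy."  # untrusted input

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2:1",
    guardrailIdentifier="YOUR_GUARDRAIL_ID",  # placeholder
    guardrailVersion="1",                     # placeholder
    body=json.dumps({
        "prompt": f"\n\nHuman: {user_input}\n\nAssistant:",
        "max_tokens_to_sample": 500,
    }),
)
result = json.loads(response["body"].read())
# If the guardrail intervenes on the input or output, the configured
# blocked message is returned in place of the model's completion.
print(result["completion"])
```

Red-team exercises should then hammer this path with adversarial prompts (jailbreaks, prompt injection, restricted topics) to verify the guardrail holds.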


Advanced / Scaled Use of GenAI

As GenAI delivers positive ROI and starts to become ubiquitous across product teams, additional investment can be justified to maximize its impact.

Standardize Monitoring

Monitoring standards will enable your organization to proactively manage cost, outcome performance, and system performance with consistency. Automate continuous bias and drift detection capabilities to enable proactive responses.

Optimize

Optimization levers abound and are highly situational, but a few recommendations have come up repeatedly in Caylent's work. Replace large, generalized LLM integrations with smaller generative models or analytical AI solutions that are more efficient and/or specialized to the task. Self-host models where the operational overhead is less than the projected savings. Use lower-cost custom silicon like Graviton or Inferentia. Use batch inference where possible, and leverage response caching for common questions.
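
Response caching in particular can be sketched in a few lines. This version assumes exact-match lookups and an in-memory store; production systems typically use a shared cache like ElastiCache, and often semantic (embedding-based) matching instead of exact matching.

```python
# Minimal exact-match response cache: only novel prompts incur model cost.
import hashlib

_cache: dict[str, str] = {}  # swap for a shared store in production

def cached_generate(prompt: str, generate_fn) -> str:
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate_fn(prompt)  # cache miss: pay for inference
    return _cache[key]

def expensive_llm_call(prompt: str) -> str:
    return "(your Bedrock invocation goes here)"  # stub for illustration

# Repeated common questions are served from the cache at no model cost:
answer = cached_generate("What is your return policy?", expensive_llm_call)
```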

Act Autonomously

Enable your AI solution to take action based on user intent through frameworks like LangChain's agents or services such as Agents for Amazon Bedrock. This opportunity may materialize as an end user-facing feature that delights, or an internal workflow accelerator that saves time and toil.
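
With Agents for Amazon Bedrock, invoking a pre-configured agent looks roughly like the sketch below; the agent and alias IDs are placeholders for resources you create in the console.

```python
# A hedged sketch of invoking a pre-configured Bedrock agent; the agent
# plans, calls your configured actions, then streams back its answer.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="YOUR_AGENT_ID",        # placeholder
    agentAliasId="YOUR_ALIAS_ID",   # placeholder
    sessionId="user-123-session",   # groups turns into one conversation
    inputText="Open a support ticket for my failed payment.",
)

# The completion arrives as an event stream of chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```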

Tune Frequently

Apply model fine-tuning to increase performance, efficiency, and alignment. Capture, monitor, and act on end user feedback continuously to identify tuning opportunities. Continuously fine-tune your models where data or outcomes are drifting. Improve the relevancy of context documents used in RAG use cases through vector store tuning.

Add A/B Testing

As with software best practices, invest in A/B testing to improve outcomes by pitting models, parameters, and prompts against each other. Architectural modularity pays off in facilitating A/B testing, as system components are designed to be easily swapped out.
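
A deterministic assignment keeps each user on one variant so outcomes are comparable. The split, variants, and model IDs below are illustrative.

```python
# Illustrative A/B routing: hash a stable user ID into a bucket so each
# user consistently sees the same model/prompt variant.
import hashlib

VARIANTS = {
    "A": {"model_id": "anthropic.claude-v2:1", "temperature": 0.2},
    "B": {"model_id": "anthropic.claude-instant-v1", "temperature": 0.2},
}

def assign_variant(user_id: str) -> str:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "A" if bucket < 50 else "B"  # 50/50 split; adjust as needed

variant = assign_variant("user-123")
config = VARIANTS[variant]
# Log the variant with each invocation so sentiment and business KPIs
# can be compared per variant downstream.
```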

Formalize Model Evaluation

Now that your team is familiar with the characteristics of several LLMs, accelerate the early stages of projects by shifting model selection and optimization left. Use tools like Model Evaluation on Amazon Bedrock, SageMaker Clarify's foundation model evaluation, or similar to get a head start on optimization.

Embrace Unstructured Research and Development

One of the first recommendations we made in this article was to set business goals to guide experiments. Advanced organizations may choose to break that rule and invest in pure R&D. This might include following along with the latest research papers, playing with a data hypothesis, or brainstorming and testing blue sky ideas. Outcomes here won't be assured, but can be transformative.

In Summary

While many factors contribute to operational maturity and success with GenAI, the optimistic news is that almost all of them can be adopted iteratively and incrementally with appropriate foresight, which should be comforting. Given competitive pressures, the one option that's almost always unacceptable is maintaining the status quo and waiting to see how things play out.

If you're looking for help strategizing how to operationalize your GenAI initiatives, Caylent can help. Our Generative AI Flight Plan Caylent Catalyst can help you build an AI roadmap for your organization and demonstrate how generative AI can positively impact you. If you have a specific GenAI vision, we can also tailor an engagement exactly to your requirements. Get in touch with our team to explore opportunities to innovate with AI.

Khobaib Zaamout

Dr. Khobaib Zaamout is the Principal Architect for AI Strategy at Caylent, where his main focus lies in AI/ML and Generative AI. He brings a solid background with over ten years of experience in software, data, and AI/ML. Khobaib earned a master's in Machine Learning and holds a doctorate in Data Science. His professional journey also includes extensive consulting, solutioning, and leadership roles. Based in Chestermere, Alberta, Canada, Khobaib enjoys a laid-back life. Outside of work, he likes cooking for his family and friends and finds relaxation in camping trips to the Rocky Mountains.

Mark Olson

As Caylent's VP of Customer Solutions, Mark leads a team that's entrusted with envisioning and proposing solutions to an infinite variety of client needs. He's passionate about helping clients transform and leverage AWS services to accelerate their objectives. He applies curiosity and a systems thinking mindset to find the optimal balance among technical and business requirements and constraints. His 20+ years of experience span team leadership, technical sales, consulting, product development, cloud adoption, cloud native development, and enterprise-wide as well as line-of-business solution architecture and software development, from Fortune 500s to startups. He recharges outdoors: you might find him and his wife climbing a rock, backpacking, hiking, or riding a bike up a road or down a mountain.
