Caylent Catalysts™
AWS Generative AI Proof of Value
Accelerate investment and mitigate risk when developing generative AI solutions.
Gain a strategic overview of all the factors you need to consider when planning for GenAI development in your organization.
In our previous overview of LLMOps architecture and lifecycle, we shared a target state, fully mature vision for managing solutions incorporating Large Language Models (LLMs). Since the concepts, tools, and skill sets required to run GenAI workloads at that level of maturity can be new to teams, the vision may seem daunting at first. An incremental approach to change management allows organizations to pace adoption based on need and value.
This article outlines a few key strategic progressions in the journey to operationalize GenAI on AWS at a high level. Future articles will go deeper into technology in specific areas.
We recommend AWS well-architected practices generally and the Machine Learning Lens specifically be used as guideposts to best practices, with an emphasis on security. Most business use cases will take advantage of enterprise data using retrieval-augmented generation (RAG). If your use case doesn't involve RAG, you can skip over our data recommendations.
A number of the recommendations that follow depend on agility. Given the evolving nature of GenAI models, tools, patterns, and best practices, architectural modularity reduces the risk of lock-in from early design decisions. Making platform choices like Amazon Bedrock or Amazon SageMaker that support multiple models or using LangChain or another flexible framework to abstract solution components are two ways to ensure that lessons learned in testing are less likely to take projects all the way back to the drawing board.
Whether your team is embarking on their first GenAI experiments or preparing to release into a highly regulated production environment at scale, we have recommendations based on best practices and our experience over the last year plus.
Define initial technical policies for use of GenAI so that your teams know what data can be used, which models or services are acceptable, and which privacy settings you require. On AWS, services like Amazon Bedrock explicitly protect against using your data to train future models, while other services have varying approaches. Highly sensitive data can be used with GenAI, but those use cases should start with our Highly Regulated Business recommendations for a more inclusive and stricter governance and security posture.
Even experiments should have a directional business goal to guide them so that you can measure progress, success, or lessons learned. Specific KPIs can be defined later. Open-ended, pure R&D is a topic we'll cover later that's more appropriate for more advanced teams that have already demonstrated success.
Don't wait for perfect data to get started. Many of the traditional data quality hurdles no longer apply with GenAI. Since you aren't typically asking an LLM to calculate the precise average of a column, having half the values null or zero doesn't matter. Poor structuring also isn't nearly as big of an issue, nor is using a field for multiple purposes, as the models can usually be prompted to figure that out. If your first experiment lacks even adequate data, move quickly to the next priority.
Avoid premature cost optimization by choosing the best model available to validate functionality. As of this writing, our teams are having good success with Claude 2.1 on Bedrock, with the model's improvements resulting in lower prompt engineering efforts and higher quality outcomes.
Begin measuring cost and projecting value from day one. Forecasting the unit cost of each experiment will determine whether it should proceed toward production or stay in the rapid prototyping phase. This can be "back of napkin" quality math using volume and velocity assumptions since it's meant to avoid large financial misalignments before increasing visibility and generating excitement. Resources should be tagged granularly enough to track back to individual experiments if using AWS Cost Explorer as a validation. This can also help guide you on whether to expose your stakeholders to the best-in-class (and more expensive) LLM early in the process or find the most cost-effective model that might meet their needs.
Plan for and embrace obstacles as learning opportunities. For example, poor initial responses from an LLM can often be solved by rapid iterations of prompt engineering. Build some slack into your schedules to account for uncertainty.
GenAI can be non-deterministic, involving a fair amount of experimentation and adjustment. Experimenters should track model, parameter, data, and prompt changes to allow backtracking and variable cross-pollination. Dedicated tooling isn't necessary. Excel or Git are perfectly fine at this stage! A prompt catalog tool may be useful for larger teams to coordinate. Introducing a benchmarking workflow for LLMs can also be helpful at this phase to incorporate human or automated evaluation of experiments.
Internal audiences are typically the first place where GenAI initiatives have real business impact. While these users can be more understanding of experimentation and iteration than external customers, workloads are starting to scale and need to be operated well.
Put a budget and corresponding alarms in place to avoid unintended surprises in your monthly bill. Tag experiments individually to track their performance. Ideally, define and monitor business KPIs that will allow you to measure ROI.
Capture as much data as practical, from system prompt, user prompt, model, configuration, input data, outputs, and user sentiment to facilitate reproducibility. Regularly review sentiment and feedback into an enhancement backlog.
The initial policies and guidance should expand to acceptable use of GenAI, including prompt etiquette and human validation of outputs prior to use. While internal systems follow usage monitoring guidelines, it may also be useful to adopt third-party governance tools to provide guardrails around any public models being used.
Is data limiting the use case's performance? Evaluate the ROI for improving data quality, recency, breadth, etc, and make a business case for data investment.
External users raise the stakes in a number of dimensions, and first impressions really do matter. Ensure a great first impression and a sustainable business through a few best practices.
Cost consciousness is a recurring theme. As your solutions are exposed to potentially much larger target audiences, the chances for surprise overruns increase. Define and closely monitor your business KPIs so that you can calculate ROI continuously. Avoid consumption that isn't correlated with revenue unless it's tightly constrained. Think seriously about exposing GenAI-powered features to non-authenticated users.
Is your GenAI solution responding appropriately every time? Whether simple sentiment analysis through a service like Amazon Comprehend, a more advanced service like Guardrails for Amazon Bedrock that automates input prompt and response filtering, and/or a complementary custom AI model evaluating responses for accuracy and company tone, automating alignment testing can ensure that end users' experience matches your intention.
User satisfaction is a critical signal that can be used to improve your GenAI solution. A simple thumb up/down is a low-friction modality for your users to help you meet their expectations. These sentiments can then be used to measure solution performance and serve as an objective guide for A/B testing.
Capture the results of alignment testing and sentiment feedback in your monitoring. Add detailed technical performance tracking and connect AI usage with desired business outcomes directly as often as possible.
Awareness of the exact configuration that was running at a point in time is indispensable in tracking down issue reports as well as analyzing technical or business performance changes. Ensure any test or production releases have traceability across all factors involved.
Explain your use of GenAI to your end users. Some use cases may want to highlight AI-generated content specifically, others may be served by a broad policy statement.
The guidance so far relates to companies that are working with relatively unconstrained data. If your organization is entrusted with highly regulated data such as healthcare (HIPAA/HITRUST), payment (PCI), personally identifiable information (PII), or similar, your freedom to experiment is predicated on your team's ability to work within policy guidance.
Nail down governance up front, in detail. Consider ISO/IEC 42001 as a framework for your policies. Are transparency, explainability, traceability, or bias detection required? Some of this effort will involve relatively simple extensions of existing policies for data handling, but GenAI's non-determinism brings new challenges to system validation that may require new techniques and processes. Wherever possible, automated guardrails should be preferred to approval gates.
Change tracking maturity will need to meet a higher standard from the beginning of your GenAI journey, including early experimentation. Invest up front in prompt, system parameter, code, and configuration versioning as well as operational monitoring for traceability. Technical updates should always trace back to business requirements and justification.
Guardrails for Amazon Bedrock or a similar capability should be used to automate input prompt safety and response filtering. Testing should include a "red team" approach to actively try to break whatever guardrail solution is chosen.
As GenAI delivers positive ROI and starts to become ubiquitous across product teams, additional investment can be justified to maximize its impact.
Monitoring standards will enable your organization to proactively manage cost, outcome performance, and system performance with consistency. Automate continuous bias and drift detection capabilities to enable proactive responses.
Optimization levers abound and are very situational, but Caylent has made a few recommendations repeatedly. Replace large, generalized LLM integrations with smaller generative models or analytical AI solutions that are more efficient and/or specialized to the task. Self-host models where the operational overhead is less than the projected savings. Use lower cost, custom silicon like Graviton or Inferentia. Use batch inference, if possible. Leverage response caching for common questions.
Enable your AI solution to take action based on user intent through frameworks like LangChain's agents, or services such as Agents for Amazon Bedrock. This opportunity may materialize as an end user-facing feature that delights, or an internal workflow accelerator that saves time and toil.
Apply model fine-tuning to increase performance, efficiency, and alignment. Capture, monitor, and act on end user feedback continuously to identify tuning opportunities. Continuously fine-tune your models where data or outcomes are drifting. Improve the relevancy of context documents used in RAG use cases through vector store tuning.
As with software best practices, invest in A/B testing to improve outcomes by pitting models, parameters, and prompts against each other. Architectural modularity pays off in facilitating A/B testing, as system components are designed to be easily swapped out.
Now that your team is familiar with the characteristics of several LLMs, accelerate the early stages of projects by shifting your model optimization left. Use tools like Model Evaluation on Bedrock, Clarify FM Evaluation, or similar to get a head-start on optimization.
One of the first recommendations we made in this article was to set business goals to guide experiments. Advanced organizations may choose to break that rule and invest in pure R&D. This might include following along with the latest research papers, playing with a data hypothesis, or brainstorming and testing blue sky ideas. Outcomes here won't be assured, but can be transformative.
While there are many factors that contribute to operational maturity and success with GenAI, the optimistic news is that almost all of them can be adopted iteratively and incrementally with appropriate foresight. This can be comforting. Given competitive pressures, the one option that's almost always unacceptable is to remain in the status quo and wait to see how things play out.
If you're looking for help strategizing how to operationalize your GenAI initiatives, Caylent can help. Our Generative AI Flight Plan Caylent Catalyst can help you build an AI roadmap for your organization and demonstrate how generative AI can positively impact you. If you have a specific GenAI vision, we can also tailor an engagement exactly to your requirements. Get in touch with our team to explore opportunities to innovate with AI.
Dr. Khobaib Zaamout is the Principal Architect for AI Strategy at Caylent, where his main focus lies in AIML and Generative AI. He brings a solid background with over ten years of experience in software, Data, and AIML. Khobaib has earned a master's in Machine Learning and holds a doctorate in Data Science. His professional journey also involves extensive consulting, solutioning, and leadership roles. Based in Chestermere, Alberta, Canada, Khobaib enjoys a laid-back life. Outside of work, he likes cooking for his family and friends and finds relaxation in camping trips to the Rocky Mountains.
View Khobaib's articlesMark Olson, Caylent's Portfolio CTO, is passionate about helping clients transform and leverage AWS services to accelerate their objectives. He applies curiosity and a systems thinking mindset to find the optimal balance among technical and business requirements and constraints. His 20+ years of experience spans team leadership, technical sales, consulting, product development, cloud adoption, cloud native development, and enterprise-wide as well as line of business solution architecture and software development from Fortune 500s to startups. He recharges outdoors - you might find him and his wife climbing a rock, backpacking, hiking, or riding a bike up a road or down a mountain.
View Mark's articlesCaylent Catalysts™
Accelerate investment and mitigate risk when developing generative AI solutions.
Caylent Catalysts™
Accelerate your generative AI initiatives with ideation sessions for use case prioritization, foundation model selection, and an assessment of your data landscape and organizational readiness.
Caylent Catalysts™
Plan and implement an MLOps strategy unique to your team's needs, capabilities, and current state, unlocking the next steps in tactical execution by offloading the infrastructure, data, operations, and automation work from data scientists.
Leveraging our accelerators and technical experience
Browse GenAI OfferingsLearn how Generative AI is poised to transform healthcare by addressing technological challenges, reducing administrative burdens, enhancing clinical decision-making, and creating more personalized, efficient patient care experiences.
Read about the experiences our summer technology fellow had at Caylent, where she explored cloud computing, generative AI, web development, and more.
The AI industry is growing rapidly and a variety of models now exist to tackle different use cases. Amazon Bedrock provides access to diverse AI models, seamless AWS integration, and robust security, making it a top choice for businesses who want to pursue innovation without vendor lock-in.