Caylent Catalysts™
Generative AI Strategy
Accelerate your generative AI initiatives with ideation sessions for use case prioritization, foundation model selection, and an assessment of your data landscape and organizational readiness.
Learn about all of the AI announcements unveiled at AWS re:Invent 2025, including the Amazon Nova 2 family, updates to Amazon Bedrock AgentCore, Kiro autonomous agents, Amazon S3 Vectors, and more.
AWS re:Invent is always developer Christmas, and the 2025 edition did not disappoint. For AI specifically, it felt like the payoff moment for AWS's AI strategy of the past few years. The Amazon Nova models we first saw in 2024 got a complete generational upgrade, browser automation went from "interesting experiment" to "production-ready with 90%+ reliability," and Kiro, AWS's agentic IDE, drew significant attention throughout the event and at Werner's keynote, and gained a brand new frontier agent that actually remembers context across sessions. Watch our recap video here.
This article breaks down the announcements that matter for how you build AI applications on AWS. We're going to talk about the features that change your architecture decisions, your cost calculations, or what's suddenly possible that wasn't before. We'll cover the Amazon Nova 2 family, the agent infrastructure that makes production deployment a surmountable challenge, and the customization tools that will make ML teams very, very happy.
Amazon Nova launched in late 2024 as AWS's answer to the "where's your competitive model?" question. It had competitive pricing and decent performance, but time in AI is like dog years, and anything that launched in 2024 feels decades old. Amazon Nova 2 is a much-needed new wave, and they're good models!
The family now includes four specialized models: Lite for fast reasoning with cost control, Pro for highly complex, multistep tasks, Sonic for real-time voice conversations, and Omni for multimodal everything.
Amazon Nova 2 Sonic is a speech-to-speech foundation model, capable of handling voice input and generating voice output natively (not text-to-speech bolted onto an LLM). The result is natural turn-taking, where the model knows when you're done speaking and responds without awkward pauses or cutting you off.
Compared to the original Amazon Nova Sonic, Amazon Nova 2 Sonic demonstrates clear gains in capability.
Besides quality improvements, one of the things they added is polyglot voices: A single voice persona that can switch between English, French, Spanish, German, Italian, Portuguese, and Hindi within the same conversation. If you've built multilingual voice applications before and have suffered through managing separate voice models for each language, you know how valuable consolidation is.
The model also supports cross-modal interactions, allowing users to switch between text and voice input within the same session without losing context. Combined with asynchronous tool calling, which allows the model to execute multiple tools without pausing the conversation, you can build voice agents that feel genuinely responsive.
The prices are $0.003 per 1,000 input tokens and $0.012 per 1,000 output tokens for Speech, and $0.00033 per 1,000 input tokens and $0.00275 per 1,000 output tokens for Text.
Nova Sonic is now also supported as a model in Amazon Connect, a much-requested feature for the contact center space.
Amazon Nova 2 Lite is a fast, cost-effective reasoning model. And it's definitely very fast! Amazon Nova 2 Pro is a significantly smarter version, and for now, it is only available in preview. Both support extended thinking across three budget levels: low, medium, and high, giving you explicit control over the speed/intelligence/cost trade-off that's usually hidden within the model.
Both Lite and Pro have a 1 million-token context window, which, for Lite, is a new feature at this price point. This enables workflows that would otherwise require chunking and summarization. Combined with built-in web grounding and a code interpreter that runs and evaluates code within the same workflow, you can use Amazon Nova 2 Lite and Amazon Nova 2 Pro to build research or analysis agents without needing to stitch together external tools.
The price for Amazon Nova 2 Lite is $0.00125 per 1,000 input tokens and $0.0025 per 1,000 output tokens. For Amazon Nova 2 Pro, the price is $0.0003 per 1,000 input tokens for text, image, video, or audio, and $0.01 per 1,000 output tokens for text (the only supported output modality).
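Controlling the thinking budget is done per request. Here's a minimal sketch of calling Amazon Nova 2 Lite through the Bedrock Converse API with an explicit budget level; the model ID and the `thinking` request field are assumptions based on the announcement, so check the Bedrock documentation for the exact identifiers in your region.

```python
# Sketch: Converse request for Amazon Nova 2 Lite with a reasoning budget.
def build_request(prompt: str, budget: str = "low") -> dict:
    """Assemble a Converse request with a low/medium/high thinking budget."""
    return {
        "modelId": "us.amazon.nova-2-lite-v1:0",  # assumed model ID
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 1024},
        # Assumed field name for the budget levels described above.
        "additionalModelRequestFields": {"thinking": {"budget": budget}},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials with Bedrock access
    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_request("Summarize this filing.", "medium"))
    print(response["output"]["message"]["content"][0]["text"])
```

The point of the explicit budget is that you can route cheap, latency-sensitive requests at "low" and reserve "high" for the genuinely hard reasoning tasks, all against the same model.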
Amazon Nova 2 Omni is the first reasoning model that accepts text, images, video, and speech as input while generating both text and image outputs. This means workflows that currently require multiple models (one for reasoning, another for image generation, a third for transcription) could collapse into single model calls. Omni is in preview, and it's always possible the quality doesn't fully match specialized models, but the architectural simplification is a real benefit.
Image generation includes character consistency and text rendering within images. Speech understanding handles multi-speaker transcription and translation natively. The 1M context window and 200+ language support carry over from the Amazon Nova family.
The price is $0.0003 per 1,000 input tokens for text, image, or video, or $0.001 per 1,000 input tokens for audio. For output, it's $0.0025 per 1,000 output tokens of text and $0.04 per 1,000 output tokens of image.
If you’ve used ChatGPT, Claude.ai, or Perplexity, you’re likely familiar with their ability to search the web and incorporate results directly into responses, another form of retrieval-augmented generation. Have you ever wanted to build that capability into one of your GenAI applications? Well, now you can do it very easily with Amazon Nova Web Grounding.
It's not always-on retrieval; the model determines when external information would improve the response, and performs the retrieval. Results include reasoning traces showing where the model queried external sources, plus content objects with citations you can surface to users.
This feature is very important for knowledge-based assistants, content generation requiring fact-checking, research synthesis, and any application where accuracy and source attribution are important. You get grounded responses with citations without needing to build a retrieval infrastructure.
Amazon Nova Web Grounding currently works with Amazon Nova Premier, Amazon Nova 2 Pro, and Amazon Nova 2 Omni, with support for other Amazon Nova models coming. The price is $30.00 per 1K requests.
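In practice, this means one Converse call and then surfacing the returned citations. The sketch below is illustrative only: the built-in tool name (`web_grounding`) and the response shape are assumptions, not a confirmed API, so consult the Amazon Nova documentation for the real schema.

```python
# Sketch: requesting a web-grounded answer and extracting citations.
def build_grounded_request(question: str) -> dict:
    return {
        "modelId": "us.amazon.nova-2-pro-v1:0",  # assumed model ID
        "messages": [{"role": "user", "content": [{"text": question}]}],
        # Assumed: web grounding enabled as a built-in tool.
        "additionalModelRequestFields": {"tools": [{"name": "web_grounding"}]},
    }

def extract_citations(response: dict) -> list:
    """Collect citation objects from the content blocks (assumed shape)."""
    blocks = response["output"]["message"]["content"]
    return [c for b in blocks for c in b.get("citations", [])]
```

Because the model decides when to retrieve, most requests cost nothing extra; you only pay the per-request grounding fee when retrieval actually fires.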
Building agents for POCs is straightforward nowadays. Deploying agents to production, where they interact with real systems, real data, and real users, is another story entirely. These announcements address that gap directly.
Amazon Nova Act was announced in preview back in April 2025 as a way for an LLM to control the browser through small steps described in natural language. It is now generally available, with a claim of 90%+ task reliability at scale. That number marks the point where browser automation becomes operationally viable, potentially replacing tools like Selenium and Playwright.
Behind the scenes, Amazon Nova Act uses a custom Amazon Nova 2 Lite model trained specifically for browser driving, combined with purpose-built orchestration, tools, and SDK. Developers can start prototyping in Amazon Nova Act Playground at nova.amazon.com/act without writing code. When you're ready to deploy, the system packages your workflow into a Docker container, pushes it to Amazon ECR, creates IAM roles and S3 buckets, and automatically deploys it to AgentCore Runtime. Zero infrastructure configuration is required.
The task definition is natural language, but the execution is deterministic browser actions with state management. Live browser views show exactly what the agent is doing, and execution logs reveal the reasoning behind each action.
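The "small steps in natural language" idea looks roughly like the following, based on the Nova Act SDK's preview interface (`pip install nova-act`); the GA surface may differ slightly, so treat this as an illustrative sketch with placeholder steps.

```python
# Workflow expressed as small natural-language steps, per the Nova Act model.
STEPS = [
    "search for 'wireless keyboard'",
    "open the first result",
    "report the product title and price",
]

def run_workflow(steps, starting_page="https://www.amazon.com"):
    from nova_act import NovaAct  # requires the nova-act package and a local browser
    with NovaAct(starting_page=starting_page) as nova:
        for step in steps:
            nova.act(step)  # each call executes one small natural-language step

if __name__ == "__main__":
    run_workflow(STEPS)
```

Keeping each `act()` call small is what makes the 90%+ reliability claim plausible: the model never has to plan a long horizon, only the next concrete browser interaction.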
Amazon Bedrock AgentCore's updates make governance and observability easier when deploying AI agents to production. These are the four capabilities that matter the most.
Policy (in preview) intercepts Amazon Bedrock AgentCore Gateway tool calls before they execute and enforces boundaries defined in Cedar (AWS's open-source policy language). Natural language policy authoring is available: describe what you want in plain English, and the system generates and validates the Cedar policy.
Here's an example of a policy about giving refunds to customers:
permit (
  principal in Role::"refund-agent",
  action == Action::"processRefund",
  resource
)
when {
  resource.amount.lessThan(decimal("200.00")) &&
  resource.currency == "USD"
};
This agent can process refunds only for amounts under $200 USD and only when the requester has the refund-agent role. With a policy like this, even if your prompting fails or the model hallucinates an action, the policy layer prevents unauthorized execution. That's defense in depth for agent systems.
Evaluations (in preview) provide continuous quality scoring of an agent's performance. Built-in evaluators assess correctness, faithfulness, helpfulness, tool-selection accuracy, and goal-success rate. The system samples live agent interactions and scores them continuously, showing results in Amazon CloudWatch alongside Amazon Bedrock AgentCore Observability insights, and allowing you to set alerts.
Episodic Memory enables agents to learn from past experiences. Structured episodes capture context, reasoning, actions, and outcomes; a reflection agent then extracts patterns, enabling the agent to retrieve relevant learnings when facing similar tasks.
Imagine a travel booking agent that learns a particular user frequently reschedules flights for client meetings. After a few interactions, the agent proactively suggests flexible return options for future work trips without being explicitly programmed to do so. The learning emerges from experience, not from you anticipating every user pattern and encoding it in prompts.
Bidirectional Streaming enables voice agents to listen while speaking and adapt to interruptions. Users can cut in mid-response, and the agent adjusts without losing context. This creates the conversational flow users expect in human interactions.
Kiro is AWS's agentic IDE. It launched in preview in July, went into general availability on November 17th, and has since received significant attention, including being featured at the Road to re:Invent hackathon (in which three Caylent employees participated). Now, AWS has added an autonomous agent to it (in preview): a system that maintains context across sessions, learns from your feedback, and executes tasks asynchronously while you work on other things.
The persistent context is the key differentiator. Instead of starting fresh on each conversation, Kiro's autonomous agent maintains awareness across all your work and never forgets. Multi-repo awareness extends this further, letting you treat changes across multiple repositories as a unified task. Moreover, when you leave PR feedback like "always use our standard error handling pattern," it remembers that and automatically applies it to all future work.
Asynchronous execution means the agent runs up to 10 tasks concurrently while you focus elsewhere. It spins up isolated sandbox environments, clones repositories, analyzes codebases, breaks down work into tasks, defines acceptance criteria, asks clarifying questions when uncertain, and opens pull requests with detailed explanations.
Preview is rolling out to Kiro Pro, Pro+, and Power subscribers with no additional cost during the preview period (subject to weekly limits). You can add a kiro label to any GitHub issue or mention /kiro in a comment to assign work to it.
Other notable features from the general availability launch include property-based testing that extracts requirements from specifications and tests with hundreds of random cases, checkpointing to roll back any number of steps without losing progress, a full CLI with agent capabilities, and enterprise management through IAM Identity Center.
Reinforcement Fine-Tuning is a new feature in Amazon Bedrock that delivers a stated average accuracy gain of 66% over base models without requiring large labeled datasets. The approach uses feedback-driven learning: models improve iteratively based on reward signals rather than example pairs.
Two methods are available for supplying training data: you can use stored invocation logs directly (your existing Amazon Bedrock usage becomes training data) or upload new datasets from Amazon S3. Reward functions can be implemented either as custom Lambda code or as model-as-judge configurations. The training dashboard shows reward scores, loss curves, and accuracy improvements in real time.
Reinforcement Fine-Tuning currently supports Amazon Nova 2 Lite, with additional models coming. After training completes, you can deploy the new model to Amazon SageMaker with a single click. Tip: be thoughtful about reward function design; the model optimizes for whatever you measure, and poorly designed rewards create poorly behaved models.
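To make the reward-function idea concrete, here is a toy reward function written as a Lambda handler. The event shape (a `completion` field) and the scoring rules are assumptions for illustration; match them to the schema in the Reinforcement Fine-Tuning documentation.

```python
# Toy Lambda reward function: reward cited answers, penalize rambling ones.
def handler(event, context=None):
    completion = event.get("completion", "")
    # Assumed scoring rubric: 1.0 if the answer carries a citation marker.
    reward = 1.0 if "[source]" in completion else 0.0
    if len(completion) > 2000:  # subtract for overly long outputs
        reward = max(reward - 0.5, 0.0)
    return {"reward": reward}
```

Note how literally the model will take this rubric: it would learn to emit `[source]` everywhere, cited or not. Real reward functions need to verify the behavior you want, not a proxy string.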
Amazon Nova Forge lets you build custom frontier models from early Amazon Nova checkpoints (pre-training, mid-training, or post-training phases). The key innovation is data mixing: you can blend your proprietary data with the training data used by Amazon Nova. This is expected to reduce the loss of general capabilities that occurs with fine-tuning, while still allowing you to add your specialized knowledge.
Amazon Nova Forge also supports custom reward functions in your own environments. Chemistry simulation tools, robotics environments, proprietary evaluation frameworks, you define the rewards, and the model optimizes for them. The main use cases are for manufacturing, R&D with proprietary research data, media companies with brand-specific requirements, and industries with specialized terminology or workflows.
The main limitations are the $100,000-per-year price tag just to use Amazon Nova Forge and the fact that it's only available for Amazon Nova models.
Several infrastructure announcements are interesting if you're deploying AI agents at scale.
Amazon S3 Vectors moved from preview to general availability with massive improvements. Vector capacity increased from 50 million to 2 billion vectors per index (a 40x jump), with total capacity reaching 20 trillion vectors per bucket. Additionally, query latencies dropped to approximately 100 ms for frequent queries and remained under one second for infrequent ones.
As a reminder, Amazon S3 Vectors is by far the most cost-effective option for infrequently-accessed vectors, with a break-even point of 16 queries per second based on internal calculations. For straightforward RAG pipelines, S3 Vectors integrates natively with Amazon Bedrock Knowledge Bases and Amazon OpenSearch.
Amazon S3 Vectors is now available in 14 AWS Regions, expanded from 5 during preview.
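Querying an index is a single API call. The sketch below uses the boto3 `s3vectors` client; the bucket and index names are placeholders, a recent boto3 version is required, and parameter names should be verified against the S3 Vectors API reference.

```python
# Sketch: similarity query against an Amazon S3 Vectors index.
def build_query(embedding, top_k=5):
    """Shape a query_vectors request for a float32 embedding."""
    return {
        "vectorBucketName": "my-vector-bucket",   # placeholder
        "indexName": "docs-index",                # placeholder
        "queryVector": {"float32": list(embedding)},
        "topK": top_k,
        "returnMetadata": True,
    }

if __name__ == "__main__":
    import boto3  # requires a boto3 version that includes the s3vectors client
    client = boto3.client("s3vectors")
    results = client.query_vectors(**build_query([0.12, -0.07, 0.33] * 256))
    for match in results.get("vectors", []):
        print(match["key"], match.get("metadata"))
```

For RAG workloads below that ~16 queries-per-second break-even point, this replaces a dedicated vector database with plain object-storage economics.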
Amazon SageMaker HyperPod added checkpointless and elastic training, both of which address pain points at scale.
Checkpointless training eliminates the checkpoint-restart cycles that slow recovery from failures. The traditional approach is to save the state periodically and restart from the last checkpoint when something fails. If you don't save the state often enough, a failure means the loss of hours of computation and several hours of recovery. The checkpointless approach gives you peer-to-peer state recovery, reducing downtime by 80%+ compared to checkpoint-based recovery (based on AWS internal studies with 16 to 2,000+ GPUs). AWS used this to train Amazon Nova models on tens of thousands of accelerators, and they report requiring zero manual intervention even at that scale.
Elastic training automatically scales up to use available accelerators and contracts when resources are needed elsewhere. Rather than terminating a job, it adds or removes data-parallel replicas while preserving the global batch size and adjusting learning rates to maintain model convergence.
Amazon SageMaker Serverless MLflow is another new feature that lets you create tracking instances in approximately 2 minutes with no server management. MLflow 3.4 support includes new tracing capabilities for generative AI development, such as capturing execution paths, inputs, outputs, and metadata. Cross-domain and cross-account access managed through AWS RAM enables team sharing without requiring complex permission management.
Don't confuse it with Amazon SageMaker Serverless Customization, which lets you do UI-driven fine-tuning for Amazon Nova, DeepSeek, GPT-OSS, Llama, and Qwen models. Fine-tuning techniques include SFT, DPO, RLVR, and RLAIF, and they're all accessible with a few clicks. The system automatically provisions the appropriate compute based on the model and data size. And once you have your model, you can deploy it to either Amazon Bedrock or Amazon SageMaker endpoints with a single click.
Rapid fire on the remaining announcements worth tracking:
Amazon Nova Multimodal Embeddings is a unified embeddings model for text, documents, images, video, and audio. You can query by text to find images, or search for videos by description. It comes with four dimension options (3,072 down to 256) via Matryoshka Representation Learning. Integrates with Amazon OpenSearch and Amazon S3 Vectors, and is available in US East (N. Virginia).
Amazon Bedrock service tiers is a new invocation feature that gives you per-request control over the cost-performance tradeoff across three tiers: Priority, Standard, and Flex.
You can choose the tier per API call via the service_tier parameter, allowing you to mix tiers within the same application based on request criticality. Quick decision framework: if a user is waiting for the response, use Priority. If it's a background job, use Flex. Everything else, use Standard. The supported models for now are those from OpenAI, Qwen, and Amazon Nova.
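That decision framework fits in a few lines of routing code. The tier names follow the announcement, and the exact request field (service_tier) and accepted values should be confirmed in the Bedrock API reference.

```python
# The tier-selection framework above as a tiny helper.
def choose_tier(user_waiting: bool, background_job: bool = False) -> str:
    if user_waiting:
        return "priority"  # a user is blocked on this response
    if background_job:
        return "flex"      # latency-tolerant and cheapest
    return "standard"      # the default tradeoff for everything else
```

You would then pass the result as the service_tier value on each invocation, so a single application can run interactive chat on Priority while nightly batch summarization rides on Flex.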
18 New Open Weight Models added to Amazon Bedrock, bringing the total to nearly 100 serverless models. Notable additions include Mistral Large 3 and the Ministral family, Google Gemma 3 family for edge deployment, and Moonshot Kimi K2 for deep reasoning with tool use.
Amazon Bedrock Guardrails for Code extended content filters to handle code comments, variable names, string literals, and to detect prompt leakage in system prompts.
Amazon Bedrock Data Automation (BDA) added 10 languages for speech analytics beyond English: Portuguese, French, Italian, Spanish, German, Chinese, Cantonese, Taiwanese, Korean, and Japanese. BDA automatically detects the language and creates multi-lingual transcripts.
AWS Security Hub provides a native control plane for cloud security operations on AWS. It ingests findings from native services and partners, correlates them, and produces exposure findings that help you identify real, exploitable risk.
AWS is systematically addressing the barriers that kept AI systems experimental rather than production-ready. Amazon Nova 2 brings AWS back to the model race, Amazon Bedrock AgentCore provides the governance and observability enterprises require for production-grade AI, Amazon Nova Act proves browser automation can hit reliability thresholds that matter, and Kiro demonstrates what persistent context enables for development workflows.
This year’s AWS re:Invent delivered some of the most exciting AI and cloud innovations we’ve seen yet, and we can’t wait to start leveraging them to help our customers move with even more speed, clarity, and confidence. From breakthrough model upgrades to long-awaited agent infrastructure, the momentum is undeniable. Get in touch with our AWS experts to discover how we can help your organization take advantage of these announcements!
Guille Ojeda is a Senior Innovation Architect at Caylent, a speaker, author, and content creator. He has published 2 books, over 200 blog articles, and writes a free newsletter called Simple AWS with more than 45,000 subscribers. He's spoken at multiple AWS Summits and other events, and was recognized as AWS Builder of the Year in 2025.