This agent can process refunds only for amounts under $200 USD and only when the requester has the refund-agent role. With a policy like this, even if your prompting fails or the model hallucinates an action, the policy layer prevents unauthorized execution. That's defense in depth for agent systems.
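A policy gate of this kind can be sketched in a few lines of plain Python. This is an illustrative sketch, not an AgentCore API; the role name and dollar limit come from the example above, and everything else is assumed:

```python
from dataclasses import dataclass

# Illustrative policy constants from the refund example above.
MAX_REFUND_USD = 200.0
REQUIRED_ROLE = "refund-agent"

@dataclass
class RefundRequest:
    amount_usd: float
    requester_roles: list

def authorize_refund(req: RefundRequest) -> bool:
    """Deny unless the requester holds the role AND the amount is under the cap.

    The point of putting this in a policy layer is that it runs regardless
    of what the model decided to do, so a hallucinated action still fails.
    """
    if REQUIRED_ROLE not in req.requester_roles:
        return False
    if req.amount_usd >= MAX_REFUND_USD:  # "under $200" means strictly less
        return False
    return True
```

Because the check sits outside the model, it holds even when the prompt or the model's reasoning goes wrong.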
Evaluations (in preview) provide continuous quality scoring of an agent's performance. Built-in evaluators assess correctness, faithfulness, helpfulness, tool-selection accuracy, and goal-success rate. The system samples live agent interactions and scores them continuously, showing results in Amazon CloudWatch alongside Amazon Bedrock AgentCore Observability insights, and allowing you to set alerts.
Episodic Memory enables agents to learn from past experiences. Structured episodes capture context, reasoning, actions, and outcomes, and a reflection agent then extracts patterns, enabling it to retrieve relevant learnings when facing similar tasks.
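The episode structure described above can be sketched as a simple record plus a naive retrieval step. The field names and the word-overlap ranking are assumptions for illustration, not the AgentCore schema:

```python
from dataclasses import dataclass

# Hypothetical episode record mirroring the structure described above:
# context, reasoning, actions, and outcome.
@dataclass
class Episode:
    context: str
    reasoning: str
    actions: list
    outcome: str

def retrieve_similar(episodes, task, top_k=1):
    """Naive retrieval: rank stored episodes by word overlap with the task.

    A real system would use embeddings; overlap keeps the sketch self-contained.
    """
    def overlap(ep):
        return len(set(ep.context.lower().split()) & set(task.lower().split()))
    return sorted(episodes, key=overlap, reverse=True)[:top_k]
```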
Imagine a travel booking agent that learns a particular user frequently reschedules flights for client meetings. After a few interactions, the agent proactively suggests flexible return options for future work trips without being explicitly programmed to do so. The learning emerges from experience, not from you anticipating every user pattern and encoding it in prompts.
Bidirectional Streaming enables voice agents to listen while speaking and adapt to interruptions. Users can cut in mid-response, and the agent adjusts without losing context. This creates the conversational flow users expect in human interactions.
Kiro Autonomous Agent
Kiro is AWS's agentic IDE. It launched in preview in July, went into general availability on November 17th, and has since received significant attention, including being featured at the Road to re:Invent hackathon (in which three Caylent employees participated). Now AWS has added an autonomous agent (in preview): a system that maintains context across sessions, learns from your feedback, and executes tasks asynchronously while you work on other things.
The persistent context is the key differentiator. Instead of starting fresh on each conversation, Kiro's autonomous agent maintains awareness across all your work and never forgets. Multi-repo awareness extends this further, letting you treat changes across multiple repositories as a unified task. Moreover, when you leave PR feedback like "always use our standard error handling pattern," it remembers that and automatically applies it to all future work.
Asynchronous execution means the agent runs up to 10 tasks concurrently while you focus elsewhere. It spins up isolated sandbox environments, clones repositories, analyzes codebases, breaks down work into tasks, defines acceptance criteria, asks clarifying questions when uncertain, and opens pull requests with detailed explanations.
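A concurrency cap like the one described (up to 10 tasks at once) is commonly enforced with a semaphore. This is a generic sketch of the pattern, not Kiro's implementation:

```python
import asyncio

MAX_CONCURRENT_TASKS = 10  # the preview's stated concurrency cap

async def run_task(name, semaphore, results):
    # Acquire a slot so no more than MAX_CONCURRENT_TASKS run at once.
    async with semaphore:
        await asyncio.sleep(0)  # placeholder for clone/analyze/open-PR work
        results.append(name)

async def run_all(task_names):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_TASKS)
    results = []
    await asyncio.gather(*(run_task(n, semaphore, results) for n in task_names))
    return results
```

Queued tasks simply wait for a free slot, so you can submit more than ten and they drain in order of availability.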
Preview is rolling out to Kiro Pro, Pro+, and Power subscribers with no additional cost during the preview period (subject to weekly limits). You can add a kiro label to any GitHub issue or mention /kiro in a comment to assign work to it.
Other notable features from the general availability launch include property-based testing that extracts requirements from specifications and tests with hundreds of random cases, checkpointing to roll back any number of steps without losing progress, a full CLI with agent capabilities, and enterprise management through IAM Identity Center.
Model Customization: Fine-Tuning and Amazon Nova Forge
Reinforcement Fine-Tuning
Reinforcement Fine-Tuning is a new feature in Amazon Bedrock that delivers a stated average accuracy gain of 66% over base models without requiring large labeled datasets. The approach uses feedback-driven learning: models improve iteratively based on reward signals rather than example pairs.
Two methods are available:
- Reinforcement Learning with Verifiable Rewards (RLVR): Uses rule-based graders for objective tasks such as code generation (does it compile?), mathematical reasoning (is the answer correct?), and any task with a verifiable answer.
- Reinforcement Learning from AI Feedback (RLAIF): Uses AI-based judges for subjective tasks such as instruction following, content moderation, and tone consistency, where correctness is harder to define programmatically.
You can use stored invocation logs directly (your existing Amazon Bedrock usage becomes training data) or upload new datasets from Amazon S3. Reward functions can be implemented as custom Lambda code or as model-as-judge configurations. The training dashboard shows reward scores, loss curves, and accuracy improvements in real-time.
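A verifiable (RLVR-style) reward function packaged as custom Lambda code might look like the sketch below. The event shape and field names are assumptions for illustration, not the documented contract:

```python
import json

def lambda_handler(event, context):
    """Hypothetical RLVR grader: reward 1.0 when the model's answer matches
    the verifiable ground truth, else 0.0. Event fields are illustrative."""
    completion = event.get("model_output", "").strip()
    expected = event.get("ground_truth", "").strip()
    reward = 1.0 if completion == expected else 0.0
    return {"statusCode": 200, "body": json.dumps({"reward": reward})}
```

Exact-match grading only makes sense for tasks with a single verifiable answer; subjective tasks are where the RLAIF judge-based path applies instead.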
Reinforcement Fine-Tuning currently supports Amazon Nova 2 Lite, with additional models coming. After training completes, you can deploy the new model to Amazon SageMaker with a single click. Tip: be thoughtful about reward function design. The model optimizes for whatever you measure, and poorly designed rewards create poorly behaved models.
Amazon Nova Forge
Amazon Nova Forge lets you build custom frontier models from early Amazon Nova checkpoints (pre-training, mid-training, or post-training phases). The key innovation is data mixing: you can blend your proprietary data with the training data used by Amazon Nova. This is expected to reduce the loss of general capabilities that occurs with fine-tuning, while still allowing you to add your specialized knowledge.
Amazon Nova Forge also supports custom reward functions in your own environments, whether chemistry simulation tools, robotics environments, or proprietary evaluation frameworks: you define the rewards, and the model optimizes for them. The main use cases are manufacturing, R&D with proprietary research data, media companies with brand-specific requirements, and industries with specialized terminology or workflows.
The main limitations are the $100,000-per-year price tag just to use Amazon Nova Forge and the fact that it's only available on Amazon Nova models.
Amazon S3 Vectors and Amazon SageMaker Updates
Several infrastructure announcements are interesting if you're deploying AI agents at scale.
Amazon S3 Vectors moved from preview to general availability with massive improvements. Vector capacity increased from 50 million to 2 billion vectors per index (a 40x jump), with total capacity reaching 20 trillion vectors per bucket. Additionally, query latencies dropped to approximately 100 ms for frequent queries and remained under one second for infrequent ones.
As a reminder, Amazon S3 Vectors is by far the most cost-effective option for infrequently-accessed vectors, with a break-even point of 16 queries per second based on internal calculations. For straightforward RAG pipelines, S3 Vectors integrates natively with Amazon Bedrock Knowledge Bases and Amazon OpenSearch.
Amazon S3 Vectors is now available in 14 AWS Regions, expanded from 5 during preview.
Amazon SageMaker HyperPod added checkpointless and elastic training, both of which address pain points at scale.
Checkpointless training eliminates the checkpoint-restart cycles that slow recovery from failures. The traditional approach is to save the state periodically and restart from the last checkpoint when something fails. If you don't save the state often enough, a failure means the loss of hours of computation and several hours of recovery. The checkpointless approach gives you peer-to-peer state recovery, reducing downtime by 80%+ compared to checkpoint-based recovery (based on AWS internal studies with 16 to 2,000+ GPUs). AWS used this to train Amazon Nova models on tens of thousands of accelerators, and they report requiring zero manual intervention even at that scale.
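The cost of the traditional approach is easy to quantify: on failure, everything since the last checkpoint is lost. A minimal sketch of that arithmetic (illustrative, not a HyperPod API):

```python
def work_lost_on_failure(failure_time_min, checkpoint_interval_min):
    """With periodic checkpoints, a failure loses all progress since the
    last checkpoint. Returns minutes of computation lost."""
    return failure_time_min % checkpoint_interval_min
```

With hourly checkpoints, a failure at minute 170 loses 50 minutes of work; peer-to-peer state recovery avoids both that loss and the restart-from-disk cycle.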
Elastic training automatically scales up to use available accelerators and contracts when resources are needed elsewhere. Rather than terminating a job, it adds or removes data-parallel replicas while preserving the global batch size and adjusting learning rates to maintain model convergence.
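Preserving the global batch size as replicas come and go means redistributing it across the current replica count. A simplified sketch of that bookkeeping (not HyperPod's internal logic; remainders would need gradient accumulation or padding in practice):

```python
def per_replica_batch(global_batch_size, num_replicas):
    """Keep the global batch size constant by redistributing it across
    the current data-parallel replica count."""
    assert global_batch_size % num_replicas == 0, "assume it divides evenly"
    return global_batch_size // num_replicas
```

Because the global batch stays fixed, the optimizer sees the same effective step size; the learning-rate adjustments the feature applies handle the transitional dynamics.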
Amazon SageMaker Serverless MLflow is another new feature that lets you create tracking instances in approximately 2 minutes with no server management. MLflow 3.4 support includes new tracing capabilities for generative AI development, such as capturing execution paths, inputs, outputs, and metadata. Cross-domain and cross-account access managed through AWS RAM enables team sharing without requiring complex permission management.
Don't confuse it with Amazon SageMaker Serverless Customization, which lets you do UI-driven fine-tuning for Amazon Nova, DeepSeek, GPT-OSS, Llama, and Qwen models. Fine-tuning techniques include SFT, DPO, RLVR, and RLAIF, and they're all accessible with a few clicks. The system automatically provisions the appropriate compute based on the model and data size. And once you have your model, you can deploy it to either Amazon Bedrock or Amazon SageMaker endpoints with a single click.
Everything Else Worth Knowing
Rapid fire on the remaining announcements worth tracking:
Amazon Nova Multimodal Embeddings is a unified embeddings model for text, documents, images, video, and audio. You can query by text to find images, or search for videos by description. It comes with four dimension options (3,072 down to 256) via Matryoshka Representation Learning. It integrates with Amazon OpenSearch and Amazon S3 Vectors, and is available in US East (N. Virginia).
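The practical upside of Matryoshka representations is that you can truncate a full embedding to a smaller dimension and re-normalize, trading accuracy for storage and speed. A generic sketch of that operation (not a Nova API; the dimension values mirror the options above):

```python
import math

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components and
    re-normalize to unit length so cosine similarity stays meaningful."""
    assert dim <= len(vec)
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```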
Amazon Bedrock service tiers is a new invocation feature that gives you per-request control over the cost-performance tradeoff:
- Priority: Up to 25% better latency for mission-critical applications, with a 75% price increase.
- Standard: Same as a regular Bedrock invocation.
- Flex: 50% lower pricing for latency-tolerant workloads like model evaluations or batch processing.
You can choose the tier per API call via the service_tier parameter, allowing you to mix tiers within the same application based on request criticality. Quick decision framework: if a user is waiting for the response, use Priority. If it's a background job, use Flex. Everything else, use Standard. The supported models for now are those from OpenAI, Qwen, and Amazon Nova.
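That decision framework is simple enough to encode directly. A sketch, with lowercase tier strings used for illustration (check the API reference for the exact service_tier values):

```python
def pick_service_tier(user_waiting: bool, background_job: bool) -> str:
    """Encode the framework above: Priority when a user is waiting,
    Flex for background/batch work, Standard for everything else.
    Tier strings here are illustrative, not verified API values."""
    if user_waiting:
        return "priority"
    if background_job:
        return "flex"
    return "standard"
```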
18 New Open Weight Models added to Amazon Bedrock, bringing the total to nearly 100 serverless models. Notable additions include Mistral Large 3 and the Ministral family, Google Gemma 3 family for edge deployment, and Moonshot Kimi K2 for deep reasoning with tool use.
Amazon Bedrock Guardrails for Code extended content filters to handle code comments, variable names, string literals, and to detect prompt leakage in system prompts.
Amazon Bedrock Data Automation (BDA) added 10 languages for speech analytics beyond English: Portuguese, French, Italian, Spanish, German, Chinese, Cantonese, Taiwanese, Korean, and Japanese. BDA automatically detects the language and creates multi-lingual transcripts.
AWS Security Hub provides a native control plane for cloud security operations on AWS. It ingests findings from native services and partners, correlates them, and produces exposure findings that help you identify real, exploitable risk.
Conclusion
AWS is systematically addressing the barriers that kept AI systems experimental rather than production-ready. Amazon Nova 2 brings AWS back to the model race, Amazon Bedrock AgentCore provides the governance and observability enterprises require for production-grade AI, Amazon Nova Act proves browser automation can hit reliability thresholds that matter, and Kiro demonstrates what persistent context enables for development workflows.
This year’s AWS re:Invent delivered some of the most exciting AI and cloud innovations we’ve seen yet, and we can’t wait to start leveraging these new advancements to help our customers move with even more speed, clarity, and confidence. From breakthrough model upgrades to long-awaited advancements, the momentum is undeniable. Get in touch with our AWS experts to discover how we can help your organization take advantage of these exciting advancements!