Explore the newly launched Claude Opus 4.6, Anthropic's most intelligent model to date.
Anthropic just shipped Claude Opus 4.6, and while the name sounds like a minor version bump, the release brings much more than new benchmark scores. This is currently the most capable AI model in the world, and it's available on Amazon Bedrock today.
Claude Opus 4.6 is Anthropic’s most intelligent model to date, with improved benchmark scores across coding, autonomous agents, and long-context reasoning. Opus 4.6 now tops the Artificial Analysis Intelligence Index, leading across agent tasks, terminal coding, and research benchmarks. On legal reasoning, it scored 90.2% on BigLaw Bench, the highest of any Claude model. And on economically valuable knowledge work tasks, it outperforms the previous leader GPT-5.2 by roughly 144 Elo points.
But the more meaningful upgrades go beyond scores. The model now supports a 1M token context window (in preview), which means it can ingest and reason over significantly larger portions of codebases, regulatory filings, or contract libraries. Anthropic previously experimented with a 1M token context window on Sonnet 4.5, and has now paired that capacity with the much stronger capabilities of its flagship Opus line. Moreover, a new adaptive thinking system lets you dial reasoning effort across four levels, from quick lookups to deep analysis, giving teams real control over the speed-cost-intelligence tradeoff. And with a limit of 128K output tokens per response, Opus 4.6 can generate entire documents, not just fragments, in a single response.
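As a rough illustration, here's how a single invocation that leans on both the large output limit and the 1M-token preview might look in Python with boto3. The model ID is a hypothetical placeholder, and the beta flag is an assumption borrowed from Anthropic's earlier 1M-context previews, so check the Bedrock documentation for the exact identifiers:

```python
import boto3

# Hypothetical model ID for Opus 4.6 -- check the Bedrock console or
# documentation for the actual identifier in your region.
MODEL_ID = "anthropic.claude-opus-4-6-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# A corpus in the hundreds of thousands of tokens: a codebase dump,
# a regulatory filing, a contract library, etc.
with open("corpus.txt") as f:
    corpus = f.read()

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{
        "role": "user",
        "content": [{"text": f"<corpus>\n{corpus}\n</corpus>\n\n"
                             "Summarize every breaking API change in the corpus."}],
    }],
    # Opus 4.6 allows up to 128K output tokens per response.
    inferenceConfig={"maxTokens": 128_000},
    # Assumption: the 1M-token preview is gated behind a beta flag, as in
    # Anthropic's earlier 1M-context previews; verify the exact value.
    additionalModelRequestFields={"anthropic_beta": ["context-1m-2025-08-07"]},
)

print(response["output"]["message"]["content"][0]["text"])
```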
For developers building on Amazon Bedrock, there’s also a new compaction API (in public beta) that lets you compact the context in a conversation, allowing you to build long-running agents that maintain coherence across conversations too long to fit within a single context window.
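The exact shape of the compaction API may evolve during the beta, but the pattern it automates looks roughly like the sketch below: when a conversation approaches the window, older turns get replaced with a model-written summary. Everything here, from the helper name to the summarization prompt, is illustrative of the pattern rather than the Bedrock API itself:

```python
def compact_history(bedrock, model_id, messages, keep_last=4):
    """Illustrative stand-in for the compaction API (public beta): replace
    older turns with a model-written summary so a long-running agent keeps
    its coherence. The real API handles this server-side; the shapes here
    assume plain-text turns (tool-use blocks would need extra handling)."""
    cut = len(messages) - keep_last
    if messages[cut]["role"] == "user":
        cut -= 1  # keep user/assistant alternation valid in the rebuilt history
    old, recent = messages[:cut], messages[cut:]

    transcript = "\n".join(f'{m["role"]}: {m["content"][0]["text"]}' for m in old)
    summary = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text":
            "Summarize this conversation, keeping every decision, open task, "
            f"and constraint:\n\n{transcript}"}]}],
        inferenceConfig={"maxTokens": 2_000},
    )["output"]["message"]["content"][0]["text"]

    # Seed a fresh history: one compacted summary turn plus the recent tail.
    return [{"role": "user",
             "content": [{"text": f"Conversation so far (compacted):\n{summary}"}]},
            *recent]
```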
We’ve been hands-on with Claude Opus 4.6 since it landed on Amazon Bedrock, and a few things stand out.
First, the long-context reliability is a step change. Previous models would lose coherence or miss details as conversations stretched past a few hundred thousand tokens. Opus 4.6 scores 76% on the 8-needle MRCR benchmark at 1M tokens, compared to 18.5% for Sonnet 4.5. In practical terms, that means you can trust the model to track details across massive inputs that would trigger context rot in most other models. Performance still degrades as the context fills a significant fraction of the window, with 50% being the point where most models show noticeable degradation. But with a 1M token window, that 50% mark sits at 500K tokens, instead of the roughly 128K it works out to for most other models.
Second, Opus 4.6 displays significantly improved performance when managing complex workflows across dozens of tools. When used with Claude Code, we have observed it spin up sub-agents proactively and handle the full development lifecycle, from requirements to implementation to maintenance. Human oversight is still required, but Opus 4.6 displays significantly higher autonomy when managing issues across multiple repositories.
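To make that concrete, here is a minimal agent loop against the Bedrock Converse API. The toolConfig and toolResult shapes follow the documented Converse API; the tool itself (list_open_issues) is a hypothetical stub, and the model ID is again an assumption:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-opus-4-6-v1:0"  # hypothetical, as above

# One hypothetical tool; a real agent would register dozens.
TOOL_CONFIG = {"tools": [{"toolSpec": {
    "name": "list_open_issues",
    "description": "List the open issues for a repository.",
    "inputSchema": {"json": {
        "type": "object",
        "properties": {"repo": {"type": "string"}},
        "required": ["repo"],
    }},
}}]}

def list_open_issues(repo: str) -> str:
    return json.dumps([{"id": 42, "title": "Flaky integration test"}])  # stub

TOOL_IMPLS = {"list_open_issues": list_open_issues}

messages = [{"role": "user",
             "content": [{"text": "Triage the open issues in payments-service."}]}]

while True:
    resp = bedrock.converse(modelId=MODEL_ID, messages=messages,
                            toolConfig=TOOL_CONFIG)
    msg = resp["output"]["message"]
    messages.append(msg)
    if resp["stopReason"] != "tool_use":
        break  # final answer reached
    # Execute every tool call the model requested and feed the results back.
    results = []
    for block in msg["content"]:
        if "toolUse" in block:
            call = block["toolUse"]
            output = TOOL_IMPLS[call["name"]](**call["input"])
            results.append({"toolResult": {"toolUseId": call["toolUseId"],
                                           "content": [{"text": output}]}})
    messages.append({"role": "user", "content": results})

print(msg["content"][0]["text"])
```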
Third, the adaptive thinking feature is more useful than it sounds on paper, even on the API. The ability to set reasoning effort has been available for a while, but typical implementations would consider the overall type of requests and set a fixed value for budget_tokens when using extended thinking. In theory you could modify the value of budget_tokens for each request, but in practice that would require analyzing the request, possibly with a pre-processing LLM. We have implemented this architecture in some cases, and it can work very well, but it incurs additional latency and costs. Adaptive thinking does this automatically, with no extra latency, at no additional cost. Moreover, adaptive thinking reliably drives better performance than extended thinking with a fixed budget_tokens, and we recommend moving to adaptive thinking to get the most intelligent responses from Claude Opus 4.6.
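In practice, the switch is a one-line change in the request payload. The fixed-budget shape below is the documented extended thinking configuration; the adaptive shape is our assumption of how the new mode is exposed, so verify the parameter name against the Opus 4.6 documentation on Bedrock:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-opus-4-6-v1:0"  # hypothetical, as above

# Fixed-budget extended thinking (documented shape): one budget for every
# request, regardless of how hard the request actually is.
fixed = {"thinking": {"type": "enabled", "budget_tokens": 8_000}}

# Adaptive thinking: the model picks its own effort per request. This field
# shape is an assumption -- check the Opus 4.6 docs for the exact parameter.
adaptive = {"thinking": {"type": "adaptive"}}

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Is 1697 prime?"}]}],
    # maxTokens must exceed the thinking budget when a fixed budget is used.
    inferenceConfig={"maxTokens": 16_000},
    additionalModelRequestFields=adaptive,  # swap in `fixed` to compare
)
```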
Opus 4.6 on Amazon Bedrock sets the stage for state-of-the-art models that don't just improve on intelligence, but also deliver larger context windows and genuinely useful features.
With Amazon Bedrock’s enterprise-grade security controls, data residency guarantees, and governance tooling, combined with a model that can genuinely handle complex, multi-step professional work, production deployments of generative AI face fewer obstacles on the path to a return on investment.
For organizations building AI-powered workflows in financial analysis, legal review, software development, or cybersecurity, Opus 4.6 isn’t just faster or smarter. It’s the first model that can be a reliable participant in your actual business processes, not just an assistant on the side.
The question for enterprise teams is no longer whether to deploy frontier AI. It’s how quickly you can architect the right foundation to capture the value.
That’s where we come in. As the AWS GenAI Partner of the Year, with deep experience in AI implementation, Caylent helps organizations move from experimentation to production-grade AI on AWS. Whether you’re designing your first agentic workflow or scaling an existing AI platform, our AI services team can help you get there.
Guille Ojeda is a Senior Innovation Architect at Caylent, a speaker, author, and content creator. He has published 2 books, over 200 blog articles, and writes a free newsletter called Simple AWS with more than 45,000 subscribers. He's spoken at multiple AWS Summits and other events, and was recognized as AWS Builder of the Year in 2025.