
Claude Opus 4.8: What Improved, What’s New, and What It Means for Enterprise

Generative AI & LLMOps

Explore Claude Opus 4.8, Anthropic's most capable generally available model to date, with improvements around long-running agents, coding, enterprise workflows, financial analysis, cybersecurity, and multimodal reasoning.

On May 28th, Anthropic launched Claude Opus 4.8, its most capable generally available model to date. Anthropic describes it as a “modest but tangible improvement” over Opus 4.7, with gains in long-running agents, coding, enterprise workflows, financial analysis, cybersecurity, and multimodal reasoning.

In this post, we'll examine those claims and explore Opus 4.8's new features.

Opus 4.8 Characteristics and New Features

Opus 4.8 comes with a 1M-token context window and is available by default on the Claude API and Amazon Bedrock, at the same price as Opus 4.7 ($5 per million input tokens and $25 per million output tokens). It also supports a maximum output of 128K tokens, the same as Opus 4.7. Adaptive Thinking remains in place, meaning manual extended-thinking budgets are not supported. Anthropic explicitly warns that Opus 4.8 may use more tokens than Opus 4.7 on some tasks, especially where effort settings and multi-turn behavior affect total usage.
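To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from its token counts, using the $5 / $25 per-million-token prices quoted above. The function name and the example token counts are hypothetical, and the sketch deliberately ignores prompt-caching discounts, batch pricing, and any pricing differences between the Claude API and Amazon Bedrock.

```python
# Standard Opus 4.8 prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00
MAX_OUTPUT_TOKENS = 128_000  # maximum output per request quoted above


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (no caching or batch discounts).

    output_tokens should include any thinking tokens, since Anthropic bills
    thinking as output.
    """
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128K-token maximum")
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Hypothetical request: 200K tokens of context in, 8K tokens out.
print(f"${estimate_request_cost(200_000, 8_000):.2f}")
```

Note how quickly output dominates: in this example, 8K output tokens cost a fifth as much as 200K input tokens, which is why higher effort levels (more thinking, more output) move the bill so much.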

Opus 4.8 defaults to high effort, which should use roughly as many tokens as Opus 4.7’s default while performing better. You can also choose “extra” (“xhigh” in Claude Code) or “max”, and the model will spend more tokens to get better results. Anticipating higher overall token usage, Anthropic increased the rate limits for Claude Code.

A new feature of Opus 4.8 is that it accepts system messages inside the conversation history after a user turn. This means agents can update instructions during a long-running task without restating the full system prompt, preserving prompt cache hits on earlier turns and reducing input cost on agentic loops. This is useful for workflows that require updates to permissions, task budgets, environment state, or operating instructions after the run has already started.
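The caching benefit follows from prompt caching being prefix-based: a request can reuse cached work only up to the first point where it differs from an earlier request. The sketch below models a conversation as a plain list of (role, content) tuples, which is a simplification and not the actual Claude API request format, and counts how many leading messages two consecutive requests share. The message contents are hypothetical.

```python
from typing import List, Tuple

# Simplified message model: (role, content). Not the real API schema.
Message = Tuple[str, str]


def shared_prefix_len(previous: List[Message], current: List[Message]) -> int:
    """Count leading messages identical in both requests (the cacheable prefix)."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


history: List[Message] = [
    ("system", "You are a migration agent. Budget: 50 tool calls."),
    ("user", "Migrate the payments service."),
    ("assistant", "Plan: inventory modules, then port them one by one."),
    ("user", "Continue."),
]

# Option A: append an updated instruction after the latest user turn.
appended = history + [("system", "Budget update: 20 tool calls remaining.")]

# Option B: restate the full system prompt with the new budget.
restated = [("system", "You are a migration agent. Budget: 20 tool calls.")] + history[1:]

print(shared_prefix_len(history, appended))  # 4: every earlier turn is reusable
print(shared_prefix_len(history, restated))  # 0: the change is at the very start
```

Appending leaves the entire earlier history as a shared prefix, while editing the initial system prompt changes the first message and so invalidates everything after it.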

Another new feature is Fast mode, which delivers up to 2.5x higher output tokens per second. On Opus 4.8 it costs $10 / $50 per million input / output tokens (2x the regular Opus 4.8 price); on Opus 4.7 and 4.6 it costs $30 / $150 per million tokens (6x their regular price). Fast mode is in research preview, is available only for Opus models for now (4.8 and the older 4.7 and 4.6), and only through the Claude API.
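The Fast mode trade-off comes down to two ratios: price per token and output speed. The sketch below compares a regular and a Fast mode Opus 4.8 request using the prices quoted above. The baseline throughput is a hypothetical placeholder, and 2.5x is treated as a best case because Anthropic says “up to”, so real speedups may be smaller.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Mode:
    name: str
    input_price: float     # USD per million input tokens
    output_price: float    # USD per million output tokens
    tokens_per_sec: float  # output throughput


# Hypothetical baseline throughput; not a published figure.
BASELINE_TPS = 60.0

REGULAR = Mode("regular", 5.0, 25.0, BASELINE_TPS)
# Best case: Fast mode's "up to 2.5x" speedup at 2x the Opus 4.8 price.
FAST = Mode("fast", 10.0, 50.0, BASELINE_TPS * 2.5)


def cost_and_time(mode: Mode, input_tokens: int, output_tokens: int) -> Tuple[float, float]:
    """Return (USD cost, seconds of output generation) for one request.

    Ignores time-to-first-token and input processing time.
    """
    cost = (input_tokens * mode.input_price
            + output_tokens * mode.output_price) / 1_000_000
    seconds = output_tokens / mode.tokens_per_sec
    return cost, seconds


for mode in (REGULAR, FAST):
    cost, secs = cost_and_time(mode, 50_000, 12_000)
    print(f"{mode.name}: ${cost:.2f}, ~{secs:.0f}s of generation")
```

Under these assumptions, Fast mode doubles the cost of the request while cutting generation time to, at best, 40% of the baseline. That makes it a fit where latency, not spend, is the binding constraint.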

Also new in Claude Code, though not specific to Opus 4.8, are dynamic workflows. This feature lets Claude plan the work and then spawn hundreds of parallel subagents in a single session. It's intended for large-scale work such as entire codebase migrations, and it will consume substantially more tokens than a typical Claude Code session. It's only available on the Claude Enterprise, Team, and Max plans.
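Dynamic workflows follow a familiar plan-then-fan-out pattern. The sketch below is a generic illustration of that pattern with Python's asyncio, not how Claude Code implements the feature: `plan` and `run_subagent` are hypothetical stand-ins, and the per-subagent token count is a placeholder. It shows why total token use scales with the number of subagents even when wall-clock time is bounded by parallelism.

```python
import asyncio
from typing import List


async def run_subagent(task: str) -> int:
    """Hypothetical stand-in for a subagent; returns tokens it consumed."""
    await asyncio.sleep(0.01)  # simulate work
    return 10_000  # placeholder token count per subagent


def plan(goal: str, modules: List[str]) -> List[str]:
    """Hypothetical planner: split the goal into one task per module."""
    return [f"{goal}: {m}" for m in modules]


async def fan_out(tasks: List[str], max_parallel: int) -> int:
    """Run all tasks with bounded concurrency; return total tokens used."""
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> int:
        async with sem:
            return await run_subagent(task)

    results = await asyncio.gather(*(bounded(t) for t in tasks))
    return sum(results)


modules = [f"module_{i}" for i in range(200)]
tasks = plan("Migrate to the new ORM", modules)
total = asyncio.run(fan_out(tasks, max_parallel=50))
print(f"{len(tasks)} subagents, ~{total:,} tokens")
```

Raising `max_parallel` shortens the run but leaves the token total unchanged: every subagent still does its own work, which is where the extra cost of dynamic workflows comes from.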

Anthropic’s migration guide says Opus 4.8 should perform strongly on existing Opus 4.7 prompts and evals, with no breaking API changes for code already running on Opus 4.7. The same guide calls for re-baselining effort, latency, and cost after changing the model ID.

Where Opus 4.8 Improves On Opus 4.7

There are two main areas where Opus 4.8 represents an improvement: long-horizon agentic coding with better long-context handling and compaction recovery, and more reliable reasoning across long tasks.

Long-Running Agents

One of the central improvements in Opus 4.8 is sustained execution across longer work. Opus 4.8 can work independently for longer, break down ambiguous problems, self-correct, track dependencies, route around blockers, and complete extended tasks with fewer check-ins.

That behavior matters most in workflows where the cost of failure accumulates. A model can produce a plausible intermediate answer and still damage the final result by skipping a tool call, forgetting a constraint, misreading a dependency, or failing to flag uncertainty. Long-running agents and coding workflows are especially exposed, because each step changes the context for the next one.
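A simple model shows why accumulation matters: if each step of a workflow succeeds independently with probability p, an n-step run completes cleanly with probability p^n. Independence is a simplifying assumption (real agent steps are correlated, and a model that self-corrects can recover from some errors), but it captures the compounding effect.

```python
def run_success_probability(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability")
    return per_step ** steps


for p in (0.99, 0.995):
    for n in (10, 100, 300):
        print(f"p={p}, steps={n}: {run_success_probability(p, n):.1%}")
```

At 99% per-step reliability, a 100-step run finishes cleanly only about 37% of the time; raising per-step reliability to 99.5% lifts that to roughly 61%. Small per-step gains compound into large differences over long runs.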

Opus 4.8 is a significant improvement when a workflow is long enough, and carries enough state, for small failures to accumulate: repo-scale refactors, long Claude Code sessions, multi-document analysis, financial filing reviews, or multi-step agent workflows. The gain isn't in general reasoning, where the improvement over Opus 4.7 is modest; it's that Opus 4.8 is much better at staying coherent over many turns.

Reliability and Uncertainty

Early testers found Opus 4.8 more likely to flag uncertainty and less likely to make unsupported claims, and Anthropic’s evaluations found the model about one-quarter as likely as its predecessor to leave flaws in its own code unremarked.

For engineering teams, a model that creates fewer hidden defects, uses tools more reliably, and gives reviewers a clearer view of what it is confident about is a significant improvement. For business roles, the same behavior can reduce review burden in dense professional workflows where unsupported claims, missed inputs, or poor source handling create downstream risk.

What Claude Opus 4.8 Means for Enterprise AI Teams

Opus 4.8 is strongest where task complexity allows the model to demonstrate its improvements. Good candidates include long-running agentic coding, large-codebase analysis, multi-document professional work, dense financial- or legal-style analysis, and tasks where missed uncertainty or weak self-checking creates review risk.

There is a tendency in the industry to consider every problem a hard problem requiring the most powerful model on the highest settings. For personal accounts, that's not really a problem so long as you're prepared to hit the usage limits often. Enterprise usage needs another strategy.

Our recommendation is to move all workloads and workflows currently using Opus 4.7 to Opus 4.8. Our tests back Anthropic's claim of similar or improved performance for the majority of use cases without requiring prompt changes. Once on Opus 4.8, we recommend running evals and adjusting effort and prompts as needed, optimizing for either improved quality or the same quality at lower cost, as appropriate.
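One way to act on “improved quality or the same quality at lower cost” is to run the same eval suite at each effort level and pick the cheapest configuration that clears your quality bar. The sketch below assumes you already have per-configuration eval scores and average costs. The `EvalResult` type is hypothetical, and the numbers in the usage example are illustrative placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalResult:
    config: str           # model + effort level
    score: float          # eval pass rate in [0, 1]
    cost_per_task: float  # average USD per eval task


def cheapest_passing(results: List[EvalResult], min_score: float) -> Optional[EvalResult]:
    """Return the lowest-cost configuration meeting the quality bar, if any."""
    passing = [r for r in results if r.score >= min_score]
    return min(passing, key=lambda r: r.cost_per_task, default=None)


# Hypothetical placeholder numbers for illustration only.
results = [
    EvalResult("Opus 4.7 / high", 0.86, 0.40),
    EvalResult("Opus 4.8 / high", 0.89, 0.41),
    EvalResult("Opus 4.8 / extra", 0.92, 0.70),
    EvalResult("Opus 4.8 / max", 0.93, 1.10),
]
best = cheapest_passing(results, min_score=0.88)
print(best.config if best else "no configuration meets the bar")
```

Setting the bar explicitly turns “should we use max effort?” into a question the eval data answers: raise `min_score` and the selection moves to a more expensive setting only when the cheaper ones fall short.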

For workloads not currently on Opus, the improvement Opus 4.8 delivers over Opus 4.7 is unlikely to justify the 3x price increase over Sonnet: if Opus 4.7 wasn't worth the premium for a workload, a modest gain on top of it rarely changes that. As always, test on your own workloads, since performance varies by use case. However, we don't expect Opus 4.8 to replace Sonnet 4.6, due to unit economics.

As for the new features, we recommend trying dynamic workflows. The costs are significantly higher, but we've observed good results in our tests. Your own harnesses and workflows may already deliver similar results, or the improvement may not justify the extra cost for some use cases. The same analysis applies to the “extra” (“xhigh” in Claude Code) and “max” effort levels: the improved results will not always be worth the extra cost. In both cases, compare the cost per final result, not the cost per prompt; a single prompt that costs twice as much is more cost-effective than three cheaper prompts if it reaches the same result. For both features, our recommendation is to test and evaluate them on a case-by-case basis.
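Comparing cost per final result rather than cost per prompt is simple arithmetic: divide the cost of one attempt by the probability that an attempt produces an acceptable result. The sketch below assumes independent attempts with a fixed success rate, so the expected number of attempts is 1 / success_rate (a geometric distribution). The success rates and costs in the example are hypothetical, chosen to mirror the “twice as much vs. three cheaper prompts” case.

```python
def expected_cost_per_result(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD spent per acceptable result, retrying until success.

    Assumes independent attempts with a fixed success rate, so the expected
    number of attempts is 1 / success_rate (geometric distribution).
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical: a cheaper setting that needs about three tries on average...
cheap = expected_cost_per_result(cost_per_attempt=1.00, success_rate=1 / 3)
# ...versus a setting that costs twice as much but usually succeeds first time.
premium = expected_cost_per_result(cost_per_attempt=2.00, success_rate=0.9)

print(f"cheap: ${cheap:.2f}/result, premium: ${premium:.2f}/result")
```

Here the setting that costs twice as much per prompt comes out cheaper per result. The conclusion flips if the premium setting's success rate is not high enough, which is why the comparison needs your own eval data.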

How Caylent Can Help

As organizations evaluate Claude Opus 4.8, Caylent can help identify where the model’s improved long-running agent performance, coding capabilities, and reasoning reliability can deliver meaningful business value. From model selection and cost optimization to agentic workflow design, evaluation frameworks, and production deployment on AWS, our team helps enterprises move from experimentation to scalable, secure AI solutions.

As a charter member of Anthropic’s Claude Partner Network and an AWS Premier Tier Services Partner, Caylent brings deep expertise across Claude, Amazon Bedrock, and enterprise-grade AI implementation. Whether you’re migrating from Opus 4.7, testing new agentic workflows, or determining where Opus 4.8 fits within your broader AI strategy, Caylent can help you build, validate, and optimize solutions that deliver measurable impact. Reach out to us today to learn more. 

Guille Ojeda

Guille Ojeda is a Principal Innovation Architect at Caylent, a speaker, author, and content creator. He has published 2 books, over 200 blog articles, and writes a free newsletter called Simple AWS with more than 45,000 subscribers. He's spoken at multiple AWS Summits and other events, and was recognized as AWS Builder of the Year in 2025.
