Learn about the announcements unveiled during Peter DeSantis and Dave Brown's AWS re:Invent 2025 infrastructure keynote.
The infrastructure keynote from Peter DeSantis and Dave Brown offered a welcome recalibration. While agentic AI continues to dominate headlines, this session focused on the less glamorous, but absolutely decisive, layer that determines whether those systems actually work at scale.
The message was clear throughout the talk: as AI systems become more autonomous and more demanding, the underlying infrastructure carries more responsibility, not less.
Agentic systems increase pressure on every part of the stack. They generate unpredictable traffic patterns, spike concurrency, and place sustained demands on compute, memory, storage, and networking. These behaviors amplify architectural weaknesses that might have gone unnoticed in traditional applications.
That’s why the long-standing AWS focus on elasticity, resilience, and operational discipline shows up so strongly in AI conversations. When workloads become more dynamic, the platform’s ability to scale smoothly, recover quickly, and stay cost-efficient under load becomes a core enabler of innovation rather than a background concern.
In practice, teams building AI systems quickly discover that infrastructure decisions shape everything from latency and reliability to experimentation speed and operating margin.
The keynote’s deep dive into AWS Graviton and AWS Trainium highlighted how hardware design is increasingly aligned with modern workload characteristics.
This alignment matters because AI workloads are sensitive to efficiency at scale. Small improvements in performance-per-dollar compound rapidly when models train continuously, inference runs constantly, or agents orchestrate hundreds of parallel actions. By designing silicon specifically for these patterns, AWS gives builders more predictable economics and tighter control over performance.
For organizations operating at scale, compute is no longer a neutral choice. It is an architectural lever that directly affects feasibility and sustainability.
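The compounding effect of small performance-per-dollar gains is easy to see with back-of-the-envelope arithmetic. A minimal sketch follows; the fleet size, hourly price, and 7% efficiency gain are illustrative assumptions, not AWS pricing:

```python
# Hypothetical numbers, for illustration only: a fleet running
# inference around the clock on 200 instances at $1.50/hour.
HOURS_PER_YEAR = 24 * 365
fleet_size = 200
price_per_hour = 1.50

baseline = fleet_size * price_per_hour * HOURS_PER_YEAR

# A 7% performance-per-dollar gain (e.g. from a silicon migration)
# translates to ~7% fewer instance-hours for the same throughput.
gain = 0.07
annual_savings = baseline * gain
print(f"baseline: ${baseline:,.0f}/yr, savings: ${annual_savings:,.0f}/yr")
```

At this assumed scale the single-digit improvement is worth six figures a year, which is why continuous workloads make compute choice an economic decision rather than a neutral one.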
Dave Brown’s discussion of AWS Lambda’s evolution reflected how usage patterns have matured. AWS Lambda Managed Instances introduce a model where teams specify capacity while AWS continues to handle server operations.
This approach addresses a growing class of workloads that benefit from consistent baseline capacity, such as low-latency services or AI-driven systems with steady demand, while still avoiding the operational overhead of server management. It reflects a broader shift toward defining serverless around operational simplicity and velocity rather than a single scaling behavior.
The ongoing conversation around what “serverless” means is a healthy one. As workloads evolve, the abstraction evolves with them, shaped by practical requirements rather than ideology. That said, this point of view is not without its detractors. The position that “serverless” means “can scale to zero” remains widely held in the field, and many greeted Brown’s framing with skepticism: because Lambda Managed Instances don’t scale to zero, AWS now asserts that scaling to zero isn’t part of “serverless.”
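The economics behind that shift can be sketched with a simple cost-crossover model: pure per-request pricing wins for spiky or idle workloads, while a fixed baseline wins once demand is steady. All prices and the helper functions below are illustrative assumptions, not AWS list prices or APIs:

```python
# Sketch: when does a fixed baseline beat pure per-request pricing?
# Every number here is a hypothetical, for illustration only.
def per_request_cost(requests_per_month, price_per_million=20.0):
    """Pure pay-per-use: cost scales linearly with traffic."""
    return requests_per_month / 1_000_000 * price_per_million

def baseline_cost(instances, price_per_instance_month=60.0):
    """Reserved baseline capacity: flat cost regardless of traffic."""
    return instances * price_per_instance_month

# Steady traffic: 50M requests/month served by 10 baseline instances.
on_demand = per_request_cost(50_000_000)  # 1000.0
reserved = baseline_cost(10)              # 600.0
print(reserved < on_demand)  # steady demand favors the fixed baseline
```

Flip the traffic assumption to something bursty and mostly idle and the inequality reverses, which is exactly the class of workload the original scale-to-zero model was built for.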
Storage decisions increasingly influence how responsive and usable AI systems feel. The keynote’s coverage of Amazon S3 Vectors reinforced that retrieval speed, durability, and simplicity must coexist.
Vector-heavy workloads stress storage systems in new ways, especially when retrieval sits directly in the user interaction path. Token neighborhoods, an internal Amazon S3 Vectors optimization, cluster related token embeddings so that approximate nearest-neighbor search over huge S3-resident corpora runs faster, which matters most for AI agents and RAG pipelines. Their introduction signals a recognition that improving locality and access patterns can materially change performance characteristics while preserving Amazon S3’s operational strengths.
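The general idea of clustering related embeddings can be illustrated with a toy index: vectors are bucketed by their nearest centroid, and a query scans only the closest bucket instead of the whole corpus. This is not the Amazon S3 Vectors implementation; the centroids, bucket layout, and cosine metric are assumptions chosen for the sketch:

```python
import math
from collections import defaultdict

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class NeighborhoodIndex:
    """Toy 'neighborhood' index: embeddings are bucketed by nearest
    centroid, so a query scans one bucket, not the full corpus."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = defaultdict(list)

    def _nearest_centroid(self, vec):
        return max(range(len(self.centroids)),
                   key=lambda i: cosine(vec, self.centroids[i]))

    def add(self, key, vec):
        self.buckets[self._nearest_centroid(vec)].append((key, vec))

    def query(self, vec):
        # Approximate search: only the closest neighborhood is scanned.
        bucket = self.buckets[self._nearest_centroid(vec)]
        return max(bucket, key=lambda kv: cosine(vec, kv[1]))[0]

idx = NeighborhoodIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
idx.add("cat", [0.9, 0.1])
idx.add("dog", [0.7, 0.5])
idx.add("car", [0.1, 0.95])
print(idx.query([0.85, 0.2]))  # scans only the first bucket
```

The search is approximate because a true nearest neighbor sitting just across a bucket boundary can be missed; production systems mitigate that by probing several neighborhoods, trading a little extra I/O for recall.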
As AI architectures mature, storage stops being a passive repository and becomes an active participant in system behavior.
Across compute, serverless, and storage, the same principle kept surfacing: fundamentals scale in importance as systems become more complex. AI systems are less forgiving of weak assumptions because they operate continuously, adaptively, and often autonomously.
Strong infrastructure foundations provide stability in an environment defined by rapid change. They give teams room to experiment, confidence to deploy, and clarity when something goes wrong. Without that foundation, even the most sophisticated models struggle to deliver durable value.
Building on AWS today means navigating a landscape where new abstractions emerge quickly, but foundational decisions still determine long-term success. Teams need partners who understand how to balance innovation with operational rigor.
At Caylent, we help organizations design and build AWS architectures that respect those fundamentals while embracing what’s next, whether that means large-scale AI, agentic systems, or modern application platforms; we have brought hundreds of complex AI systems to production. These are challenging and exciting times to build, and having a guide can make the journey both faster and more reliable.
We’re excited to see what you build next, and we’re here to help you get there.
Brian is an AWS Community Hero and Alexa Champion, holds ten US patents and a bunch of certifications, and ran the Boston AWS User Group for 5 years. He's also part of the New Voices mentorship program, where Heroes teach traditionally underrepresented engineers how to give presentations. He is a private pilot, a rescue scuba diver, and earned his Master's in Cognitive Psychology working with bottlenose dolphins.