re:Invent 2023 Storage Session Summaries

Cloud Technology

Get up to speed on all the storage-focused 300- and 400-level sessions from re:Invent 2023!

We know that watching all the re:Invent session videos can be a daunting task, but we don't want you to miss out on the gold that is often found in them! In this blog, you can find quick summaries of all the 300- and 400-level sessions, grouped by track. Enjoy!

STG311 AWS storage for serverless application development

The AWS re:Invent 2023 session, titled "AWS Storage for Serverless Application Development (STG311-R)," featured speakers Bryan Liles, Senior Principal Engineer at Amazon S3, Sebastien Berube, a Manager of Specialist Solution Architecture at AWS, and Jefferson Frazer, Senior Director at Shutterstock. The session primarily focused on how AWS storage solutions, particularly Amazon S3 and Amazon Elastic File System (EFS), can be effectively utilized for serverless application development. They emphasized the need for proper access to data when using serverless technologies and discussed how AWS storage options can cater to different needs in serverless architecture.

Bryan and Sebastien detailed the advantages of serverless architecture, including rapid scalability, cost efficiency, and focus on application development rather than infrastructure management. They highlighted the key differences and use cases for Amazon S3 and Amazon EFS, with S3 providing high scalability and parallelism for object storage, and EFS offering low latency and a file-based system suitable for workloads needing frequent file access and updates. Real-world use cases were explored, illustrating how businesses leverage these AWS storage solutions for efficient, scalable, and cost-effective serverless application development.
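To make the serverless storage pattern concrete, here is a minimal sketch (Python/boto3) of an AWS Lambda function that reads the object named in an S3 event notification; the bucket and key come from the event, and everything else shown is illustrative rather than taken from the session:

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")  # created outside the handler so it is reused across invocations


def handler(event, context):
    """Read the object that triggered an S3 event notification and return its size."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return {"statusCode": 200, "body": json.dumps({"key": key, "bytes": len(body)})}
```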

Jefferson Frazer from Shutterstock shared insights into how his company employs AWS serverless services and storage for handling their vast and growing digital asset library. He explained Shutterstock's transition towards serverless processes, emphasizing the importance of minimizing operational toil and maximizing developer focus on core business challenges. Jefferson described how they use AWS technologies like Lambda, Step Functions, EventBridge, and S3 batch operations to efficiently manage and deliver their content. This practical example underscored the session's theme of leveraging AWS's serverless storage solutions to handle large-scale, dynamic, and complex workloads efficiently.

AWS re:Invent 2023 - AWS storage for serverless application development (STG311-R)

STG313 Building and optimizing a data lake on Amazon S3

The AWS re:Invent 2023 session, "Building and Optimizing a Data Lake on Amazon S3 (STG313)," presented by Oleg and Huey from Amazon S3's engineering and product management teams, and guest speaker Ryan Blue, co-creator of Apache Iceberg, focused on the development and optimization of data lakes using Amazon S3. Oleg discussed various aspects of Amazon S3, emphasizing its scalability and how it thrives at scale, impacting decisions made in Amazon S3 architecture. He highlighted the importance of storage layer in data lakes, the growth of data sets and user bases, and the challenges in scaling. Oleg also introduced the new S3 Express One Zone for high TPS and low latency workloads and discussed AWS Common Runtime for performance optimization in data lake connectors.

Huey shifted the focus to cost optimization and data governance in scaling data lakes. He discussed the use of various S3 storage classes, the role of S3 Lifecycle for transitioning data between storage tiers, and the benefits of S3 Intelligent-Tiering for automated cost savings. Huey emphasized the importance of visibility into data usage for cost optimization, recommending tools like S3 Storage Lens. He also introduced S3 Access Grants for scalable S3 data governance, allowing S3 access to be granted directly to users and groups in corporate directories, enhancing security and auditing capabilities.
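As a rough illustration of the lifecycle and tiering ideas Huey described, the sketch below (boto3, with a hypothetical bucket name and rule values) moves objects under a prefix into S3 Intelligent-Tiering immediately and expires them after two years:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and rule values, for illustration only.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                # Hand objects to Intelligent-Tiering right away; it then
                # moves them between access tiers automatically.
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
                # Delete raw data after two years.
                "Expiration": {"Days": 730},
            }
        ]
    },
)
```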

Ryan Blue provided insights into Apache Iceberg, a project aimed at addressing the shortcomings of the traditional Hive table format. He discussed how Iceberg, designed specifically for object stores like S3, avoids operations that are inefficient on such platforms. Iceberg facilitates a unified data architecture, allowing various systems like EMR, Trino, and Spark to effectively use the same data stored in S3. He shared use cases from Netflix, showcasing significant performance improvements and cost reductions achieved with Iceberg. Ryan concluded with the future outlook for data architecture, emphasizing the need for more open standards and the evolving modular nature of databases and analytics tools in relation to centralized data storage like S3.

AWS re:Invent 2023 - Building and optimizing a data lake on Amazon S3 (STG313)

STG314 Dive deep on Amazon S3

This AWS re:Invent 2023 session features a deep dive into Amazon S3, presented by Amy Therrien, Director of Engineering, and Seth Markle, Senior Principal Engineer. They begin by discussing S3's evolution over 17 years, highlighting the team's dedication to understanding and improving S3's resilience. They emphasize the value of experience in refining their operations and building models for object storage in the cloud. The presentation delves into S3's early challenges and the shift from reactive to proactive threat modeling. This approach involves anticipating potential issues and implementing mitigations before they occur, fostering a culture of proactive thinking and problem-solving within the S3 team.

Amy and Seth then explain S3's technical aspects, focusing on the importance of optimizing storage and indexing for scalability and performance. They discuss strategies like multipart uploads and range GETs for improving data transfer efficiency. The talk also covers the design of S3's indexing system, which handles over 350 trillion objects and 100 million requests per second. They emphasize the role of key naming and prefix management in maximizing the system's scalability. The discussion includes practical advice on how customers can leverage S3's architecture, such as using diverse character sets in key names and keeping dates to the right in key strings to prevent throttling and ensure efficient distribution of requests across the system.
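A small sketch of those two pieces of advice, with hypothetical bucket and key names: spread keys across prefixes by leading with a high-cardinality shard (dates kept to the right), and use a range GET to read only part of a large object:

```python
import hashlib

import boto3

s3 = boto3.client("s3")
BUCKET = "example-telemetry-bucket"  # hypothetical bucket


def spread_key(device_id: str, date: str) -> str:
    """Lead with a high-cardinality hash and keep the date to the right, so
    requests fan out across many prefixes instead of piling onto today's."""
    shard = hashlib.sha256(device_id.encode()).hexdigest()[:4]
    return f"{shard}/{device_id}/{date}.json"


# Range GET: fetch only the first megabyte of a large object.
resp = s3.get_object(
    Bucket=BUCKET,
    Key=spread_key("device-1234", "2023-11-28"),
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()
```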

The latter part of the presentation is handled by Seth, who focuses on data durability and availability in S3. He explains how S3 achieves its 11 nines of durability through end-to-end checksumming, erasure coding, and constant monitoring. The system's design allows for rapid recovery of data from drive failures and bit errors, with data redundantly stored across multiple devices and zones. Seth also introduces S3 Express One Zone, a new storage class designed for high-performance applications, which trades off some durability for speed by localizing data in a single availability zone. The session concludes with insights into S3's operational practices, such as deploying code changes and managing control plane limits, underscoring the team's commitment to robust and resilient service.

AWS re:Invent 2023 - Dive deep on Amazon S3 (STG314)

STG315 Amazon S3 security and access control best practices

The AWS re:Invent 2023 conference featured a session focused on Amazon S3 security and access control best practices. The session, led by AWS employees Meg and Becky, emphasized the importance of securing data stored in Amazon Simple Storage Service (S3), highlighting its ubiquitous use in AWS for storing a wide variety of data, including data lakes, logs, infrastructure data, and content for customers. They stressed the necessity of ensuring that only authorized parties have access to the stored data, aligning with AWS's priority on security.

During the session, Meg and Becky introduced several best practices and security controls for Amazon S3. They discussed recent changes made by AWS, such as enabling encryption by default for new objects in S3 buckets and turning off Access Control Lists (ACLs) for new buckets. These changes enhance security by providing foundational security settings. They also touched on the use of AWS Key Management Service (KMS) for additional encryption and control over access to data. The session covered the use of S3 bucket policies for fine-grained access control and the importance of monitoring and auditing access to data with tools like AWS CloudTrail and S3 server access logs.
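For example, setting a bucket's default encryption to SSE-KMS takes a single boto3 call; the bucket name and key ARN below are placeholders, not values from the session:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and KMS key ARN; substitute your own.
s3.put_bucket_encryption(
    Bucket="example-secure-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
                },
                "BucketKeyEnabled": True,  # reduces per-request KMS costs
            }
        ]
    },
)
```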

Finally, the session delved into advanced IAM policy authoring, discussing the creation of secure defaults and strategies for scaling access control for large-scale data lakes. They explained how to utilize S3 Access Points and the new S3 Access Grants feature for more granular control over data access. The presentation also addressed how to create a secure data perimeter using IAM policies to ensure that data is not unintentionally shared beyond the organization's boundaries. This comprehensive approach to S3 security and access control highlighted AWS's commitment to providing robust and scalable security solutions for cloud storage.
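One common way to express such a data perimeter is a bucket policy that denies requests from principals outside your AWS Organization. The sketch below is a simplified, hypothetical example (the org ID and bucket name are placeholders); production perimeters typically layer on additional conditions:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "example-perimeter-bucket"  # hypothetical

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideOrganization",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {
                # Deny anyone whose account is not in the organization...
                "StringNotEquals": {"aws:PrincipalOrgID": "o-exampleorgid"},
                # ...while leaving AWS service principals unaffected.
                "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```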

AWS re:Invent 2023 - Amazon S3 security and access control best practices (STG315)

STG319 Beyond 11 9s of durability: Data protection with Amazon S3

In the AWS re:Invent 2023 session titled "Beyond 11 9s of Durability: Data Protection with Amazon S3 (STG319)," Yi Zarubin, a principal engineer at S3, and Ankita Mishra, a senior product manager, discussed Amazon S3's approach to data durability and protection. They emphasized S3's culture of durability, which is ingrained over its 17 years of service, and how it influences their development of algorithms and architectures to ensure data safety. Key mechanisms like the 'durability review' were highlighted, where changes to the system undergo rigorous assessments to safeguard against potential threats to data durability.

The presentation covered various threats to data stored in S3, such as hardware failure and data corruption, and the strategies employed to mitigate these risks, like redundant storage across multiple devices and end-to-end integrity checksums. They also discussed S3 features such as versioning, which helps in protecting against accidental deletions and overwrites. Versioning ensures that every upload is assigned a unique version ID, allowing users to retrieve previous versions of an object. The session also included insights into S3's replication capabilities, including cross-region replication for geographical redundancy and compliance with certain regulatory requirements.
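A minimal boto3 sketch of the versioning behavior described above (bucket and key are hypothetical): enable versioning, overwrite an object, and list the versions that remain retrievable:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-versioned-bucket"  # hypothetical

# Turn on versioning so overwrites and deletes create new versions
# instead of destroying data.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Every PUT now returns a unique VersionId; older versions stay retrievable.
resp = s3.put_object(Bucket=BUCKET, Key="reports/q4.csv", Body=b"revised numbers")
print("new version:", resp["VersionId"])

# List all versions of the object; any of them can still be fetched by VersionId.
versions = s3.list_object_versions(Bucket=BUCKET, Prefix="reports/q4.csv")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["IsLatest"], v["LastModified"])
```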

Finally, the session touched upon additional S3 features that enhance data protection and recovery. These include S3 Object Lock for immutable storage, multi-region access points for automated failover and business continuity, and S3 Storage Lens for analytics and visibility into storage usage and activity. The presenters highlighted how these features, along with S3's foundational durability and replication mechanisms, provide comprehensive protection and recovery solutions, ensuring that user data is not only stored securely but also remains accessible and recoverable under various circumstances.

AWS re:Invent 2023 - Beyond 11 9s of durability: Data protection with Amazon S3 (STG319)

STG322 Modernize managed file transfer with SFTP

The AWS re:Invent 2023 session on Amazon EBS (Elastic Block Store) focused on achieving high-performance consistency. Marc Olson, a senior principal software engineer at AWS, and Vienna Chen, a principal product manager, led the session. They emphasized the importance of understanding storage performance in the context of overall system performance, noting that aspects such as application design and database transactions significantly influence storage performance. They introduced Little's Law as a theoretical framework to understand system concurrency and capacity. Using airline traffic as an analogy, they explained how traffic management principles apply to storage performance, highlighting the role of queuing and contention in system throughput.
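Little's Law (L = λ × W) translates directly to storage: the number of I/Os in flight equals the request rate multiplied by the average time each request spends in the system. A tiny worked example with illustrative numbers:

```python
# Little's Law: L = lambda * W
# (items in flight) = (arrival rate) * (time each item spends in the system)
# Applied to storage: outstanding I/Os = IOPS * average latency.

target_iops = 16_000      # what the application needs to sustain
avg_latency_s = 0.001     # 1 ms average I/O latency

required_queue_depth = target_iops * avg_latency_s
print(f"Outstanding I/Os needed: {required_queue_depth:.0f}")  # -> 16

# Conversely, a fixed queue depth and latency bound the achievable IOPS.
queue_depth = 4
max_iops = queue_depth / avg_latency_s
print(f"Max IOPS at queue depth {queue_depth}: {max_iops:.0f}")  # -> 4000
```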

The session delved into technical specifics of Amazon EBS, discussing various volume types and their performance characteristics. The presenters introduced io2 Block Express and gp3 as two key EBS volume types, noting their differences in latency and IOPS (Input/Output Operations Per Second). They stressed the importance of tail latency in understanding volume performance, showing how io2 Block Express provides superior performance consistency compared to other types. The talk also covered the advancements in AWS’s data plane communication protocol, moving from TCP to SRD (Scalable Reliable Datagram) for lower latency, higher throughput, and faster recovery around failures.
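As an illustration of how those volume types are provisioned, the boto3 sketch below creates a gp3 volume with IOPS and throughput dialed up independently of its size (all values and tags are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")

# gp3 decouples capacity from performance: IOPS and throughput are provisioned
# independently of volume size (values here are illustrative).
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,            # GiB
    VolumeType="gp3",
    Iops=6000,           # baseline is 3,000; configurable up to 16,000
    Throughput=500,      # MiB/s; baseline is 125, configurable up to 1,000
    TagSpecifications=[
        {"ResourceType": "volume", "Tags": [{"Key": "app", "Value": "example-db"}]}
    ],
)
print(volume["VolumeId"])
```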

Towards the end, the focus shifted to practical applications, specifically database storage on EBS. The speakers provided insights into choosing the right EC2 instance and EBS volume type for database applications, considering factors like memory-to-CPU ratio and IOPS requirements. They highlighted the benefits of using NVMe reservations on io2 volumes for SQL Server databases, enabling better performance and data integrity. The session concluded with discussions on NoSQL databases, comparing the use of EC2 instance store versus EBS volumes, emphasizing factors such as performance, durability, backup strategies, and cost implications in different scenarios.

AWS re:Invent 2023 - Modernize managed file transfer with SFTP (STG322)

STG331 Achieve high performance consistency with Amazon EBS

In this AWS re:Invent 2023 session titled "Achieve High Performance Consistency Using Amazon EBS (STG331)," Marc Olson and Vienna Chen from the Amazon EBS team provided an in-depth exploration of optimizing storage performance on Amazon EBS. They discussed key concepts such as Little's Law for understanding system capacity, the importance of storage performance within the broader system performance, and how various EBS volume types cater to different performance needs. They highlighted the need for carefully selecting the right EC2 instances and EBS volumes based on performance requirements and demonstrated how CloudWatch metrics can be utilized to monitor performance at the volume, instance, and application levels. 
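A small example of the volume-level monitoring mentioned above, assuming a hypothetical volume ID: pull the VolumeReadOps metric from the AWS/EBS CloudWatch namespace and convert it to average IOPS per period:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical volume ID

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Sum of read operations per 5-minute period over the last hour.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EBS",
    MetricName="VolumeReadOps",
    Dimensions=[{"Name": "VolumeId", "Value": VOLUME_ID}],
    StartTime=start,
    EndTime=end,
    Period=300,
    Statistics=["Sum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    iops = point["Sum"] / 300  # convert per-period ops to average IOPS
    print(point["Timestamp"], f"{iops:.0f} IOPS")
```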

The session also included a discussion on the intricacies of I/O operations and how AWS's Nitro system and SRD (Scalable Reliable Datagram) protocol enhance EBS's performance. The SRD protocol's design allows for lower latency, higher throughput, and faster recovery from failures, leading to significant improvements in EBS's performance. The presenters elaborated on the evolution of EBS, from simple chain replication to a more complex sharding with different replication schemes, enhancing both performance and durability.

Finally, the session delved into practical applications, focusing on considerations for running SQL and NoSQL databases on EBS. For SQL databases, options like Always On Availability Groups and SQL Server Failover Cluster Instances were discussed, including the recent support for NVMe reservations on io2 volumes. For NoSQL databases, considerations for using EC2 instance store versus EBS volumes were examined, highlighting the trade-offs in terms of performance, durability, and cost. The presenters emphasized the importance of understanding and benchmarking workloads to optimize performance, durability, and cost according to specific application requirements.

AWS re:Invent 2023 - Achieve high performance consistency using Amazon EBS (STG331)

STG337 Solving large-scale data access challenges with Amazon S3

The presentation at AWS re:Invent 2023 focused on addressing large-scale data access challenges using Amazon S3. Rob Wilson, a product manager on the S3 team, and Becky Weiss, a VP and distinguished engineer on AWS IAM, introduced the new feature called S3 Access Grants. They began by discussing the foundational concepts of permissions and access control in S3, including prefix-based access, access points, and structured data considerations. They highlighted the importance of securing data while allowing appropriate access, emphasizing best practices like turning on block public access and disabling access control lists.

As the session progressed, Rob and Becky delved into more complex access control scenarios. They explained how IAM roles can be used to grant access to S3, but noted the limitations in terms of scalability and flexibility when dealing with numerous users and access patterns. To address this, they introduced the concept of a session broker pattern for dynamic access control decisions, which uses short-term credentials from the AWS Security Token Service (STS). This approach allows for more granular and dynamic control, scaling based on the number of credentials rather than the volume of S3 requests.
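A sketch of the session broker idea under simple assumptions (the role ARN, bucket, and prefix layout are hypothetical): the broker calls AWS STS with an inline session policy so the returned short-term credentials are scoped down to a single user's prefix:

```python
import json

import boto3

sts = boto3.client("sts")


def broker_credentials(user_prefix: str):
    """Vend short-term credentials scoped to one user's prefix.

    A real broker would authenticate the caller first; role ARN and bucket
    below are placeholders.
    """
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::example-data-lake/{user_prefix}/*",
            }
        ],
    }
    resp = sts.assume_role(
        RoleArn="arn:aws:iam::111122223333:role/data-lake-broker-role",
        RoleSessionName=f"broker-{user_prefix}",
        Policy=json.dumps(session_policy),  # intersects with the role's own permissions
        DurationSeconds=900,
    )
    return resp["Credentials"]
```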

The highlight of the session was the introduction of S3 Access Grants, a new feature designed to simplify and improve the management of access control in large-scale data environments. Becky Weiss explained how S3 Access Grants work in conjunction with IAM and integrate with IAM Identity Center, allowing direct granting of access to data in S3 to both IAM principals and directory users. This feature aims to alleviate common challenges in managing large data lakes, such as the cumbersome management of bucket policies, ensuring consistent access mapping, and simplifying audit trails. S3 Access Grants offer a more natural and direct way to express access mappings and are fully compatible with existing AWS services and encryption methods like KMS. The session concluded with an invitation for further discussion and questions from the audience.
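The sketch below shows roughly how this looks with the boto3 s3control client, assuming an Access Grants instance and a default location are already registered; the account ID, bucket, prefix, and Identity Center user ID are placeholders, and the parameters should be checked against the current S3 Access Grants documentation:

```python
import boto3

s3control = boto3.client("s3control")
ACCOUNT_ID = "111122223333"  # hypothetical account

# Grant a directory user READ access to a sub-prefix of the registered default location.
s3control.create_access_grant(
    AccountId=ACCOUNT_ID,
    AccessGrantsLocationId="default",
    AccessGrantsLocationConfiguration={"S3SubPrefix": "example-bucket/finance/*"},
    Grantee={
        "GranteeType": "DIRECTORY_USER",
        "GranteeIdentifier": "11111111-2222-3333-4444-555555555555",  # Identity Center user ID
    },
    Permission="READ",
)

# At request time, an application exchanges its identity for S3 credentials
# scoped to the data it was granted.
creds = s3control.get_data_access(
    AccountId=ACCOUNT_ID,
    Target="s3://example-bucket/finance/reports/*",
    Permission="READ",
)["Credentials"]
```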

AWS re:Invent 2023 - Solving large-scale data access challenges with Amazon S3 (STG337)

STG338 Scale analytics and SaaS applications with serverless elastic storage

In this presentation at AWS re:Invent 2023, Alex Martineau, General Manager for Elastic File System at AWS, along with Perry Jannette, a senior solutions architect for file storage, and guest Dave Smith, AWS Lead Architect at SAP Enterprise Cloud Services, discussed the scaling of analytics and SaaS applications with serverless elastic storage. The session focused on file storage for SaaS and analytics workloads, addressing common pain points and solutions, and showcasing real customer use cases, including one from SAP. They emphasized shared file storage requirements for these applications due to concurrent access needs and scale requirements that vary with customer behavior or business needs in analytics. The discussion also highlighted the importance of cost-effective and resilient systems, capable of handling long-tail data over extended timelines.

The speakers discussed the use of Amazon Elastic File System (EFS) in various scenarios, highlighting its features like high availability, data durability, security, and seamless integration with AWS services. Dave Smith shared insights into SAP's use of EFS, outlining its benefits in terms of scalability, reliability, performance, and cost efficiency. The presentation also covered recent performance improvements in EFS, such as elastic throughput, which scales performance automatically with usage, and the new Archive storage class for colder data at lower costs. These enhancements enable customers like SAP to efficiently manage large-scale deployments across multiple AWS regions.

Perry Jannette delved deeper into the technical aspects of EFS, emphasizing its configuration-free nature, high availability, and security features. He explained how EFS supports various AWS compute services, providing persistent storage that remains consistent as application architectures evolve. Jannette also discussed the cost optimization aspects of EFS, including its different storage classes like Standard, Infrequent Access, and Archive, which help manage data life cycles efficiently. The session concluded with a focus on delivering insights faster for analytics and driving faster time to value for SaaS applications, underscoring the continuous improvement and ease of consumption of AWS services.
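A short sketch of how those storage classes fit together, assuming a hypothetical file system ID: an EFS lifecycle configuration that tiers cold files to Infrequent Access, then Archive, and pulls them back to Standard when they are read again:

```python
import boto3

efs = boto3.client("efs")

# Move files that go cold to cheaper storage classes, and bring them back to
# Standard on access (file system ID and thresholds are illustrative).
efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_30_DAYS"},
        {"TransitionToArchive": "AFTER_90_DAYS"},
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ],
)
```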

AWS re:Invent 2023 - Scale analytics and SaaS applications with serverless elastic storage (STG338)

STG340 Accelerate ML and HPC with high performance file storage

The AWS re:Invent 2023 session titled "Accelerate ML and HPC with high performance file storage (STG340)" was led by Eric Anderson, the general manager of FSx for Lustre and Amazon File Cache, along with his colleagues Darryl Osborne and Laura Shepard. The presentation focused on the capabilities of Amazon FSx for Lustre and FSx for ZFS in handling high-performance computing and machine learning workloads. Emphasis was placed on the need for storage solutions that can keep pace with virtually unlimited high-performance computing demands without becoming a bottleneck. They discussed how AWS Storage services, particularly FSx file systems, cater to various needs ranging from traditional IT applications to compute-intensive workflows and cloud-native applications.

Laura Shepard discussed major features of FSx for Lustre, highlighting its integration with Amazon S3 data lakes, its scalable throughput, and options to optimize costs. She emphasized how FSx for Lustre's architecture, using scale-out file systems, provides high scalability, supporting hundreds of gigabytes per second of throughput and millions of IOPS, ideal for high-performance workloads. Laura provided customer examples, including Netflix's use of FSx for Lustre in their ML media models, to demonstrate the efficiency gains from a fast and scalable file storage system.

Darryl Osborne presented a demo showcasing FSx for Lustre's integration with data lakes to create a fast file interface for processing S3 data. He demonstrated how FSx for Lustre handles large-scale data by importing, exporting, and efficiently managing files between AWS services. The demo included creating and modifying files, illustrating the speed and flexibility of FSx for Lustre in handling data-intensive tasks. Darryl's demonstration highlighted how FSx for Lustre's architecture and integration with S3 optimize high-performance computing and machine learning workloads, offering scalable solutions to AWS customers.
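A rough boto3 sketch of that S3 integration, with hypothetical sizes, subnet, bucket, and paths: create a persistent FSx for Lustre file system, then link a directory in it to an S3 prefix through a data repository association so changes are imported and exported automatically (parameter names should be verified against the current FSx API):

```python
import boto3

fsx = boto3.client("fsx")

# A persistent FSx for Lustre file system (capacity, subnet, and throughput are illustrative).
fs = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=2400,  # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    LustreConfiguration={
        "DeploymentType": "PERSISTENT_2",
        "PerUnitStorageThroughput": 250,  # MB/s per TiB of storage
    },
)

# Link a path in the file system to an S3 prefix so objects appear as files
# and changes flow in both directions.
fsx.create_data_repository_association(
    FileSystemId=fs["FileSystem"]["FileSystemId"],
    FileSystemPath="/training-data",
    DataRepositoryPath="s3://example-ml-datasets/images",
    BatchImportMetaDataOnCreate=True,
    S3={
        "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
        "AutoExportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
    },
)
```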

AWS re:Invent 2023 - Accelerate ML and HPC with high performance file storage (STG340)

STG341 Meet performance demands for your business-critical applications

In an AWS re:Invent 2023 session, Nikhil and Andy, heads of product on the Amazon FSx team, along with Mani Madhunapanthula from Arcesium, discussed the capabilities of Amazon FSx in addressing performance demands for business-critical applications. They highlighted the challenges businesses face with growing data volumes and the need for efficient, scalable, and reliable storage solutions. The session delved into the importance of high availability, performance, and cost management for applications such as core databases and customer-facing services. They emphasized that despite the variety of applications across industries, most seek to migrate to the cloud for improved resiliency, performance, and cost efficiency.

Amazon FSx was presented as a solution that offers fully-managed, high-performance storage powered by popular file system technologies like NetApp ONTAP, Windows File Server, OpenZFS, and Lustre. The speakers highlighted FSx’s ability to provide like-for-like storage capabilities, making cloud migration easier for customers. They discussed how FSx helps in improving resiliency with features like Multi-AZ support, data replication, and fast recovery options. For performance optimization, FSx offers scalability and the ability to adjust resources as per application demands. Cost optimization is achieved through data deduplication, compression, and efficient architectural designs.

Specific use cases of Amazon FSx were explored, including clustered database workloads for SQL Server, Oracle, and SAP HANA, and virtual machine data stores for VMware environments. The session covered how FSx’s features like snapshots, clones, and quality of service policies significantly improve performance and reduce costs. Mani from Arcesium shared their journey with FSx ONTAP, detailing how it helped them achieve storage efficiencies, cost reductions, and enhanced performance for platform refreshes. The session concluded with an emphasis on continuous improvement and collaboration to enhance customer experiences and service offerings.

AWS re:Invent 2023 - Meet performance demands for your business-critical applications (STG341)

STG350 Get started with checksums in Amazon S3 for data integrity checking

In this session at AWS re:Invent 2023, Aritra Gupta, a senior product manager at Amazon S3, presented on the topic of checksums in Amazon S3 for data integrity checking. He began by highlighting the staggering number of checksum validations (over 4 billion per second) performed by Amazon S3, emphasizing the importance of data integrity. The session covered an introduction to checksums, including their definition, calculation methods (such as CRC32), and key use cases like data in transit and data at rest. Aritra explained that Amazon S3 uses a combination of CRC, SHA-256, and MD5 algorithms for data validation in transit and at rest, and set the stage for the advanced checksum capabilities and applications explored later in the session.

Aritra then delved into the advanced integrity checking capabilities in Amazon S3, including trailing checksums and parallel checksum operations. He explained the efficiency gains from these features, especially for streaming workloads and large objects. Aritra discussed the choice of checksum algorithms, which should be based on specific use cases, such as CRC32C for fast, lightweight operations or SHA-256 for compliance requirements like in genomics data. He emphasized the flexibility Amazon S3 offers in choosing these algorithms.

The session concluded with a demonstration of how checksums work in Amazon S3 using the Python client and Boto3 SDK. Aritra showcased the process of calculating checksums locally, uploading objects with specified checksums, and handling mismatched checksums. He also demonstrated the GetObjectAttributes API, particularly beneficial for large objects, and explained the computation of the checksum of checksums for multipart uploads. The presentation emphasized the performance benefits of Amazon S3’s checksum capabilities, particularly for large-scale data integrity checks.
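A condensed version of that workflow in Python with boto3 (bucket, key, and data are hypothetical): compute a SHA-256 checksum locally, have S3 verify it on upload, and later read the stored checksum back without downloading the object:

```python
import base64
import hashlib

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "example-bucket", "genomics/sample-001.vcf"  # hypothetical
data = b"example genomic records\n"

# Compute the checksum locally, then ask S3 to verify it on upload.
local_sha256 = base64.b64encode(hashlib.sha256(data).digest()).decode()
s3.put_object(Bucket=BUCKET, Key=KEY, Body=data, ChecksumSHA256=local_sha256)
# A mismatched value makes the PUT fail instead of storing corrupted data.

# Retrieve the stored checksum later without downloading the object; for
# multipart uploads this is the checksum of the per-part checksums.
attrs = s3.get_object_attributes(
    Bucket=BUCKET,
    Key=KEY,
    ObjectAttributes=["Checksum", "ObjectParts", "ObjectSize"],
)
print(attrs.get("Checksum", {}).get("ChecksumSHA256"))
```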

AWS re:Invent 2023 - Get started with checksums in Amazon S3 for data integrity checking (STG350)

STG358 Optimizing performance for machine learning training on Amazon S3

AWS re:Invent 2023 featured a session on optimizing machine learning training performance using Amazon S3, particularly emphasizing the use of S3 for machine learning use cases. Devabrat Kumar, a product manager on the Amazon S3 team, highlighted the advantages of using Amazon S3 for machine learning training, including its scalability, high throughput, and integration with services like Amazon SageMaker and Amazon Bedrock. He introduced the new S3 Express One Zone storage class designed for low-latency, performance-heavy workloads, ideal for machine learning training with random data access. The session also covered the use of Amazon S3's storage classes and cost optimization features.

Alexander Arzhanov, a specialist solution architect at AWS, discussed the machine learning lifecycle stages and the importance of optimizing data IO performance. He explained the differences between sequential and random read patterns, emphasizing the performance implications of each. Arzhanov introduced Amazon S3 Express One Zone, a new storage class for low-latency access, beneficial for training jobs with high random data access. This class delivers significant improvements in terms of first-byte latency and overall throughput, making it suitable for latency-sensitive workloads.

James, another speaker, focused on simplifying the use of S3 for machine learning training, particularly outside of SageMaker. He introduced Mountpoint for Amazon S3, a file client that presents S3 buckets as local file systems, and the Amazon S3 connector for PyTorch. These tools offer high performance and integrate seamlessly with training frameworks like PyTorch. James highlighted the importance of high-performance checkpointing and data loading, and the role of AWS Common Runtime in optimizing S3 transfers. The session concluded with an invitation for further discussion and questions, emphasizing AWS's commitment to streamlining machine learning workflows with efficient data storage solutions.
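As one possible sketch of the PyTorch path, assuming the s3torchconnector package and its S3MapDataset API (bucket, prefix, and region are hypothetical; check the connector's documentation for the current interface):

```python
# pip install s3torchconnector torch  (package and API assumed; verify against the project's docs)
from s3torchconnector import S3MapDataset
from torch.utils.data import DataLoader

# Each dataset item exposes the S3 object's key and a readable stream of its bytes.
dataset = S3MapDataset.from_prefix("s3://example-ml-datasets/images/", region="us-east-1")


def to_pairs(batch):
    # Keep the collate function picklable so it works with worker processes.
    return [(obj.key, obj.read()) for obj in batch]


loader = DataLoader(dataset, batch_size=32, num_workers=4, collate_fn=to_pairs)

for batch in loader:
    for key, raw_bytes in batch:
        pass  # decode raw_bytes and feed your training loop here
```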

AWS re:Invent 2023 - Optimizing performance for machine learning training on Amazon S3 (STG358)

Conclusion

These are summaries of all the 300- and 400-level storage sessions. We hope you found them helpful, both for getting an overview of the new storage content and for deciding which sessions to watch.


Brian Tarbox

Brian is an AWS Community Hero, Alexa Champion, runs the Boston AWS User Group, has ten US patents and a bunch of certifications. He's also part of the New Voices mentorship program where Heroes teach traditionally underrepresented engineers how to give presentations. He is a private pilot, a rescue scuba diver, and earned his Master's in Cognitive Psychology working with bottlenose dolphins.
