For years, re:Invent has continued to grow in popularity. It’s a fantastic experience complete with exciting venues, fun events, and the best of social networking, and yet the service and feature announcements always stand out as the biggest takeaway. Well, that and the swag.
It’s day three of re:Invent 2022 and Swami Sivasubramanian, the VP of Data and Machine Learning, takes the stage to share some exhilarating announcements. You might have missed your morning coffee since you got in line early to make sure you secured a seat at this must-see keynote. Not to worry though, the energy from this keynote easily replaces your morning caffeine fix. After the dramatic opening you’re fired up and ready to take in the abundance of announcements and insight Swami will share over the next two hours.
Swami opens by diving into the creative process and how humans come up with inventions and innovations. He explains that at first glance, inventions and innovations seem to be pulled out of thin air, like a lightbulb suddenly flicking on in your head. However, these ideas are actually a culmination of the hundreds of thousands of data points your brain takes in over your lifetime. Creativity follows a cumulative process. The human mind shows us how we can harness the power of data to drive creativity, and we can apply that same process to organizations.
The creative process for organizations, while similar in some respects, is much more difficult. As Conway’s Law dictates, organizations tend to create systems that reflect their own social and hierarchical boundaries rather than the best streamlined solution. For organizations, data doesn’t naturally flow and isn’t stored in one centralized location like the brain: it comes from all over, it’s siloed, and it isn’t easily visualized. That’s where AWS comes in. AWS provides modular pipelines and mechanisms to process data and get it to those who need it to drive invention and innovation for any organization. When all of these elements come together, new products and customer experiences come to life.
Building on the previous points, Swami drives home the idea that processing an organization’s data in a meaningful way is paramount to invention, or to put it more plainly, to an organization’s success. Without invention and innovation, shifting to meet the current landscape, businesses will not thrive. Swami makes it clear that AWS is the best place to make sense of this data and use it to propel your business. He highlighted that AWS received a 95 out of 100 in the Gartner solution scorecard for dbPaaS (database platform as a service). This is why more than 1.5 million customers come to AWS for their data needs.
Swami breaks down the core elements of a data strategy into three easy-to-understand points. Build future-proof foundations supported by core data services, weave connective tissue across your organization, and democratize data with tools and education. Throughout the rest of the presentation he continuously ties his points back to these three pillars that make up a strong data strategy.
All organizations are different and AWS understands that, which is why Swami drives home the idea that you need your solution to be purpose-built. That’s why AWS is so powerful: it doesn’t provide a single cookie-cutter solution but instead gives you all the tools you need to build a custom solution that fits your business perfectly. After hammering on the need for custom solutions, Swami makes his first new product announcement, Amazon Athena for Apache Spark.
He gives a comprehensive overview of this new service and shares the following points: it lets you start running interactive analytics on Apache Spark in under a second, spins up Spark workloads up to 75% faster than other serverless offerings, lets users build Spark applications with a simplified notebook interface in the Athena console, and runs up to 3x faster than open-source Spark. Athena for Spark is deeply integrated with other AWS services like SageMaker and EMR, so you can easily query data from various sources, chain calculations, and visualize results. Lastly, there is no underlying infrastructure to manage, and you only pay for what you use. With all of these new capabilities in mind, it’s clear that AWS is the best place to run Apache Spark in the cloud.
As the world has become more and more reliant on technology, data has grown exponentially. That’s why it’s extremely important that an organization’s data tools are able to perform at a massive scale. To handle the ever-increasing dependency on data processing, AWS has created a new offering: Amazon DocumentDB Elastic Clusters. Elastic Clusters automatically scale to petabytes of storage, with virtually any read and write capacity, in minutes, with no downtime or performance impact. In true AWS fashion, it is a managed service, meaning there is no underlying infrastructure for you to take care of as the user. It scales automatically and takes that burden out of your hands. This announcement gets the crowd riled up as they see the immense potential in this fantastic new service offering.
We’ve already covered the strides AWS is making toward performance at scale, but now it’s time to cover how they are removing heavy lifting from their end users. The scenario Swami highlighted is working with geospatial data, which is typically unstructured and made up of huge data sets coming from a multitude of different sources. Needless to say, working with geospatial data, especially for training ML models, came with considerable overhead. That is now a thing of the past with Swami’s next big announcement: Amazon SageMaker now supports geospatial ML! It allows users to acquire data from multiple sources already integrated into SageMaker with just a few clicks, prepare it using built-in geospatial algorithms, run it through pre-trained neural networks, and make decisions with built-in visualizations. This announcement doesn’t just benefit technologists by removing tons of the heavy lifting previously necessary to run ML on geospatial data; it benefits the world as a whole. One example Swami shared is how SageMaker could be used to determine which roads would be affected by a flood, showing first responders the best routes to deliver aid, send emergency supplies, and evacuate survivors.
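The flood example boils down to a routing problem: once a model flags which road segments are under water, responders need the shortest path that avoids them. As a rough plain-Python illustration of that routing step (entirely separate from the SageMaker APIs, with made-up place names and distances):

```python
import heapq

def safest_route(graph, flooded, start, goal):
    """Dijkstra's shortest path that skips road segments flagged as flooded.

    graph: dict mapping node -> list of (neighbor, distance_km)
    flooded: set of (node, neighbor) segments to avoid (either direction)
    Returns (total_km, path) or None if no dry route exists.
    """
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, d in graph.get(node, []):
            if (node, nbr) in flooded or (nbr, node) in flooded:
                continue  # segment is under water, skip it
            heapq.heappush(queue, (dist + d, nbr, path + [nbr]))
    return None

# Hypothetical road network: supply depot to emergency shelter.
roads = {
    "depot":    [("bridge", 2.0), ("hillside", 5.0)],
    "bridge":   [("shelter", 1.0)],
    "hillside": [("shelter", 3.0)],
}

# Suppose the geospatial model flags the bridge approach as flooded:
print(safest_route(roads, {("depot", "bridge")}, "depot", "shelter"))
# -> (8.0, ['depot', 'hillside', 'shelter'])
```

In practice the model's flood-extent raster would be intersected with real road geometry to produce the `flooded` set, but the routing idea is the same.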
Swami wasn’t done there. He emphasized AWS’s commitment to investing in a zero-ETL future, which means eliminating the need to build manual data pipelines. With this he announced a feature update, Aurora zero-ETL integration with Amazon Redshift, which automatically replicates Aurora data into Redshift shortly after it is written, with no pipelines to build or manage. Additionally, he announced Redshift auto-copy from S3, which automates data loading from S3 without engineering resources. With AWS’s zero-ETL mission, they are tackling data sprawl and making it easier for end users to connect with their data sources, so they can focus on processing that data rather than wasting resources just getting it all in one place.
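To make the "single command" idea concrete, auto-copy from S3 is set up as a one-time COPY job in Redshift; afterward, new files landing under the S3 prefix are loaded automatically. The sketch below just composes that statement as a string. The `JOB CREATE ... AUTO ON` clause follows the syntax shown around the preview announcement and may differ in the released service, and the table, bucket, and IAM role names are all made up:

```python
def build_auto_copy_job(table, s3_prefix, iam_role, job_name):
    """Compose a Redshift auto-copy (COPY JOB) statement.

    JOB CREATE ... AUTO ON asks Redshift to re-run the COPY whenever new
    files land under the S3 prefix, so no scheduled loader is needed.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS PARQUET\n"
        f"JOB CREATE {job_name} AUTO ON;"
    )

sql = build_auto_copy_job(
    "sales_raw",
    "s3://example-bucket/sales/",           # hypothetical bucket
    "arn:aws:iam::123456789012:role/load",  # hypothetical role
    "sales_auto_load",
)
print(sql)
```

Compare this one-time setup to the usual alternative: an event-triggered Lambda or a scheduled job that tracks which files have already been loaded.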
Performance at scale and removing heavy lifting are great, but does that even matter if your data isn’t reliable and secure? You need the right safeguards in place to protect your data, and AWS has your back. AWS has a long history of building secure and reliable services for your data, but customers felt that they needed the same level of reliability in their analytics applications as in databases like Aurora and DynamoDB. That’s why Swami was proud to meet their customers’ call to action with the announcement of Amazon Redshift Multi-AZ. Redshift Multi-AZ delivers automatic failover in the unlikely event that an AZ is disrupted, processes reads and writes across multiple AZs to maximize return on investment, and maintains business continuity without application changes. This announcement is one of the many ways AWS has been innovating the cloud to make sure their customers’ data is secure and reliably accessible.
Building off that energy, Swami rocked the crowd again with two follow-up announcements: Trusted Language Extensions for PostgreSQL and GuardDuty RDS Protection. Trusted Language Extensions is an open-source project that allows users to safely use extensions that meet their needs without waiting for AWS certification. It leverages and gives back to the open-source community, which is fantastic. With GuardDuty RDS Protection, AWS is leveraging ML to find threats like access attacks on your RDS databases. It delivers detailed findings enriched with contextual data, and all that info is consolidated at the enterprise level for the customer. Did I mention all of that goodness is enabled with a single click? With all these fantastic new features it’s clear that AWS is placing a great emphasis on reliability and security and really listens to customer feedback. They’re using data to drive innovation and business decisions that impact their customers, showing firsthand how important it is to leverage data in your organization.
We just touched on how important it is to make sure your data is reliably accessible and secure, but let’s not forget how important it is to make sure that safely kept data isn’t riddled with bad data. If your good data is mixed with bad data, your insights will be skewed and potentially detrimental to the organization. As Swami explained, we need quality rules for our data so our data lake doesn’t turn into a data swamp. Using this analogy, Swami announced another amazing feature update, AWS Glue Data Quality. It takes away the huge overhead of creating data quality rules, generating them automatically to increase the freshness and accuracy of data. These rules can be applied to pipelines so bad data doesn’t need to be filtered out of your data lake after the fact; bad data simply won’t make its way into your database to begin with. That way the user can trust that they have a reliable data set that is ready to go, and on the off chance that bad data does make its way in, Glue Data Quality will automatically alert them so manual action can be taken to rectify the situation. This shows that AWS is driven to help their customers get meaningful insights from their data and truly, positively impact their business.
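To make the idea of data quality rules concrete, here is a minimal plain-Python sketch of the kinds of completeness, uniqueness, and freshness checks that Glue Data Quality generates and enforces automatically. This illustrates the concept only, not the Glue API or its rule language, and the field names are hypothetical:

```python
from datetime import datetime, timezone

def check_quality(rows, max_age_days=1):
    """Run simple quality rules on a batch before it enters the lake.

    Returns a list of rule failures; an empty list means the batch passes.
    """
    failures = []
    ids = [r.get("order_id") for r in rows]
    # Completeness: the key column must never be null.
    if any(i is None for i in ids):
        failures.append("completeness: order_id has null values")
    # Uniqueness: the key column must not contain duplicates.
    if len(ids) != len(set(ids)):
        failures.append("uniqueness: order_id has duplicates")
    # Freshness: every row must have been updated recently.
    now = datetime.now(timezone.utc)
    if any((now - r["updated_at"]).days > max_age_days for r in rows):
        failures.append(f"freshness: rows older than {max_age_days} day(s)")
    return failures

batch = [
    {"order_id": 1, "updated_at": datetime.now(timezone.utc)},
    {"order_id": 1, "updated_at": datetime.now(timezone.utc)},
]
print(check_quality(batch))  # duplicate key -> uniqueness failure
```

Glue Data Quality's value is that rules like these are recommended from the data itself and wired into pipelines, rather than hand-written per table as above.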
With all the data organizations are amassing, it only makes sense to perform machine learning on that data. Humans couldn’t possibly sort through all that data manually, at least not fast enough to drive business decisions. However, with machine learning comes a slew of different challenges, one of the largest being governance. ML requires collaboration between many users, it’s time-consuming to set up permissions for each user group, and it’s difficult to share models and data in one centralized location. To combat this problem, AWS released an update to SageMaker’s ML governance capabilities. It introduces Role Manager, Model Cards, and Model Dashboard, which together address these governance issues for the end user.
All that we’ve talked about thus far is great, but if we don’t have a continuing workforce to leverage these tools, all will be for naught. AI alone is estimated to add a million jobs by 2029, and those numbers are only going to grow as more technologies emerge. That’s why AWS is paving the way for future technologists and placing an emphasis on education and training. To address the diversity issue that the tech field as a whole is facing, they are directing this effort at underserved communities and minority-serving institutions. This is especially important to Swami, as he came from an underserved village in India with only one computer for the entire high school. He doesn’t want others to face those challenges and wants to make data engineering and machine learning a possible career path for all. That’s why he was ecstatic to announce that AWS Machine Learning University now provides educator training.
By 2023, AWS will train over 350 educators from community colleges all over the US. Educators will be equipped to provide students with AI and ML courses, certificates, and degrees. An early version of this program was brought to Houston Community College, the first community college accepting this coursework toward a bachelor’s degree. Additionally, the AWS AI and ML scholarship program has awarded $10 million in scholarships to 2,000 students and doesn’t plan on stopping there. Overall, AWS is making a great push to make tech more accessible and increase diversity in tech. These are all things I think we should champion and push to see more of from other organizations. Hats off to AWS.
Through his keynote presentation, Swami made a lot of announcements and highlighted a lot of the great things AWS has to offer in the Data and ML space. He made it apparent that AWS listens to their customers, wants the best for them, and wants the best for our entire shared world. The advancements AWS has made in the Data and ML space in just the last year are immense. With his closing, Swami urged his audience to get out there and make the next big thing. I have to take his lead and say the same. Get out there and leverage these tools, do something big for your organization, yourself, the world. Remember, a spark starts with one.
Tristan is a Senior Cloud Software Engineer at Caylent, helping customers build the most performant user-centric cloud-native applications on AWS. Tristan grew up in Milwaukee, Wisconsin, and graduated from Milwaukee Area Technical College. He holds both AWS Solutions Architect Associate and AWS Certified Developer Associate certifications. When he isn’t developing top-tier applications for our customers, he enjoys traveling, tinkering with cars, getting outside with his dog, and his freshwater aquariums.