Machine Learning Operations on AWS

April 17, 2023

Analytical AI & MLOps

Data Modernization & Analytics

Learn how machine learning operations (MLOps), the steps and disciplines that help you deploy ML models reliably and efficiently, can maximize accuracy, repeatability, and the productivity of your ML resources.

Machine learning (ML) adoption is a strategic goal for most businesses today, from born-in-the-cloud digital disruptors to traditional manufacturers, services firms, and everything in between. Effective use of ML can be a significant competitive advantage, unlocking latent value in the data assets of startups and enterprises alike. Because demand for data scientists currently outpaces supply, a prudent approach to ML adoption is to optimize for the productivity of these scarce professionals.

MLOps promises to maximize data scientist productivity as well as model effectiveness and reliability, adding discipline to the development and deployment of ML models through repeatable processes and automation, and reducing repetitive and mistake-prone toil.

For those newer to ML, a good backgrounder is “Machine learning, explained” on the MIT Sloan School of Management website, which includes a succinct summary: “Machine learning is a subfield of artificial intelligence that gives computers the ability to learn without explicitly being programmed.” This quote subtly encapsulates the premise of this article: if ML isn’t explicitly programmed like traditional algorithms, then the components of the system outside the model itself must be major factors in the effectiveness of the insights derived.

In recent years, MLOps has begun to do for ML what DevOps promises for applications: increase the rate of predictable change through automation and collaboration. Just as innovation in managed services for release automation and infrastructure provisioning on AWS has accelerated development team velocity, the Amazon SageMaker AI family and a growing library of AWS managed ML services are accelerating ML and MLOps adoption. Caylent has written about some of these services previously when discussing data modernization.

ML implementations benefit from repeated experimentation and iteration. MLOps facilitates model testability and comparison by holding other variables in the system stable and allowing incremental changes to easily transition from development to test or production environments.
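To illustrate what "holding other variables stable" can look like in practice, here is a minimal Python sketch that scores two candidate models against the same frozen holdout set, so that any difference in the metric comes from the model alone. The holdout data, the two threshold "models", and the helper names are hypothetical placeholders for illustration; they are not part of any AWS or Caylent tooling.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[Tuple[float, ...], int]   # (features, label)
Model = Callable[[Tuple[float, ...]], int]

# Hypothetical frozen holdout set: it stays fixed while models change.
HOLDOUT: List[Example] = [
    ((0.20,), 0),
    ((0.40,), 0),
    ((0.55,), 0),
    ((0.60,), 1),
    ((0.90,), 1),
]


def baseline_model(features: Tuple[float, ...]) -> int:
    """Hypothetical current model: a simple threshold at 0.5."""
    return int(features[0] > 0.5)


def candidate_model(features: Tuple[float, ...]) -> int:
    """Hypothetical incremental change: the threshold moves to 0.58."""
    return int(features[0] > 0.58)


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples for which the model predicts the label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def pick_best(models: Dict[str, Model], examples: List[Example]) -> str:
    """Return the name of the model with the highest accuracy."""
    return max(models, key=lambda name: accuracy(models[name], examples))


if __name__ == "__main__":
    models = {"baseline": baseline_model, "candidate": candidate_model}
    for name, model in models.items():
        print(name, accuracy(model, HOLDOUT))
    print("best:", pick_best(models, HOLDOUT))
```

Because the evaluation data and the metric are identical for both runs, the comparison is repeatable. A real pipeline would also pin data versions, random seeds, and environment configuration in the same way.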

MLOps has many individual components. This article covers three key workstreams: data engineering, model engineering, and operations.

Data Engineering

Data engineering encompasses data analysis, extraction, collection, cleansing, preparation, and storage. Our data modernization article addresses many of the relevant considerations, tools, and best practices, which are all vital inputs to model accuracy. Additional considerations specific to ML may include versioned or historical data sets, holdouts for training and testing, and transformation of data into model features. AWS has a number of services that accelerate data engineering. Amazon SageMaker Data Wrangler is an easy-to-use managed service that allows data engineers or scientists to prepare data and engineer features specifically for ML needs. Amazon SageMaker Studio has gained features for more complex data processing, and Amazon SageMaker Feature Store provides an online store for low-latency inference and an offline store for training.
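As a rough illustration of two of the ML-specific considerations above, holdouts and feature transformation, the following Python sketch assigns records to a train or test split from a stable hash of their ID and standardizes a numeric feature using statistics fitted on training data only. The function names, the record ID format, and the 20% test fraction are assumptions made for this example; managed services such as Data Wrangler handle these steps through their own interfaces.

```python
import hashlib
import statistics
from typing import List, Tuple


def assign_split(record_id: str, test_fraction: float = 0.2) -> str:
    """Assign a record to 'train' or 'test' from a stable hash of its ID.

    The same ID always lands in the same split, so the holdout stays
    consistent as new data arrives or the data set is re-versioned.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def fit_scaler(train_values: List[float]) -> Tuple[float, float]:
    """Compute mean and standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # avoid dividing by zero
    return mean, std


def scale(value: float, mean: float, std: float) -> float:
    """Turn a raw value into a standardized model feature."""
    return (value - mean) / std


if __name__ == "__main__":
    # Hypothetical records: (record_id, raw numeric value)
    records = [(f"order-{i}", float(i % 50)) for i in range(1000)]
    train = [v for rid, v in records if assign_split(rid) == "train"]
    test = [v for rid, v in records if assign_split(rid) == "test"]
    mean, std = fit_scaler(train)
    test_features = [scale(v, mean, std) for v in test]
    print(len(train), len(test), round(mean, 2), round(std, 2))
```

Fitting the scaler on the training split alone keeps information from the holdout out of the features, which is one reason the split is decided before any transformation.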

Organizations should plan for the reality that a significant portion of the time and energy required to successfully adopt ML will be invested in data engineering. Establishing and maintaining data quality are core capabilities to maximize value from ML initiatives.

Model Engineering

Custom model engineering and fine-tuning remain the domain of specialists, but in recent years managed cloud services and approachable models have allowed for the growth of “citizen data scientists”. These individuals typically have business domain knowledge paired with technical skills that allow them to generate valuable insights.

AWS is built by builders, for builders, but there’s significant value in exploring the platform capabilities before diving into a custom model. These are just a few of the AI as a Service offerings available on AWS:

If you need to perform image processing, Amazon Rekognition offers image and video analysis capabilities such as object, face, and text detection. Going beyond basic image processing, Amazon Lookout for Vision can help automate quality control on production lines by detecting visual defects.
Amazon Textract can extract text and data from documents, and Amazon Comprehend can analyze sentiment and document context.
Chatbots and virtual agents are the domain of Amazon Lex. Amazon Transcribe turns speech into text, and Amazon Polly complements it by turning text into speech.
Amazon Forecast can help predict business trends and Amazon Personalize can make individualized recommendations, while Amazon Fraud Detector looks out for suspicious behavior.

If you can advance your business quickly using a platform service, it’s a great accelerator and benefits from AWS’ work on general purpose as well as specific models at scale across many customers. As your ML needs grow, you can always revisit and implement a specialized model if you require additional specificity.

For needs outside of the common services AWS has already built, Amazon SageMaker Autopilot or Amazon SageMaker Canvas can quickly automate custom model creation from input data and targets using automated ML (AutoML).

Operations

Once we have clean data in the right format and an effective model, the focus shifts to operations: hosting the model and exposing it for integration. For many use cases, monitoring responsiveness is as important as it is for traditional API endpoints. In addition, ML workloads add the complexity of drift or skew, where the model becomes less predictive over time. MLOps addresses this by streamlining changes with CI/CD and ML pipelines, monitoring accuracy over time, and even managing multiple competing “challenger” models to identify the most effective option.
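To make the idea of drift more concrete, the sketch below computes the Population Stability Index (PSI), one common statistic for comparing the distribution of a feature at training time with its distribution in production. This is an illustrative assumption about how drift might be quantified, not a description of what any AWS service computes internally. The function names, the bin count, and the 0.1 and 0.25 thresholds (a widely used rule of thumb) are choices made for this example.

```python
import math
from typing import List


def population_stability_index(
    expected: List[float], actual: List[float], bins: int = 10
) -> float:
    """Compare two samples of one feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: List[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_status(psi: float) -> str:
    """Map a PSI value to a label using rule-of-thumb thresholds."""
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate drift"
    return "significant drift"


if __name__ == "__main__":
    # Hypothetical data: training-time values versus shifted production values.
    training = [i / 100 for i in range(100)]
    production = [0.5 + i / 200 for i in range(100)]
    psi = population_stability_index(training, production)
    print(round(psi, 3), drift_status(psi))
```

A check like this could run on a schedule against recent inference inputs, with a "significant drift" result triggering retraining or a challenger evaluation.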

Amazon SageMaker Pipelines simplifies the implementation of CI/CD pipelines and includes model tracking. Amazon SageMaker AI has built-in, robust hosting capabilities that make operational resiliency and performance much easier to achieve, and Amazon SageMaker Model Monitor and Amazon SageMaker Clarify facilitate monitoring for drift and bias, respectively.

If this seems like a lot to take in, it can indeed be hard to know where to start. Reach out to Caylent to plan your project, and ask about our MLOps strategy assessment or customer case studies such as Upside to benefit from our experience guiding customers through the MLOps journey on AWS.

Mark Olson

Mark Olson, Caylent's Portfolio CTO, is passionate about helping clients transform and leverage AWS services to accelerate their objectives. He applies curiosity and a systems thinking mindset to find the optimal balance among technical and business requirements and constraints. His 20+ years of experience spans team leadership, technical sales, consulting, product development, cloud adoption, cloud native development, and enterprise-wide as well as line of business solution architecture and software development from Fortune 500s to startups. He recharges outdoors - you might find him and his wife climbing a rock, backpacking, hiking, or riding a bike up a road or down a mountain.
