Machine Learning Operations on AWS

Machine Learning (ML) adoption is a strategic goal in most businesses today across industries, from born-in-the-cloud digital disruptors to traditional manufacturers, services, and everything in between. Making effective use of ML can be a significant competitive advantage, unlocking latent value in data assets from startups to enterprises. Because demand for data scientists is outpacing supply at the moment, a prudent approach to the adoption of ML can be to optimize with these scarce professionals’ productivity in mind.

MLOps promises to maximize data scientist productivity as well as model effectiveness and reliability, adding discipline to the development and deployment of ML models through repeatable processes and automation, and reducing repetitive and mistake-prone toil.

For those newer to ML, a good backgrounder is Machine learning, explained from the MIT Sloan School’s website, which includes a succinct summary: “Machine learning is a subfield of artificial intelligence that gives computers the ability to learn without explicitly being programmed”. This quote subtly encapsulates the premise of this article – if ML isn’t explicitly programmed like traditional algorithms, then the components of the system outside of the model itself must be major factors in the effectiveness of the insights to be derived.

In recent years, the MLOps concept has begun to do for ML what DevOps promises for applications – increase the rate of predictable change through automation and collaboration. Just as innovation in managed services for release automation and infrastructure provisioning on AWS has accelerated development team velocity, the Amazon SageMaker family and a growing library of AWS managed ML services are accelerating ML and MLOps adoption, some of which Caylent has written about previously when discussing data modernization.

ML implementations benefit from repeated experimentation and iteration. MLOps facilitates model testability and comparison by holding other variables in the system stable and allowing incremental changes to easily transition from development to test or production environments.

MLOps has many individual components. This article covers three key workstreams: data engineering, model engineering, and operations.

Data Engineering

Data engineering encompasses data analysis, extraction, collection, cleansing, preparation, and storage. Our data modernization article addresses many of the data modernization considerations, tools, and best practices, which are all vital inputs to model accuracy. Additional considerations specific to ML may include versioned or historical data sets, holdouts for training and testing, and transformation of data into model features. AWS has a number of services that accelerate data engineering. Amazon SageMaker Data Wrangler is an easy-to-use managed service that allows data engineers or scientists to prepare data and engineer features specifically for ML needs. Amazon SageMaker Studio has gained features for more complex data processing, and Amazon SageMaker Feature Store provides capabilities for both online inference and offline training.

Organizations should plan for the reality that a significant portion of the time and energy required to successfully adopt ML will be invested in data engineering. Establishing and maintaining data quality are core capabilities to maximize value from ML initiatives.
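To make the holdout idea above concrete, here is a minimal sketch in plain Python (the function and salt names are illustrative, not from any AWS SDK) of a deterministic, versioned train/test split – the kind of repeatable data preparation MLOps encourages, because re-running the pipeline yields the same holdout and keeps experiments comparable:

```python
import hashlib

def split_record(record_id: str, test_fraction: float = 0.2, salt: str = "v1") -> str:
    """Deterministically assign a record to 'train' or 'test'.

    Hashing the record id together with a salt (acting as a data set
    version) makes the split reproducible across runs: the same id
    always lands in the same split for a given version.
    """
    digest = hashlib.sha256(f"{salt}:{record_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash onto [0, 1]
    return "test" if bucket < test_fraction else "train"

# Example: assignments are stable run-to-run, so experiments stay comparable.
assignments = {rid: split_record(rid) for rid in ["a1", "b2", "c3", "d4"]}
```

Bumping the salt (say, to "v2") produces a fresh split, which is one lightweight way to version holdouts alongside the data itself.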

Model Engineering

Custom model engineering and fine-tuning remain the domain of specialists, but in recent years managed cloud services and approachable models have allowed for the growth of “citizen data scientists”. These individuals typically have business domain knowledge paired with technical skills that allow them to generate valuable insights.

AWS is built by builders, for builders, but there’s significant value in exploring the platform capabilities before diving into a custom model. AI as a Service offerings on AWS such as Amazon Rekognition (image and video analysis), Amazon Comprehend (natural language processing), Amazon Transcribe (speech to text), and Amazon Personalize (recommendations) cover many common use cases out of the box.

If you can advance your business quickly using a platform service, it’s a great accelerator that benefits from AWS’s work on both general-purpose and specialized models at scale across many customers. As your ML needs grow, you can always revisit and implement a specialized model if you require additional specificity.

For needs outside of the common services AWS has already built, Amazon SageMaker Autopilot or Amazon SageMaker Canvas can quickly automate custom model creation from input data and targets using automated ML (AutoML).


Operations

Once we have clean data in the right format and an effective model, operations takes the focus: hosting the model and exposing it for integration. Monitoring ML endpoints for responsiveness can be just as important as it is for traditional API endpoints. In addition, ML workloads add the complexity of drift or skew, where a model becomes less predictive over time. MLOps addresses this by streamlining changes with CI/CD and ML pipelines, monitoring accuracy over time, and even managing multiple competing “challenger” models to identify the most effective option.
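As an illustration of the drift idea (a simplified sketch, not how SageMaker Model Monitor works internally), a statistic like the Population Stability Index can flag when live input data diverges from the training baseline; the 0.2 threshold below is a common rule of thumb, not an AWS default:

```python
import math
from collections import Counter

def psi(expected: list, observed: list, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and live data.

    Both samples are bucketed on the baseline's value range; a large PSI
    indicates the live distribution has drifted away from training data.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(values):
        counts = Counter(min(bins - 1, max(0, int((v - lo) / width))) for v in values)
        total = len(values)
        # Smooth empty buckets to avoid log(0).
        return [max(counts.get(b, 0) / total, 1e-6) for b in range(bins)]

    e, o = bucket_fractions(expected), bucket_fractions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))

baseline = [i / 100 for i in range(100)]       # feature values seen at training time
shifted = [0.5 + i / 200 for i in range(100)]  # live values, shifted upward
drifted = psi(baseline, shifted) > 0.2         # > 0.2 commonly means "investigate"
```

In practice a check like this would run on a schedule against production inference logs, alerting or triggering retraining when the score crosses the threshold.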

Amazon SageMaker Pipelines simplifies the implementation of CI/CD pipelines and includes model tracking. Amazon SageMaker has robust built-in hosting capabilities that make operational resiliency and performance much easier to achieve, while Amazon SageMaker Model Monitor and Amazon SageMaker Clarify facilitate monitoring for drift and bias, respectively.

If this seems like a lot to take in, it can indeed be hard to know where to start. Reach out to Caylent to plan your project, and ask about our MLOps strategy assessment or previous customer case studies like Upside to benefit from our experience facilitating the MLOps journey on AWS for our customers.
