From Machine Learning Code to MLOps – A Reference Architecture

Amazon SageMaker is a fully managed service that makes it easy for enterprises to build end-to-end, production-ready machine learning pipelines without sacrificing speed, security, or accessibility. This article proposes a reference architecture based on the Amazon SageMaker ecosystem so you can get started right away with your own ML projects on the AWS platform.


Fig 1: MLOps components (image by the author)

We will start with the simplest form of Machine Learning Operations (MLOps) and gradually add building blocks until we have the complete picture. Let’s dive in!

Exploration block:

Given a business problem or a process-improvement opportunity identified and documented by a business analyst, the machine learning lifecycle starts with exploratory data analysis (EDA), where data scientists familiarize themselves with a sample of the data and apply several machine learning techniques and algorithms to find the best ML solution. They leverage Amazon SageMaker Studio, a web-based integrated development environment (IDE) for machine learning, to ingest the data, perform data analysis, process the data, and train and deploy models for making inferences through a non-production endpoint. Inside Amazon SageMaker Studio they have access to Amazon SageMaker Data Wrangler, which contains over 300 built-in data transformations for quickly preparing data without writing any code. Amazon Athena and AWS Glue are other tools that can be used to explore and prepare data. All experiments run by the data scientists are tracked with SageMaker Experiments for reproducibility.
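Much of this early EDA boils down to profiling: counts, missing values, and distribution statistics per column. A minimal, SageMaker-independent sketch in plain Python (the column names and sample data are hypothetical):

```python
import csv
import io
import statistics

def profile_column(values):
    """Summarize one numeric column: count, missing, mean, stdev."""
    present = [float(v) for v in values if v not in ("", None)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": round(statistics.mean(present), 3) if present else None,
        "stdev": round(statistics.stdev(present), 3) if len(present) > 1 else None,
    }

# Hypothetical sample, as it might arrive from an Amazon Athena query.
sample = "age,income\n34,72000\n29,\n41,88000\n"
rows = list(csv.DictReader(io.StringIO(sample)))
profile = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
print(profile["income"])  # {'count': 3, 'missing': 1, 'mean': 80000.0, 'stdev': 11313.708}
```

In practice Data Wrangler's built-in analyses produce this kind of summary (and much more) without code; the sketch only shows what is being computed underneath.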

Fig 2: MLOps Exploration block (Image by the author)

ML Workflow block:

Next, machine learning engineers convert the solution proposed by the data scientists into production-ready ML code and create an end-to-end machine learning workflow that includes data processing, feature engineering, training, model evaluation, and model creation for deployment using a variety of available hosting options. On AWS there are four options for orchestrating an end-to-end ML workflow with Amazon SageMaker integration:

  • Amazon SageMaker Pipelines
    • Using the Pipelines SDK, a series of interconnected steps defined as a directed acyclic graph (DAG) builds the entire ML pipeline.
  • Amazon Managed Workflows for Apache Airflow (MWAA)
    • Using the Airflow SageMaker operators or the Airflow PythonOperator, the end-to-end ML pipeline can be configured, scheduled, and monitored.
  • AWS Step Functions
    • Integration between Amazon SageMaker and AWS Step Functions through the AWS Step Functions Data Science SDK lets you easily create multi-step machine learning workflows.
  • Kubeflow Pipelines
    • SageMaker Components for Kubeflow Pipelines let you submit SageMaker processing, training, and HPO jobs, and deploy models, directly from a Kubeflow pipeline workflow.
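Whichever orchestrator you choose, the workflow itself is a DAG: each step declares its dependencies, and the engine derives a valid execution order. A minimal sketch of that idea using only the Python standard library (the step names are hypothetical and this is not any orchestrator's actual API):

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Each step maps to the set of steps it depends on.
steps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register_model": {"evaluate"},
}

# The orchestrator resolves dependencies into an execution order.
order = list(TopologicalSorter(steps).static_order())
print(order)  # ['preprocess', 'train', 'evaluate', 'register_model']
```

SageMaker Pipelines, Airflow, Step Functions, and Kubeflow all add scheduling, retries, and service integrations on top of this basic dependency-resolution model.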

As part of the model evaluation/test step, an AWS Step Functions workflow is launched to run a comprehensive suite of ML-related tests. Additionally, Amazon SageMaker Feature Store is used to store, share, and manage features for machine learning (ML) models during training (offline store) and inference (online store). Finally, SageMaker ML Lineage Tracking is enabled to track data and model lineage metadata, which is crucial for ML workflow reproducibility, model governance, and audit standards.
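The evaluation/test step usually reduces to a gate: compare the candidate model's metrics against minimum thresholds and block promotion on any failure. A sketch of that gate (the metric names and thresholds are illustrative, not part of any SageMaker API):

```python
def evaluation_gate(metrics, thresholds):
    """Return the list of failed checks; an empty list means the model passes."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures.append(name)
    return failures

# Hypothetical metrics emitted by the evaluation step of the workflow.
report = {"auc": 0.91, "precision": 0.78}
print(evaluation_gate(report, {"auc": 0.85, "precision": 0.80}))  # ['precision']
```

In the architecture above, this kind of check runs inside the Step Functions test suite, and a non-empty failure list stops the model from being registered for deployment.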

Fig. 3: MLOps ML Workflow block (image by the author)

Continuous Integration block:

After implementing the first two blocks, we already have a fully functioning ML workflow and an endpoint that users or applications can call to consume ML predictions. To take this to the next level and eliminate the manual work that follows any update to the code or infrastructure, an MLOps engineer builds the Continuous Integration (CI) block. It enables data scientists and ML engineers to regularly merge their changes into AWS CodeCommit, after which automated builds and tests run in AWS CodeBuild, including building any custom ML image on top of the latest container hosted on Amazon ECR, which the ML workflow subsequently references.
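The CodeBuild stage is driven by a buildspec file. A hypothetical minimal buildspec that runs the test suite, then builds and pushes a custom ML image to Amazon ECR ($ECR_REGISTRY and $ECR_REPO_URI are placeholder environment variables assumed to be set on the build project; $AWS_REGION and $CODEBUILD_RESOLVED_SOURCE_VERSION are provided by CodeBuild):

```yaml
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate the Docker client against the private ECR registry.
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
  build:
    commands:
      # Run automated tests before producing any artifact.
      - python -m pytest tests/
      # Tag the custom ML image with the resolved commit SHA for traceability.
      - docker build -t $ECR_REPO_URI:$CODEBUILD_RESOLVED_SOURCE_VERSION .
  post_build:
    commands:
      - docker push $ECR_REPO_URI:$CODEBUILD_RESOLVED_SOURCE_VERSION
```

Tagging the image with the commit SHA ties every workflow run back to the exact code revision that produced it.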

Fig. 4: MLOps Continuous Integration block (Image by the author)

Continuous Deployment block:

Not only do we want to build and test our ML application each time a code change is pushed to AWS CodeCommit, we also want to deploy the ML application to production continuously. In the Continuous Deployment (CD) block, the proposed solution is extended to decouple the model training workflow from the model deployment section. This makes it possible to update the model configuration and infrastructure without affecting the training workflow. To manage model versions and model metadata, and to automate model deployment with CI/CD, the Amazon SageMaker Model Registry is used. Here, a manual approver (e.g. a senior data scientist) approves the model, which triggers a new automated deployment through CodePipeline, CodeBuild, and AWS CloudFormation. The automatic rollback feature of AWS CloudFormation is key here in case anything goes wrong during the deployment. SageMaker endpoint autoscaling is also configured to handle changes in the inference workload.
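The approval-triggers-deployment handoff is typically wired through an Amazon EventBridge rule that matches SageMaker model package state changes and starts the deployment pipeline. The filtering logic can be sketched as a small handler (the event shape shown is a simplified assumption; verify it against the actual event your rule receives):

```python
def should_deploy(event: dict) -> bool:
    """Decide whether an EventBridge event represents a newly approved model."""
    if event.get("detail-type") != "SageMaker Model Package State Change":
        return False
    return event.get("detail", {}).get("ModelApprovalStatus") == "Approved"

# Synthetic event for illustration; the group name is hypothetical.
approved = {
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "churn-model",
        "ModelApprovalStatus": "Approved",
    },
}
print(should_deploy(approved))  # True
```

In a real setup the EventBridge rule pattern does this matching declaratively, and the target (e.g. the CodePipeline execution) only fires for approved packages.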

Fig. 5: MLOps Continuous Deployment block (Image by the author)

Monitoring & Continuous Training block:

When building machine learning models, we assume that the data the model was trained on is similar to the data it will see when making predictions. Any drift in the data distribution or in model performance on new inference data therefore requires data scientists to revisit the features or retrain the model to reflect the most recent changes. The Continuous Training (CT) block continuously monitors data and model quality to detect any bias or drift and to inform timely remediation. This is achieved by enabling Amazon SageMaker Model Monitor in the inference endpoint configuration, creating baseline jobs during data preparation and training, and combining them with SageMaker Clarify and Amazon CloudWatch Events to automatically monitor for drift, trigger retraining, and notify the relevant parties.
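The intuition behind drift detection can be illustrated with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against what the endpoint sees live. This is an illustrative metric, not Model Monitor's internal implementation:

```python
import math

def population_stability_index(expected, actual):
    """PSI between two binned distributions given as raw counts per bin.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    """
    e_total, a_total = sum(expected), sum(actual)
    psi = 0.0
    for e, a in zip(expected, actual):
        # A small floor avoids log(0) for empty bins.
        e_pct = max(e / e_total, 1e-6)
        a_pct = max(a / a_total, 1e-6)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi

baseline = [200, 500, 300]  # hypothetical training-time feature histogram
live = [150, 450, 400]      # same feature observed at inference time
print(round(population_stability_index(baseline, live), 4))  # 0.0484
```

A monitoring job computes this kind of statistic on a schedule and raises an alarm (here, via CloudWatch) when the score crosses a threshold, which in turn can trigger the retraining workflow.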

Fig. 6: MLOps Continuous Training block (Image by the author)

Security and governance considerations:

To make our MLOps architecture secure, the following security and governance measures are recommended:

  • Deploying the solution in a VPC and securely connecting resources to SageMaker Studio through VPC endpoints.
  • Separation of concerns through multi-account architecture. AWS Control Tower offers a mechanism to easily set up and secure multi-account AWS environments.
  • Defining proper user and service roles to perform the required operations on the AWS account and to run tasks such as SageMaker training jobs and ML workflows. AWS IAM is a powerful service for this purpose.
  • Implementing the least-privilege policy. AWS Service Catalog allows customers to implement products with required resources centrally provisioned without requiring each user to launch them separately.
  • Enforcing guardrails, such as requiring proper resource tagging or limiting the types of resources available, for different users and roles. AWS Organizations and its Service Control Policies (SCP) feature can be used for enterprise-scale guardrail management.
  • Encrypting data at rest and in transit. Beyond SageMaker’s default encryption of model artifacts and the storage attached to training instances, AWS KMS keys can be used to encrypt sensitive data.
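The least-privilege point can even be checked mechanically. A sketch of a tiny policy linter that flags bare wildcards in IAM statements (the policy shown is hypothetical; real guardrails would use IAM Access Analyzer or SCPs rather than hand-rolled checks):

```python
def find_wildcard_statements(policy: dict) -> list:
    """Flag policy statements whose Action or Resource is a bare '*'."""
    flagged = []
    for stmt in policy.get("Statement", []):
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            if "*" in values:
                flagged.append((stmt.get("Sid", "<no Sid>"), field))
    return flagged

# Hypothetical policy for a SageMaker training role.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AllowTraining", "Effect": "Allow",
         "Action": ["sagemaker:CreateTrainingJob"], "Resource": "*"},
    ],
}
print(find_wildcard_statements(policy))  # [('AllowTraining', 'Resource')]
```

Scoping the flagged `Resource` down to specific ARNs is exactly the kind of tightening the least-privilege recommendation calls for.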

In this blog post, we have proposed a reference architecture for building an end-to-end secure, scalable, repeatable, and reproducible MLOps solution that processes data, trains, creates, and updates models, and deploys and monitors them, using SageMaker features with CI/CD/CT concepts incorporated.

If your team needs expert assistance to deploy Machine Learning models on AWS at scale, consider engaging with Caylent to craft a custom roadmap with our collaborative MLOps Strategy Assessment and/or realize your vision through our MLOps pods.
