
Scaling ML to Meet Customer Demand and Reduce Errors

Analytical AI & MLOps

Learn how we helped a technology company scale their Machine Learning (ML) platform.

A technology company specializing in email signature parsing and contact data extraction recently engaged with Caylent to solve ML scaling challenges. Their software helps businesses automatically capture and organize contact information from email signatures, streamlining the process of managing customer and prospect data. By leveraging advanced parsing algorithms, the company enables businesses to efficiently extract valuable contact details from email communications, reducing manual data entry and improving the accuracy of customer relationship management systems.

The company had recently experienced rapid growth and was struggling to scale their machine learning operations (MLOps) to meet increasing demand. As a small-to-medium-sized business (SMB), they found it difficult to train and deploy ML models efficiently while maintaining high performance standards. The company needed a scalable ML platform that would automate model training and deployment, improve model versioning and performance tracking, and reduce manual effort in their ML workflows.

The company's existing infrastructure was struggling to keep up with the growing volume of data and the need for more frequent model updates. They required a robust system that could handle their expanding needs while remaining cost-effective and manageable for their small team.

Solution

To address the company's needs, a comprehensive MLOps solution was designed and implemented using AWS services and open-source tools. The solution focused on creating a scalable, reproducible, and automated workflow for machine learning operations.

The tech stack included Kubeflow for MLOps orchestration, Amazon SageMaker for ML workloads such as training and deployment, AWS CDK for Infrastructure as Code, GitHub for source control, and GitHub Actions for CI/CD pipelines. Additional AWS services, including Lambda, SNS, and S3, provided storage, pipeline support, and triggers.
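
To make the role of S3, Lambda, and SNS as triggers more concrete, here is a minimal sketch, assuming a Lambda function is subscribed to S3 upload events and forwards a summary to an SNS topic. The case study does not describe the actual wiring, so the function names (`extract_s3_objects`, `build_notification`, `handler`) and the injected `publish` callable are hypothetical; only the S3 event record layout and the URL-encoding of object keys follow the documented notification format.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # not an S3 record; skip it
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def build_notification(objects):
    """Build the Subject and Message fields for an SNS publish call."""
    return {
        "Subject": f"{len(objects)} new dataset object(s) uploaded",
        "Message": json.dumps(
            [{"bucket": b, "key": k} for b, k in objects], sort_keys=True
        ),
    }


def handler(event, context, publish=None):
    """Lambda-style entry point (hypothetical).

    `publish` is injected so the sketch runs without AWS credentials. In a
    real deployment it could wrap boto3.client("sns").publish(TopicArn=..., ...).
    """
    objects = extract_s3_objects(event)
    if not objects:
        return {"notified": 0}
    notification = build_notification(objects)
    if publish is not None:
        publish(**notification)
    return {"notified": len(objects)}
```

Passing the publisher in as an argument keeps the parsing logic testable on its own, which fits the emphasis on reproducibility described above.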

The solution implemented an end-to-end MLOps pipeline consisting of three main components:

  1. A training pipeline that automates the process of model training and evaluation using SageMaker.
  2. A model deployment pipeline that handles the deployment of approved models to production endpoints.
  3. A labeling pipeline with functionality to trigger retraining based on new datasets, incorporating a feedback loop for continuous improvement.
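
The feedback loop in the labeling pipeline can be sketched as a small decision function. This is an illustration only: the case study does not say what condition triggers retraining, so the two criteria used here (enough new labels, or a stale model) and the default thresholds of 500 examples and 30 days are assumptions, as are the names `LabelingState`, `should_retrain`, and `maybe_trigger_retraining`.

```python
from dataclasses import dataclass


@dataclass
class LabelingState:
    """Snapshot of the labeling pipeline (hypothetical structure)."""
    new_labeled_examples: int       # labels added since the last training run
    days_since_last_training: int   # age of the current model in days


def should_retrain(state, min_new_examples=500, max_days_stale=30):
    """Retrain when enough new labels exist, or when the model is stale
    and there is at least some new data to learn from."""
    if state.new_labeled_examples <= 0:
        return False  # nothing new to learn from
    if state.new_labeled_examples >= min_new_examples:
        return True
    return state.days_since_last_training >= max_days_stale


def maybe_trigger_retraining(state, start_pipeline):
    """Call `start_pipeline` (an injected callable) only when retraining is due.

    Returns whatever `start_pipeline` returns, or None if nothing was started.
    """
    if should_retrain(state):
        return start_pipeline()
    return None
```

In practice, `start_pipeline` would be whatever starts the training pipeline run; keeping it as a parameter leaves the decision rule independent of the orchestration tool.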

Key features of the solution included automated CI/CD for data and models, SageMaker Model Registry for version tracking and comparison, real-time machine learning releases, and an automated retraining process with minimal manual intervention.
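
As an illustration of how version comparison might gate a release, the sketch below compares a candidate model's evaluation score against the current production score and maps the result to a Model Registry approval status. The metric (F1), the minimum gain of 0.005, and the function names are assumptions and not details from the project. The status strings and the `update_model_package` call with `ModelPackageArn` and `ModelApprovalStatus` correspond to the SageMaker API; the client is passed in so the example runs without AWS access.

```python
APPROVED = "Approved"
REJECTED = "Rejected"
PENDING = "PendingManualApproval"


def approval_status(candidate_f1, production_f1, min_gain=0.005):
    """Decide the registry status for a candidate model version.

    With no production model to compare against, defer to a human reviewer.
    Otherwise approve only if the candidate beats production by `min_gain`.
    """
    if production_f1 is None:
        return PENDING
    if candidate_f1 - production_f1 >= min_gain:
        return APPROVED
    return REJECTED


def apply_decision(sagemaker_client, model_package_arn, status):
    """Record the decision in the Model Registry.

    `sagemaker_client` is expected to behave like boto3.client("sagemaker").
    """
    sagemaker_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus=status,
    )
    return status
```

A deployment pipeline can then react to the status change, so that only approved versions reach production endpoints.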

The design prioritized scalability and reproducibility, ensuring that the company could easily manage multiple models and expand their ML capabilities as their business grew. By leveraging AWS services like SageMaker, the solution optimized infrastructure costs while maintaining high performance.

Results

The implementation of the new ML platform significantly improved the company's machine learning operations across several key areas:

  1. Model Performance: The automated training and evaluation processes led to more frequent model updates and improvements. This resulted in more accurate email signature parsing and contact data extraction, enhancing the overall quality of the company's core service.
  2. Scalability: The company gained the ability to handle a larger volume of data and train multiple models simultaneously. This newfound scalability supported their growth trajectory and allowed them to process an increasing number of emails and signatures without compromising performance.
  3. Reduced Manual Effort: The automated pipelines for training, deployment, and labeling significantly reduced the need for manual intervention in the ML workflow. This allowed the company's small team to focus on higher-value tasks such as feature development and customer support.
  4. Enhanced Version Control: The use of SageMaker Model Registry enabled better tracking and comparison of model versions. This led to more informed decision-making about which models to deploy to production, ensuring that only the best-performing models were used.
  5. Faster Time-to-Market: Real-time machine learning releases enabled the company to quickly deploy new model improvements to production. This agility allowed them to respond more rapidly to changes in email signature formats or new data extraction requirements.
  6. Cost Optimization: By leveraging AWS services like SageMaker, the company optimized their infrastructure costs while maintaining high performance. The pay-as-you-go model of cloud services allowed them to scale resources up or down based on demand, avoiding over-provisioning.
  7. Improved Data Quality: The automated labeling pipeline helped in continuously improving the quality of training data, leading to better model accuracy over time. This feedback loop ensured that the system became more robust in handling various email signature formats and edge cases.

These improvements positioned the company to better serve their growing customer base and maintain their competitive edge in the email signature parsing and contact data extraction market. The new MLOps platform provided a solid foundation for future growth and innovation, allowing the company to focus on expanding their product offerings and entering new markets with confidence in their ML infrastructure.

At Caylent, we understand the complexities of scaling machine learning operations and the transformative impact they can have on your business. Our suite of Caylent Catalysts, including MLOps with SageMaker and MLOps Strategy, is designed to help organizations like yours streamline and enhance their ML workflows. Whether you're dealing with rapid growth, seeking improved model performance, or aiming to reduce manual effort, our team is equipped to provide customized solutions that meet your unique needs. Contact us today to discover how we can help you transform your machine learning operations.

Brian Tarbox

Brian is an AWS Community Hero and Alexa Champion, runs the Boston AWS User Group, and has ten US patents and a bunch of certifications. He's also part of the New Voices mentorship program, where Heroes teach traditionally underrepresented engineers how to give presentations. He is a private pilot and a rescue scuba diver, and he earned his master's in Cognitive Psychology working with bottlenose dolphins.
