Run the World

Read how Run the World stabilized their platform and deployed a new CI/CD process to enable their product team to focus on feature releases instead of infrastructure.

"With the support from Caylent, Run The World’s service stability is now significantly increased. We never experienced any unintentional down time and thanks to Caylent there are many crucial monitoring and alerts systems built in place to mitigate risk. With the new CI/CD deployment process, the product team is able to spend their time on product releases. All this wraps up to mean we had a great time collaborating with Caylent."

– Xuan Jiang, CTO

Company Introduction

Run The World is an online event platform designed for organizers, attendees, speakers, and sponsors worldwide. It helps people build virtual events through simple plug-and-play templates, enabling them to create an engaging online event for their attendees.

Objective

Run The World helps professional organizations hold more frequent meetings and engage more members with little financial cost or setup time. It offers exciting formats and proprietary technology that enable attendees to interact with speakers and socialize with each other in a fun and engaging way.

Challenge

Run The World was referred to Caylent by Andreessen Horowitz (a16z), which led the California-based company’s recent $10.8 million Series A round. The company was experiencing a severe production outage in its Kubernetes environment hosted on AWS. Rapid event growth and a surge in online attendees at its virtual conferences, due in part to the global COVID-19 pandemic, had fully destabilized the production environment.

Solution

Run The World contacted us on a Friday morning in March in the midst of a production outage. Within three hours of the phone call, the Caylent DevOps team was triaging and remediating the situation, which was caused in part by a misconfigured service mesh and huge amounts of inbound traffic across several of the company’s hosted events. Many pots of coffee later, Run The World’s production environment was redeployed and back online that same day.

Immediately after the production environment was stabilized through Caylent’s SWAT team approach, our team shifted to solutions architecture and began planning a major platform upgrade for the company. The new roadmap called for migrating the company’s self-hosted Kubernetes cluster, initially deployed with kops, to Amazon Elastic Kubernetes Service (EKS), with Terraform managing the infrastructure as code.

Adopting EKS provided Run The World with highly available, reliable, and resilient Kubernetes clusters that leverage Amazon EC2 Spot Instances. This meant we were also able to implement an AWS Savings Plan for considerable cost optimization without sacrificing scalability or availability. The team also made a number of GraphQL changes to decrease latency and noise and improve performance. To support further performance improvements, the team configured and deployed Elasticsearch, Prometheus, and Jaeger operators to streamline monitoring and alerting, with Slack integration and notifications as well as SSL certificates.

Everything proposed in the roadmap was designed to protect the company from future outages and downtime by improving the resilience of the infrastructure and introducing scalability to ensure a smooth end user experience during large online events.

As part of the roadmap, the Caylent team built and deployed brand new EKS Production, Staging, and Development environments, with Terraform managing the infrastructure as code. Using Terraform, the team implemented IAM Roles for Service Accounts (IRSA) so that pods running in the EKS cluster automatically assume an IAM role when using other AWS resources. We created custom Helm charts to improve Kubernetes deployments and performed service load testing with k6 to prepare the system for large events and traffic volumes. Additionally, we migrated the company’s database to Amazon Aurora so the DB cluster can handle future scaling.
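
To illustrate how IRSA ties an IAM role to a specific workload, the following Python sketch builds the kind of IAM trust policy the mechanism relies on. It is not the Terraform code used in this engagement; the function name, account ID, OIDC issuer URL, namespace, and service account name are all placeholder assumptions.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer_url, namespace, service_account):
    """Build a trust policy that lets one Kubernetes service account assume a role."""
    # The IAM OIDC provider is identified by the cluster's issuer URL without its scheme.
    oidc_host = oidc_issuer_url.split("://", 1)[-1]
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to a single service account in a single namespace.
                        f"{oidc_host}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_host}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = irsa_trust_policy(
        account_id="111122223333",
        oidc_issuer_url="https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
        namespace="events",
        service_account="api",
    )
    print(json.dumps(policy, indent=2))
```

A role created with a policy like this is then referenced from the Kubernetes service account through the `eks.amazonaws.com/role-arn` annotation, so pods using that service account receive temporary credentials without any long-lived keys stored in the cluster.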

Results

In addition to providing swift tactical DevOps SWAT team remediation of the production outage, the team at Caylent successfully executed a transformative platform re-architecture over the next several months. This work helped stabilize the company’s environments, provided significant ROI on the engagement, and allowed Run The World to expand its user base and host increasingly large online events with tens of thousands of simultaneous attendees. Prior to the partnership with Caylent in 2020, this would have been unachievable.

Client

Run the World

Industry

SaaS

Location

Mountain View, California
