Database containerization has attracted its share of criticism. Data insecurity, specific resource requirements, and network problems are often cited as the practice’s significant drawbacks. Nevertheless, container usage has been on the increase, and so too has the practice of containerizing databases.
Containers are now used by organizations of all sizes, from small startups to large, established microservices platforms. Even prominent database players like Google, Amazon, Oracle, and Microsoft have adopted containerization. This article aims to help beginners navigate the minefield of database containerization and avoid some of the major pitfalls that can occur. Note that we are not recommending its usage, but if you feel the need, then hopefully this will help.
But what is database containerization?
Understanding Database Containerization
Database containerization packages a database inside a container, together with its operating environment, so that it can be loaded onto a virtual machine and run independently.
Here are four factors that support running databases in containers.
1. Usage of the same configuration or ports for all containers
This setup eliminates some of the overhead that comes with a distributed system that supports different node types. Such a system requires separate containers to be maintained, each with its own configuration. Database containerization supports a single kind of configuration.
2. Resilience, resources, and storage
Containers aren’t meant to persist data inside them. In traditional database scenarios, data often needs to be replicated or exported from a central storage system, which makes the process expensive and significantly slows performance.
Databases act like any other server-side app, except that they are typically more CPU and memory intensive, are highly stateful, and use storage. All of these concepts work the same way in containers. On top of that, it’s possible to manage state, limit resources, and restrict network access.
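As a rough illustration of managing state, limiting resources, and restricting network access, the sketch below assembles a `docker run` command line in Python. It assumes Docker as the container runtime and the official `postgres:16` image, neither of which is prescribed by this article. The container, volume, and network names are hypothetical, and the command is only built and printed, not executed.

```python
from typing import List


def docker_run_args(
    name: str,
    image: str,
    volume: str,
    data_dir: str,
    network: str,
    memory: str = "2g",
    cpus: str = "1.5",
) -> List[str]:
    """Build a `docker run` command for a resource-limited database container."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Manage state: a named volume keeps data outside the container itself.
        "--volume", f"{volume}:{data_dir}",
        # Limit resources: cap the memory and CPU available to the container.
        "--memory", memory,
        "--cpus", cpus,
        # Restrict network access: attach only to one dedicated network.
        "--network", network,
        # Pass the password through from the host environment (assumed to be
        # set there) instead of hard-coding it in the command.
        "--env", "POSTGRES_PASSWORD",
        image,
    ]


# Hypothetical names, chosen only for this example.
args = docker_run_args(
    name="orders-db",
    image="postgres:16",
    volume="orders-data",
    data_dir="/var/lib/postgresql/data",
    network="backend",
)
print(" ".join(args))
```

Because the data directory lives on a named volume, the container can be removed and recreated without losing the data, which is the property the rest of this article relies on.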
3. Cluster upscale or downscale
The practice addresses the uncertainty about how successful an application will be, and how much capacity it will require, by improving the elasticity of its infrastructure. Database containerization accommodates application elasticity: the infrastructure grows when needed and shrinks back when demand falls. Adding more nodes to a cluster can help rebalance data in the background.
4. Data locality and networking
Network scaling has been a significant challenge in modern virtualized data centers. Usually, load balancers take all traffic first and then distribute it to the application containers. The application containers then have to communicate with the databases, creating more traffic. Containerization brings the database and the application a little closer together, alleviating some of the networking issues.
Efficiently Deploy Databases in Containers
Putting databases in containers comes with inherent obstacles. Databases have some fundamental properties that make them hard to containerize effectively. These include the critical need for persistent data storage, the need for disk space to store large amounts of data, and the complex configuration layers required. Databases also need high-throughput, low-latency networking.
If you are going to put your database in a container, then it’s advisable to use the container orchestration platform Kubernetes (K8s). The StatefulSets feature of K8s was designed to overcome the very problems that occur when attempting to build and run database clusters inside containers.
If you really and truly have to go down this path, try to use stable Helm Charts where possible to help you get there. Please note, though, that this doesn’t mean the deployment will be as stable as a managed service, but a lot of the heavy lifting will be done for you. K8s will build, deploy, and label your containers concurrently, and its self-healing element maintains your cluster health. Ensure the Chart you choose implements the database with StatefulSets and persistent volume claims (to keep the actual data in the event of a failure).
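One way to check a Chart before installing it is to inspect the manifests it renders. The sketch below is a minimal, simplified check in Python. It assumes the rendered manifests have already been parsed into dictionaries (for example from the output of `helm template`), and the sample manifests are hypothetical, trimmed-down examples rather than the output of any real Chart. It only looks for `volumeClaimTemplates`, so a Chart that attaches a separately defined claim would not be recognized.

```python
from typing import Any, Dict, Iterable


def has_persistent_statefulset(manifests: Iterable[Dict[str, Any]]) -> bool:
    """Return True if any manifest is a StatefulSet with volumeClaimTemplates."""
    for manifest in manifests:
        if manifest.get("kind") != "StatefulSet":
            continue
        claims = manifest.get("spec", {}).get("volumeClaimTemplates") or []
        if claims:
            return True
    return False


# Hypothetical example of a StatefulSet a Chart might render.
sample_statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "example-db"},
    "spec": {
        "serviceName": "example-db",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "example-db"}},
        "template": {
            "metadata": {"labels": {"app": "example-db"}},
            "spec": {
                "containers": [
                    {
                        "name": "db",
                        "image": "postgres:16",
                        # Mount the claimed volume where the database writes.
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                        ],
                    }
                ]
            },
        },
        # One persistent volume claim is created per pod from this template.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }
        ],
    },
}

# Hypothetical example of a Chart that uses a plain Deployment instead.
sample_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-db"},
    "spec": {"replicas": 1},
}

print(has_persistent_statefulset([sample_statefulset]))  # True
print(has_persistent_statefulset([sample_deployment]))   # False
```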
- StatefulSets runs your containers and orchestrates everything together while making pods more suited to stateful applications.
- Storage is stable and persistent.
- StatefulSets pods each have a few unique attributes:
- They’re all labeled with an ordinal name which allows for stable identification.
- Pods are built one at a time instead of in one go, which is helpful when bootstrapping a stateful system.
- Pod rescheduling is stable and persistent.
- Pods can be shut down gracefully when you scale down the number of replicas needed, which is very useful for databases!
- A persistent storage volume is mounted where your database saves its data.
- With StatefulSets, you can use a “sidecar” container to help your main container do the necessary work.
- Just like with Kubernetes ReplicaSets, with StatefulSets you can scale nodes easily with
- Rolling updates are ordered and graceful.
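The naming and ordering behavior in the list above can be modeled in a few lines of Python. This is a toy model, not a call into Kubernetes: it assumes the default ordered pod management, and the StatefulSet name `db` and claim template name `data` are hypothetical.

```python
from typing import List


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """Pods are named <statefulset>-<ordinal>, with ordinals starting at 0."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def claim_name(template: str, pod: str) -> str:
    """Each pod keeps its own claim, named <claim template>-<pod name>."""
    return f"{template}-{pod}"


def scaling_steps(statefulset: str, current: int, desired: int) -> List[str]:
    """List the pod-level steps, in order, to go from current to desired replicas."""
    if desired >= current:
        # Scale up: pods are created one at a time, lowest new ordinal first.
        return [f"create {statefulset}-{i}" for i in range(current, desired)]
    # Scale down: pods are terminated one at a time, highest ordinal first.
    return [
        f"terminate {statefulset}-{i}"
        for i in range(current - 1, desired - 1, -1)
    ]


print(pod_names("db", 3))         # ['db-0', 'db-1', 'db-2']
print(claim_name("data", "db-0"))  # data-db-0
print(scaling_steps("db", 0, 3))  # ['create db-0', 'create db-1', 'create db-2']
print(scaling_steps("db", 3, 1))  # ['terminate db-2', 'terminate db-1']
```

Because a rescheduled pod comes back with the same name and the same claim, a database node keeps its identity and its data across restarts.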
There you have it. If you really must put your database in a container, then use Kubernetes StatefulSets to help you get there. Most importantly though, ask yourself why you’re doing it in the first place and if you really need to…?