Working with Persistent Volumes in Kubernetes

The main reason behind containerization is to allow microservices to run in a stateless way. A container receives provisioned cloud resources, performs its tasks, and is then destroyed as soon as the process is over. There are no traces of that container or tied-up cloud resources to worry about. This is what made containerization so popular in the first place.

Running microservices as stateless instances, however, is not always as easy as it seems. As more applications get refactored and more microservices rely on containers for efficiency, sticking with the stateless concept becomes harder and harder. Stateless containers don’t always have the ability to meet complex requirements.

Here’s the simple truth: truly stateless applications, those that require no data to be stored over a long period of time, are unicorns; they are incredibly difficult to find in the wild, if not impossible. This is where persistent volumes, or stateful storage, come in handy. They bridge the gap between ideal containerization and the requirements of real apps and services.

Kubernetes Persistent Volume

Before we go further into how persistent volumes can be utilized, we need to take a closer look at how persistent volumes work in Kubernetes. Kubernetes has always managed its storage resources in a particular way: it provisions, configures, and attaches storage blocks through specific primitives, and each of these steps must be completed before a volume is usable.

Provisioning is the simplest part of the equation. This is the part where Persistent Volumes are created. You have the option to provision volumes statically or dynamically (we will get to this in a bit). Configuration is handled through Storage Classes, which describe the type of storage to be provisioned and the parameters of the volumes associated with them.
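
To make that concrete, here is a minimal sketch of a Storage Class manifest. The name, provisioner, and parameters below are assumptions (it uses the AWS EBS CSI driver with gp3 volumes); substitute the provisioner for your own storage backend.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                          # hypothetical class name
provisioner: ebs.csi.aws.com              # assumed provisioner (AWS EBS CSI driver)
parameters:
  type: gp3                               # assumed volume type for this provisioner
reclaimPolicy: Delete                     # what happens to the PV once its claim is released
volumeBindingMode: WaitForFirstConsumer   # delay binding until a pod actually uses the claim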

To complete the process, the volumes need to be attached to pods. A Persistent Volume Claim is issued whenever a pod needs to use a storage block; it details the amount of storage required as well as other requirements based on the pod’s operations. Volumes can be attached and detached without being destroyed.
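
As an illustration of the attach step, a pod references an existing claim under spec.volumes and mounts it into a container. The pod, claim, image, and path names below are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: app-pod                        # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.27                  # placeholder image
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html # where the volume appears inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data-claim        # the Persistent Volume Claim this pod attaches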

A Persistent Volume differs from an Ephemeral Volume in one key way: the latter exists only for as long as the pod exists. Unlike a Persistent Volume, an Ephemeral Volume is created during the pod creation process and gets destroyed when the pod is destroyed. It is handy for storing temporary data or for supporting certain operations such as data visualization.

When to Use Persistent Volumes

From the previous explanation, it is clear that Persistent Volumes are best used when you need to store data in a, well, persistent way. A common use case is when you have pods running databases such as MySQL and you need the databases to outlive the pods in the cluster. Regardless of what happens to the pods, you will still have your database stored safely in persistent volumes.

This makes updating database pods easier. When you want to switch from one version of MariaDB to another, for instance, you can just run the update through your usual pipeline and not worry about losing crucial data. The same is true for operations like database backups, since you have a single source to turn to every time.

When a persistent volume is mounted inside a pod, it acts as a native storage block and can be used for operations such as storing raw data or temporary processing outputs. You can choose to store raw data persistently if you want other pods to have access to it as well, and more and more teams are using persistent volumes for exactly that kind of shared access.

A cluster can also provision Persistent Volumes dynamically, especially when a claim coming from a pod doesn’t match any PV already provisioned by the administrator. Pods are still limited to the Storage Classes already defined by the administrator, but everything else about provisioning new PVs can be automated.
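
A claim that triggers dynamic provisioning simply names one of those Storage Classes; if no existing PV satisfies it, the cluster creates one on demand. A minimal sketch, with a hypothetical claim name and size, using the fast-ssd class from the earlier example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-data-claim    # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd  # an existing Storage Class defined by the administrator
  resources:
    requests:
      storage: 20Gi           # assumed size; a matching PV is provisioned automatically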

This feature needs to be enabled manually. First, you need to enable the DefaultStorageClass admission controller: add DefaultStorageClass as one of the values for the API server's --enable-admission-plugins flag, and you are all set. For creating a static PV, the standard kubectl create -f your-pv.yaml command suffices.
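
As a sketch, the API server flag and a Storage Class marked as the cluster default (so that claims omitting storageClassName still get dynamic provisioning) might look like this; fast-ssd is the hypothetical class from earlier, and flag placement depends on how your control plane is deployed:

# kube-apiserver flag (added alongside its other flags)
--enable-admission-plugins=...,DefaultStorageClass

# Mark an existing Storage Class as the cluster default
kubectl patch storageclass fast-ssd \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'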

Creating and Using a Persistent Volume

Creating a persistent volume is easy. Create a .yaml file with kind: PersistentVolume and a suitable host path, and you can use kubectl apply -f your-pv.yaml to provision the resources. You can also use kubectl get pv to see a list of all Persistent Volumes that have been spun up. A Persistent Volume Claim is just as easy to create; this time you use kind: PersistentVolumeClaim and specify the storage you need (and, optionally, the name of the PV you want to bind to).
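
A minimal static PV and a matching claim might look like the sketch below. The names, capacity, and path are assumptions, and hostPath is only suitable for single-node testing; both objects can live in one file and be applied with kubectl apply -f your-pv.yaml.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv             # hypothetical PV name
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: manual     # arbitrary class name used to match the PV and the claim
  hostPath:
    path: /mnt/data            # assumed path on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc            # hypothetical PVC name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual     # must match the PV's class for binding
  resources:
    requests:
      storage: 5Gi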

Of course, a Persistent Volume Claim requires a suitable PV and Storage Class to be found. When Kubernetes finds a match, it binds the PV to the claim, and the pod referencing that claim can use the storage.
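
You can confirm the match with kubectl get pvc; a successfully bound pair reports a STATUS of Bound. The output below is illustrative, using the hypothetical names from the sketch above:

kubectl get pvc
# NAME          STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# example-pvc   Bound    example-pv   5Gi        RWO            manual         1m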

On an added note, both the Persistent Volume and the Persistent Volume Claim need to be deleted manually when you destroy a pod. If you no longer need the data stored in your PVs, use kubectl delete pvc pvc-name and kubectl delete pv pv-name to destroy the respective resources.

Optimization Tips

Persistent Volumes are incredibly handy, especially for applications that need to store a lot of data. While it is easy to provision PVs dynamically, manually creating and deleting volumes gives you better control over the environment in general. Just don’t forget to delete unused PVs so you don’t end up paying for storage blocks you don’t need.

It is also a good idea to benchmark your storage provider. Use SSD-backed volumes with good read and write performance to maximize the overall performance of your applications; don’t let storage be the bottleneck. Kubernetes works with a wide range of storage backends through Storage Classes, including AWS Elastic Block Store and Portworx.

With persistent storage, your pods can remain flexible enough for maximum agility while your data is stored over the long term. Persistent Volumes, and the way Kubernetes supports them, do offer the best of both worlds after all.

Caylent provides a critical DevOps-as-a-Service function to high growth companies looking for expert support with Kubernetes, cloud security, cloud infrastructure, and CI/CD pipelines. Our managed and consulting services are a more cost-effective option than hiring in-house, and we scale as your team and company grow. Check out some of the use cases, learn how we work with clients, and read more about our DevOps-as-a-Service offering.
