Cloud storage in a containerized environment is always a dark horse. It can be the simplest part of building a system, but it can also become very complex once your requirements grow. For data-intensive apps, storage and its scalability are key ingredients of success.
Just because you don’t run a storage-intensive app, however, doesn’t mean you can take storage lightly. In a Kubernetes environment, storage can still be one of the most difficult things to navigate. But worry not, because we are going to dig deep into Kubernetes storage in this article.
Forget About Drives
In the old days, adding storage to a server was a simple task. You added new physical hardware to the rack, mounted the new drive, and you were all set. In a corporate on-prem environment, local drives were (and in some cases still are) typically allocated for the OS only. For apps and databases, though, storage is often provided by storage arrays that share logical drives with servers, usually over Fibre Channel, with servers and storage arrays connected to each other in a Storage Area Network (SAN).
These storage arrays are very complex and come with a lot of configuration, which requires specialized knowledge. On the server side, disk volumes are usually handled by volume manager software, which the server admin has to configure properly. In today’s cloud environment, storage is much more straightforward in comparison. A lot of automation encapsulates the complexity of what happens behind the scenes when an admin attaches a virtual storage device to a virtual instance in the cloud.
As Kubernetes blurred the line between physical and cloud servers, storage has become something that needs to be seen from a different perspective. Disks are assigned physically but allocated logically. Traditional data centers have virtual drives too: several volume managers on the market create logical volumes from a pool of physical drives, which enables more efficient space utilization and, sometimes, better performance. Containers, however, are designed to be far more robust and flexible.
Storage is also used differently depending on the kind of solution it is a part of, but this only makes managing your storage use easier. In fact, you can run two apps on two different Kubernetes clusters, each provisioned by a different cloud provider, and still have both use a single database store.
Along Came Volumes
Kubernetes lets you handle storage differently, starting with Volumes. Volumes exist so that data written by a container is preserved across container restarts. However, a volume defined this way shares the lifetime of its pod: when the pod ceases to exist, the volume ceases to exist, too.
Volumes in Kubernetes are very easy to manage. A volume is basically a directory that gets mounted into a pod’s containers. Once you tell a container to keep its evergreen information on a volume, the containers in the pod can crash and restart without losing that data.
Kubernetes supports a lot of volume types, including awsElasticBlockStore, emptyDir, nfs, persistentVolumeClaim, configMap, and secret, to name a few of the ones you might want to take a closer look at. The Kubernetes documentation has the full list. Almost all cloud providers offer volumes or storage of some description. This plays to one of the real strengths of Kubernetes: its lack of hardware awareness.
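To make the "directory that gets mounted to a pod" idea concrete, here is a minimal Python sketch that builds a Pod manifest using an emptyDir volume, the simplest of the types listed above. The function name, pod name, image, and mount path are all made up for illustration; only the manifest fields themselves come from the Kubernetes Pod API.

```python
import json


def pod_with_scratch_volume(name, image, mount_path):
    """Build a Pod manifest whose container mounts an emptyDir volume.

    All argument values are illustrative; nothing here talks to a cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # The mount refers to the volume by name.
                    "volumeMounts": [
                        {"name": "scratch", "mountPath": mount_path}
                    ],
                }
            ],
            # emptyDir lives as long as the pod does: it survives
            # container restarts but is deleted together with the pod.
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


# Hypothetical pod name, image, and path, chosen only for the example.
scratch_pod = pod_with_scratch_volume("cache-demo", "nginx:1.25", "/cache")
print(json.dumps(scratch_pod, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed manifest could be piped to `kubectl apply -f -` on a test cluster. Swapping the `emptyDir` entry for another volume type changes where the data lives without touching the container definition.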
Adding Persistent Volumes
Some volumes might need to outlive pods; this is not uncommon, especially with CI/CD being the process that many developers use to remain agile. This is where Persistent Volumes, or PVs, come in handy.
A Persistent Volume in Kubernetes doesn’t associate itself with one or a handful of pods. It is a piece of storage provisioned by the administrator for the cluster, and it is a cluster resource in the same way that a node is. Persistent Volumes capture the details of the storage implementation, whether it’s NFS, iSCSI, or a cloud-provider-specific storage system.
This means that Persistent Volumes can be used for different purposes too. Many developers use PVs for services that need persistence, like an artifact repository. With fast storage and optimum caching, persistent storage can be the perfect solution for your needs.
Persistent Volumes belong to the cluster rather than to any one node, and pods request this storage through PersistentVolumeClaims (PVCs). A claim asks for a size and an access mode, Kubernetes binds it to a matching Persistent Volume, and the pod then mounts the claim as a volume. Since Persistent Volumes also store data beyond the lifetime of pods, they can bridge gaps in CI/CD processes.
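The relationship between a Persistent Volume, a claim, and a pod can be sketched as three small manifests. The Python below is only an illustration: the NFS server address, export path, sizes, and resource names are hypothetical, and it assumes a statically provisioned NFS-backed PV with no StorageClass.

```python
import json


def persistent_volume(name, size, nfs_server, nfs_path):
    """A statically provisioned PV backed by NFS (server/path are made up)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # Retain keeps the data after the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": nfs_server, "path": nfs_path},
        },
    }


def persistent_volume_claim(name, size):
    """A claim that can bind to a pre-created PV of sufficient size."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # An empty class name asks for a PV without a StorageClass,
            # which skips dynamic provisioning.
            "storageClassName": "",
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_volume_for_claim(volume_name, claim_name):
    """The entry a pod lists under spec.volumes to use the claim."""
    return {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name},
    }


# Hypothetical values for an artifact repository.
pv = persistent_volume("artifacts-pv", "50Gi", "nfs.example.internal", "/exports/artifacts")
pvc = persistent_volume_claim("artifacts-claim", "50Gi")
pod_volume = pod_volume_for_claim("artifacts", pvc["metadata"]["name"])

# A v1 List lets one JSON document carry both objects.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}, indent=2))
```

Note that the pod only ever names the claim. Which Persistent Volume sits behind that claim, and what storage system backs it, stays an administrator concern.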
A Safety Layer
The final piece of the puzzle for Kubernetes storage is the Volume Snapshot. A Volume Snapshot is requested by a user, much like a PersistentVolumeClaim, and it is also designed to be compatible with dynamic provisioning if needed. As the name suggests, Kubernetes takes a point-in-time snapshot of a Persistent Volume in the cluster and stores it for recovery.
Don’t let the ‘snapshot’ jargon fool you; a Volume Snapshot can actually be very big. It eats into your cloud storage allocation rather quickly. In return, though, it offers something invaluable: an extra safety layer for your data.
Snapshots can be restored at any point by creating a new volume from them. A Volume Snapshot freezes a volume in a particular state so that you can return to it later. In the more recent releases of Kubernetes, it is even possible to automate the creation (and deletion) of snapshots depending on the state of PVs as well as other parameters of the system. Snapshot support depends on CSI drivers, so it is only available for volumes whose driver implements it.
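As a rough illustration of taking and restoring a snapshot, the Python sketch below builds a VolumeSnapshot that points at an existing claim, plus a new claim that is pre-populated from that snapshot. It assumes the cluster has the snapshot CRDs and controller installed and a CSI driver that supports snapshots; the snapshot class, storage class, and resource names are hypothetical.

```python
import json

SNAPSHOT_API_GROUP = "snapshot.storage.k8s.io"


def volume_snapshot(name, pvc_name, snapshot_class):
    """Request a snapshot of the volume bound to an existing PVC."""
    return {
        "apiVersion": f"{SNAPSHOT_API_GROUP}/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def claim_from_snapshot(name, snapshot_name, size, storage_class):
    """Restore by creating a new PVC whose data source is the snapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": SNAPSHOT_API_GROUP,
            },
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names; the classes must already exist in the cluster.
snapshot = volume_snapshot("db-snap-1", "db-data", "csi-snapclass")
restored_claim = claim_from_snapshot("db-data-restored", "db-snap-1", "20Gi", "fast")

print(json.dumps(snapshot, indent=2))
print(json.dumps(restored_claim, indent=2))
```

Restoring does not overwrite the original volume. It produces a fresh claim, which is why snapshots add to your storage bill rather than replacing anything.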
No Rule to Follow
There is no definite rule to follow when setting up Kubernetes storage. Remember that volumes in a pod persist across container restarts. They are removed only when the pod is removed from the node. Kubernetes also has StorageClass for dynamic provisioning, which means you no longer have to provision storage in a linear (and fixed) way.
StorageClass is interesting in its own right too. Rather than an administrator creating each Persistent Volume ahead of time, the cluster can create storage on demand in a scalable way. Dynamic provisioning means we can allocate a volume by making an API call without depending on an administrator. With storage classes, we can define different types/classes of volumes (standard, best performance, etc.), and developers can pick among these classes based on the app’s needs.
There are a lot of ways to achieve efficient Kubernetes storage; it all comes down to your needs and requirements. If you only need temporary storage, there is no need to create a separate AWS EBS volume just because you think you have to. On the other hand, storage-intensive systems need to be more serious about tackling storage-related challenges. It may seem daunting at first, but Kubernetes storage is really not that complicated.
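Here is one way the "different classes for different needs" idea might look in practice, sketched in Python. The class names, the AWS EBS CSI provisioner, and the volume type parameters are assumptions chosen for the example; a different cluster would use its own provisioner and parameters.

```python
import json


def storage_class(name, provisioner, parameters):
    """Describe a class of storage that the cluster can provision on demand."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
        # Delete removes the backing volume when the claim goes away.
        "reclaimPolicy": "Delete",
        # Delay provisioning until a pod that uses the claim is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def claim_for_class(name, class_name, size):
    """A PVC that asks the named class to provision a volume dynamically."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": class_name,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Assumed provisioner and parameters (AWS EBS CSI driver); adjust per cluster.
standard = storage_class("standard", "ebs.csi.aws.com", {"type": "gp3"})
fast = storage_class("best-performance", "ebs.csi.aws.com", {"type": "io2", "iops": "4000"})

# The developer picks a class by name; no administrator creates the PV.
db_claim = claim_for_class("db-data", fast["metadata"]["name"], "100Gi")

manifests = {"apiVersion": "v1", "kind": "List", "items": [standard, fast, db_claim]}
print(json.dumps(manifests, indent=2))
```

The administrator defines the classes once, and from then on a claim that names a class is the API call that allocates the volume.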