Spotlight on CockroachDB

Data Modernization & Analytics

The construction, processing and usage of databases have evolved a lot over the last few decades. Traditional relational databases were sufficient for the data of their time, but with today’s innate reliance on the Internet, the rise of cloud-native architecture, and advances in how businesses use and analyse data, they are no longer cutting it. What happens if the single machine hosting a traditional relational database fails? Your database goes down, along with any applications that depend on it.

Over time, NoSQL databases were introduced that can handle large amounts of data in real time. With them, the risk of apps failing began to decrease, but the risk of data inconsistencies increased. So there has been a growing need for a storage solution that can cope with today’s dynamic cloud-native architecture. CockroachDB was designed specifically to meet this need.

What Is CockroachDB?

CockroachDB is a globally distributed SQL database built on top of a transactional, consistent key-value store that you can run anywhere. It is optimized for the cloud to deliver guaranteed transactions for both local and globally distributed workloads, and it allows you to build global, scalable and resilient cloud services. When using CockroachDB, you will typically run into two main terms: nodes and clusters. Nodes are individual machines running CockroachDB, and when we join these nodes together, they form a cluster, which is the basis of a functioning CockroachDB system. It’s advisable to run CockroachDB as a multi-node cluster to leverage the full scope of its cloud-native database design.

CockroachDB Architecture

CockroachDB is implemented as a distributed key-value store that presents a monolithic sorted map, which makes it easy to support large tables and indexes. Although CockroachDB is a distributed SQL database, developers can treat it as a relational database because it uses the same SQL syntax. At the architecture level, however, it differs from a traditional relational database. In CockroachDB, every table is ordered lexicographically by key, so when we store data in the database, we are really leveraging the underlying key-value store.
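
To make the idea of a table stored in a sorted key-value map more concrete, here is a minimal Python sketch. It is a toy model only: the `SortedKV` class, the `row_key` layout and the sample rows are all hypothetical and are not CockroachDB’s actual key encoding. It simply shows how rows can be flattened into lexicographically ordered keys and read back with a range scan.

```python
import bisect


class SortedKV:
    """Toy ordered key-value map; keys are kept in lexicographic order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def row_key(table, primary_key, column):
    # Hypothetical layout: /<table>/<zero-padded primary key>/<column>
    return f"/{table}/{primary_key:08d}/{column}"


def insert_row(kv, table, primary_key, row):
    # One key-value pair per column of the row.
    for column, value in row.items():
        kv.put(row_key(table, primary_key, column), value)


def read_row(kv, table, primary_key):
    prefix = f"/{table}/{primary_key:08d}/"
    # "0" is the ASCII character right after "/", so this bounds the prefix.
    end = prefix[:-1] + "0"
    return {k[len(prefix):]: v for k, v in kv.scan(prefix, end)}


kv = SortedKV()
insert_row(kv, "employee", 2, {"name": "Grace", "dept": "R&D"})
insert_row(kv, "employee", 1, {"name": "Ada", "dept": "Eng"})
print(read_row(kv, "employee", 1))
```

Because the keys are sorted, all the columns of one row, and all the rows of one table, sit next to each other, so reading a row or a slice of a table becomes a single contiguous scan.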

Since CockroachDB has a distributed architecture, we just need to spin up a CockroachDB node and point it at a cluster, and the node joins that cluster. CockroachDB then coordinates across the nodes to reach consensus on all queries and transactions. When a new node joins, data is rebalanced across the cluster according to how you want that data to be placed. The whole cluster has just one type of node, a single composable unit, and every node is a consistent gateway to the entirety of the database. So we could have a database with nodes spread worldwide that still looks like one logical database to the application accessing it, whichever region it is in.
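
The rebalancing and gateway behaviour described above can be sketched with a small Python model. This is an illustration under simplifying assumptions, not CockroachDB’s implementation: `ToyCluster`, its split keys and its node names are hypothetical, and each key range has a single owner here, whereas a real cluster keeps several replicas of every range.

```python
import bisect


class ToyCluster:
    """Toy model: the sorted keyspace is cut into ranges owned by nodes."""

    def __init__(self, split_keys):
        # Range i covers keys in [bounds[i], bounds[i + 1]); "" is the lowest key.
        self.bounds = [""] + sorted(split_keys)
        self.nodes = []
        self.owner = {}  # range index -> node name

    def add_node(self, name):
        self.nodes.append(name)
        if len(self.nodes) == 1:
            # The first node starts out holding every range.
            for index in range(len(self.bounds)):
                self.owner[index] = name
        else:
            self._rebalance()

    def _load(self):
        load = {node: [] for node in self.nodes}
        for index, node in self.owner.items():
            load[node].append(index)
        return load

    def _rebalance(self):
        # Move one range at a time from the busiest to the idlest node
        # until no node holds more than one range above any other.
        while True:
            load = self._load()
            busiest = max(self.nodes, key=lambda n: len(load[n]))
            idlest = min(self.nodes, key=lambda n: len(load[n]))
            if len(load[busiest]) - len(load[idlest]) <= 1:
                return
            self.owner[load[busiest][-1]] = idlest

    def route(self, key):
        """Any gateway can find the node that holds the range for a key."""
        index = bisect.bisect_right(self.bounds, key) - 1
        return self.owner[index]


cluster = ToyCluster(["c", "f", "i", "l", "o"])  # six ranges
for node in ["n1", "n2", "n3"]:
    cluster.add_node(node)
print({node: len(ranges) for node, ranges in cluster._load().items()})
print(cluster.route("employee"))
```

Each time a node is added, ranges drift toward it until the load is even, and `route` shows why the application does not need to know where a key lives.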

CockroachDB’s architecture offers high availability and consistency. If a node dies, your application keeps running by using the other nodes in the cluster, and when you bring the node back online, its reads are immediately consistent with those of the other nodes.
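
The following Python sketch illustrates why a replicated cluster can survive a node failure. It is a deliberately simplified majority-quorum model: CockroachDB’s actual replication uses the Raft consensus protocol, which this sketch does not implement, and the `ToyReplicatedRange` class, node names and keys are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []  # ordered (key, value) writes applied by this replica


class ToyReplicatedRange:
    """Toy model: writes succeed only while a majority of replicas is up."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}

    def _live(self):
        return [r for r in self.replicas.values() if r.alive]

    def _require_quorum(self):
        live = self._live()
        if len(live) * 2 <= len(self.replicas):
            raise RuntimeError("no quorum: a majority of replicas is down")
        return live

    def write(self, key, value):
        for replica in self._require_quorum():
            replica.log.append((key, value))

    def read(self, key, via):
        self._require_quorum()
        replica = self.replicas[via]
        if not replica.alive:
            raise RuntimeError(f"{via} is down; use another node")
        # The latest write for the key wins.
        for k, v in reversed(replica.log):
            if k == key:
                return v
        return None

    def fail(self, name):
        self.replicas[name].alive = False

    def recover(self, name):
        node = self.replicas[name]
        # Catch up from the most complete live replica before serving reads.
        source = max(self._live(), key=lambda r: len(r.log), default=node)
        node.log = list(source.log)
        node.alive = True


rng = ToyReplicatedRange(["n1", "n2", "n3"])
rng.write("employee/1/name", "Ada")
rng.fail("n3")
rng.write("employee/1/name", "Ada L.")  # still accepted: 2 of 3 replicas are up
rng.recover("n3")
print(rng.read("employee/1/name", via="n3"))
```

In this model, a three-replica range tolerates one failure, and a recovered node catches up before it answers reads, which is the behaviour the paragraph above describes.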

Synchronizing CockroachDB and Kubernetes

If you have spent a lot of time around people in the container or orchestration community, you may have heard the opinion that running databases on Kubernetes is a bad idea. While some of these complaints were valid a couple of years ago, today there are multiple resilient databases that can cope with virtualization, failovers, container workloads and so on, which is what makes CockroachDB a great fit for Kubernetes clusters.

While Kubernetes makes it easy to deploy, scale and manage applications, managing and retaining the state of a cluster is a challenge on the orchestration platform. Kubernetes needs a storage system that can replicate data across the database nodes to survive any kind of failure, and that is where CockroachDB excels. Combining CockroachDB with Kubernetes allows us to orchestrate containers without sacrificing high availability, and it helps maintain the correctness of stateful databases.

Deploying CockroachDB on a Kubernetes Cluster

To follow the steps below, you need a running Kubernetes cluster on which to deploy and use CockroachDB.

There are multiple ways to deploy CockroachDB on Kubernetes. For this article, we will deploy through the CockroachDB Kubernetes operator. First, apply the CustomResourceDefinition (CRD) for the CockroachDB operator.

Next, apply the operator manifest. This will create all the necessary roles, accounts and deployments.


We can now check if the cockroach operator pod is running.


Download the example.yaml file, which contains the specification for configuring the CockroachDB cluster through the operator.


Apply the example manifest.


If we check the list of pods, we can see that three CockroachDB pod instances are running. This provides high availability.


Using the built-in SQL client, open a SQL shell inside one of the CockroachDB pod instances.


Now, create a user for CockroachDB.


Open a new terminal and port-forward the CockroachDB service on port 8080.


Open the browser and go to localhost:8080, where we can access the CockroachDB user interface. To log in to CockroachDB, enter the user and password that we created in the steps above.

Once you log in, you will see all the details of CockroachDB running in the pods on the Kubernetes cluster.

Now, let us go back to the terminal where we were running the SQL shell and execute a few basic SQL commands to create a table.


Go back to the browser, refresh the page and click on the database tab. We can see the employee table appearing, which we created in CockroachDB.

And there you have it. Even if you accidentally delete a CockroachDB instance running on the cluster, a new instance will start immediately.


We can see that a new container is being created.


Here, all three CockroachDB instances are back online.


Conclusion

CockroachDB aims to make leveraging business data easy. Rather than wasting time and energy troubleshooting database shortcomings, refocus that time, investment and engineering effort on making your company stronger in the market. Go ahead and give this distributed database a try.