Spotlight on CockroachDB

May 6, 2021

Data Modernization & Analytics

Databases

The construction, processes and usage of databases have evolved a lot over the last few decades. Traditional relational databases were enough to work with the data present at that time, but with the innate reliance on the Internet, the progression of cloud-native architecture, and advances in how businesses utilise and analyse data, single-machine relational databases are no longer cutting it. What happens if the one machine hosting a traditional relational database fails? Your database goes down, along with any applications that depend on it.

Over time, NoSQL databases were introduced that are capable of handling large amounts of data in real time. With them, the risk of apps failing began to decrease, but the risk of data inconsistencies increased. So, there has been a growing need for a storage solution that can cope with today’s dynamic cloud-native architecture without giving up consistency. CockroachDB was specifically designed to meet this need.

What Is CockroachDB?

CockroachDB is a globally distributed SQL database built on top of a transactional and consistent key-value store, and you can run it anywhere. It is optimized for the cloud, delivers transactional guarantees for both local and globally distributed workloads, and allows you to build global, scalable and resilient cloud services. When using CockroachDB, you will typically run into two main terms: nodes and clusters. Nodes are individual machines running CockroachDB, and when we join these nodes together, they form a cluster, which is the basis of a functioning CockroachDB system. It’s advisable to run CockroachDB as a multi-node cluster to leverage the full scope of its cloud-native database design.

CockroachDB Architecture

CockroachDB is implemented as a distributed key-value store over a monolithic sorted map, which makes it easy to support large tables and indexes. Because CockroachDB uses familiar SQL syntax, developers can treat it like a relational database. At the architecture level, however, it is different from a traditional relational database. In CockroachDB, every table is ordered lexicographically by key. So, when we store data in the database, we are leveraging the underlying key-value store.
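
To make the idea of a table living inside one sorted key-value map more concrete, here is a minimal Python sketch. The key layout (`/<table>/<primary key>/<column>`) and the helper names `encode_row` and `scan` are simplifying assumptions for illustration only; they are not CockroachDB's actual encoding or API.

```python
# Toy model: a SQL table flattened into one sorted key-value map.
# The "/<table>/<primary key>/<column>" key layout is an illustrative
# assumption, not CockroachDB's real encoding.

def encode_row(table, primary_key, row):
    """Turn one row into key-value pairs, one pair per column."""
    # Zero-pad the integer key so lexicographic order matches numeric order.
    return {f"/{table}/{primary_key:08d}/{column}": value
            for column, value in row.items()}

def scan(kv, prefix):
    """Return all pairs whose key starts with prefix, in key order."""
    return [(k, kv[k]) for k in sorted(kv) if k.startswith(prefix)]

kv = {}
kv.update(encode_row("employee", 2, {"name": "Geoff", "city": "New York"}))
kv.update(encode_row("employee", 1, {"name": "Rob", "city": "California"}))

# Rows come back ordered by primary key regardless of insertion order.
for key, value in scan(kv, "/employee/"):
    print(key, "=", value)
```

Because the keys sort lexicographically, reading a whole table or a single row becomes a scan over a contiguous slice of the map, which is the property the paragraph above relies on.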

Since CockroachDB has a distributed architecture, we just need to spin up a CockroachDB node and point it at a cluster, and the node joins that cluster. CockroachDB then coordinates with the nodes to reach consensus on all queries and transactions. When a node joins, data is rebalanced across the cluster based on how you want that data to be placed. The whole cluster has just one type of node as a single composable unit, and every node is a consistent gateway to the entirety of the database. So, we could have a cluster with nodes worldwide that looks like one logical database to the application accessing it, whichever region that application runs in.
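
The "every node is a gateway" idea can be sketched as a lookup over a sorted keyspace that has been split into contiguous ranges. This is a toy model: the split points, node names and the `replicas_for` helper are hypothetical, and a real cluster keeps this metadata up to date on its own.

```python
import bisect

# Toy model of a gateway node routing a key to the range that owns it.
# The split points and node names below are hypothetical.
SPLIT_KEYS = ["/employee/00000100", "/employee/00000200"]  # range boundaries
RANGE_REPLICAS = [
    ["node1", "node2", "node3"],  # keys below the first split point
    ["node2", "node3", "node4"],  # first split point up to the second
    ["node1", "node3", "node4"],  # second split point and above
]

def replicas_for(key):
    """Find the range containing key and return the nodes that store it."""
    index = bisect.bisect_right(SPLIT_KEYS, key)
    return RANGE_REPLICAS[index]

print(replicas_for("/employee/00000042/name"))
print(replicas_for("/employee/00000150/name"))
```

Any node that holds the same routing table can answer the lookup, so the application does not need to know where a given row lives.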

CockroachDB’s architecture offers high availability and consistency. If a node dies, your application continues running by leveraging the other nodes in the cluster, and when you bring the node back online, reads served through it are consistent with the other nodes.
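
The availability claim comes down to simple majority arithmetic. The sketch below assumes majority-based consensus among the replicas of a piece of data, as in the Raft protocol that CockroachDB uses; the function names are illustrative.

```python
def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1

def tolerated_failures(replicas):
    """How many replicas can be lost while a majority remains."""
    return replicas - quorum(replicas)

for n in (3, 5):
    print(f"{n} replicas: quorum {quorum(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

With three replicas a majority is two, so one node can fail without blocking reads or writes. This is why a three-node cluster is the usual minimum for a highly available deployment.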

Synchronizing CockroachDB and Kubernetes

If you have spent a lot of time around people in the container or orchestration community, you may have heard the opinion that running databases on Kubernetes is a bad idea. While some of these complaints were valid a couple of years ago, today there are multiple resilient databases that can cope with virtualization, failovers, containerized workloads, and so on, which is what makes CockroachDB a great fit for Kubernetes clusters.

While Kubernetes makes it easy to deploy, scale and manage applications, managing and retaining state is a challenge on the orchestration platform. Stateful workloads on Kubernetes need a storage system that can replicate data across database nodes to survive failures, and that is where CockroachDB excels. Combining CockroachDB with Kubernetes allows us to orchestrate containers without sacrificing high availability and helps in maintaining the correctness of stateful databases.

Deploying CockroachDB on a Kubernetes Cluster

To follow the steps below, you need a Kubernetes cluster up and running on which to deploy CockroachDB.

There are multiple ways to deploy CockroachDB on Kubernetes. For this article, we will deploy it through the CockroachDB Kubernetes operator. First, apply the CustomResourceDefinition (CRD) for the CockroachDB operator.

mylab@mylab:~$ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml
customresourcedefinition.apiextensions.k8s.io/crdbclusters.crdb.cockroachlabs.com created

Next, apply the operator manifest. This creates the necessary roles, service accounts and the operator deployment.

mylab@mylab:~$ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/manifests/operator.yaml
clusterrole.rbac.authorization.k8s.io/cockroach-database-role created
serviceaccount/cockroach-database-sa created
clusterrolebinding.rbac.authorization.k8s.io/cockroach-database-rolebinding created
role.rbac.authorization.k8s.io/cockroach-operator-role created
clusterrolebinding.rbac.authorization.k8s.io/cockroach-operator-rolebinding created
clusterrole.rbac.authorization.k8s.io/cockroach-operator-role created
serviceaccount/cockroach-operator-sa created
rolebinding.rbac.authorization.k8s.io/cockroach-operator-default created
deployment.apps/cockroach-operator created

We can now check if the cockroach operator pod is running.

mylab@mylab:~$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
cockroach-operator-75787667ff-qf2xq   1/1     Running   0          109s

Download the example.yaml file, which contains the specification of the CockroachDB cluster that the operator will create.

mylab@mylab:~$ curl -O https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/examples/example.yaml

Apply the example manifest.

mylab@mylab:~$ kubectl apply -f example.yaml
crdbcluster.crdb.cockroachlabs.com/cockroachdb created

If we check the pod list, we can see three CockroachDB pods running. Running three instances is what provides high availability.

mylab@mylab:~$ kubectl get pods
NAME                                  READY   STATUS      RESTARTS   AGE
cockroach-operator-75787667ff-qf2xq   1/1     Running     1          24m
cockroachdb-0                         1/1     Running     0          5m28s
cockroachdb-1                         1/1     Running     0          5m7s
cockroachdb-2                         1/1     Running     0          4m48s

Using the built-in SQL client, open a SQL shell inside one of the CockroachDB pods.

mylab@mylab:~$ kubectl exec -it cockroachdb-0 -- ./cockroach sql --certs-dir cockroach-certs
#
# Welcome to the CockroachDB SQL shell.
root@:75262/defaultdb>

Now, create a user for CockroachDB.

root@:75262/defaultdb> CREATE USER demo WITH PASSWORD 'demo123';
CREATE ROLE

Open a new terminal and port-forward the CockroachDB service on port 8080.

mylab@mylab:~$ kubectl port-forward service/cockroachdb-public 8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Open the browser and go to localhost:8080 to access the CockroachDB user interface. Log in with the username and password we created in the step above.

Once you log in, you will see the details of the CockroachDB instances running in the pods on the Kubernetes cluster.

Now, let us go back to the terminal where the SQL shell is running and execute a few basic SQL commands to create a database and a table.

root@:75262/defaultdb> show databases;
  database_name | owner
----------------+--------
  defaultdb     | root
  postgres      | root
  system        | node
(3 rows)

root@:75262/defaultdb> create database company;
CREATE DATABASE

root@:75262/defaultdb> use company;
SET
root@:75262/company> CREATE TABLE Employee ( EmployeeID int, Name varchar(30), City varchar(50) );
CREATE TABLE

root@:75262/company> INSERT INTO Employee VALUES (1, 'Rob', 'California');
INSERT 1

root@:75262/company> INSERT INTO Employee VALUES (2, 'Geoff', 'New York');
INSERT 1

root@:75262/company> SELECT * FROM Employee;
  employeeid | name  |    city
-------------+-------+-------------
           1 | Rob   | California
           2 | Geoff | New York
(2 rows)

root@:75262/company> GRANT admin TO demo;
GRANT

Go back to the browser, refresh the page and click on the database tab. The employee table we created in CockroachDB now appears there.

And there you have it. Even if you accidentally delete a CockroachDB instance running on the cluster, a new instance starts in its place right away. To see this, delete one of the pods.

mylab@mylab:~$ kubectl delete pod cockroachdb-0
pod "cockroachdb-0" deleted

We can see a new container being created for the deleted pod.

mylab@mylab:~$ kubectl get pods
NAME                                  READY   STATUS              RESTARTS   AGE
cockroach-operator-75787667ff-qf2xq   1/1     Running             1          45m
cockroachdb-0                         0/1     ContainerCreating   0          11s
cockroachdb-1                         1/1     Running             0          25m
cockroachdb-2                         1/1     Running             0          25m
cockroachdb-vcheck-26982384-2bstc     0/1     Completed           0          26m

Here, all three CockroachDB instances are back online.

mylab@mylab:~$ kubectl get pods
NAME                                  READY   STATUS      RESTARTS   AGE
cockroach-operator-75787667ff-qf2xq   1/1     Running     1          46m
cockroachdb-0                         1/1     Running     0          61s
cockroachdb-1                         1/1     Running     0          26m
cockroachdb-2                         1/1     Running     0          26m
cockroachdb-vcheck-26982384-2bstc     0/1     Completed   0          27m
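
The self-healing seen in the pod listings above follows the general Kubernetes pattern of reconciling desired state with actual state. The Python sketch below is a toy model of that pattern, not the operator's code: the `reconcile` function is hypothetical, and only the pod names are taken from the transcript.

```python
# Toy reconciliation loop: compare desired pods with running pods and
# report whatever is missing. The function is a hypothetical
# simplification of what a Kubernetes controller does.

def reconcile(desired_replicas, running, name="cockroachdb"):
    """Return the pod names that must be created to reach the desired count."""
    desired = {f"{name}-{i}" for i in range(desired_replicas)}
    return sorted(desired - set(running))

running = ["cockroachdb-0", "cockroachdb-1", "cockroachdb-2"]
running.remove("cockroachdb-0")   # simulate `kubectl delete pod cockroachdb-0`

missing = reconcile(3, running)
print(missing)                    # pods a controller would recreate

running.extend(missing)           # the replacement pod comes up
print(reconcile(3, running))      # nothing left to do
```

The replacement pod keeps the same name, which matches the listings: cockroachdb-0 reappears with a fresh age while the other two pods keep running.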

Conclusion

CockroachDB aims to make leveraging business data easy. Rather than spending time and energy troubleshooting database shortcomings, you can refocus that time, investment and engineering effort on making your company stronger in the market. Go ahead and give this distributed database a try.

Caylent Team
