Exploring the Depths of Amazon Kinesis Data Streams - Part 2: Scaling

February 9, 2023

Data Modernization & Analytics

Managed Services

IoT

Scale Amazon Kinesis Data Streams with low-cardinality partition keys using UpdateShardCount, manual shard splitting, and explicit hash values to balance traffic across shards.

This blog was originally written and published by Trek10, which is now part of Caylent.

In Part 1 of this deep dive, we discussed in great detail how shards work and the ways in which records are distributed. Building onto our hypothetical problem, we will use this post to explore opportunities in how we can further scale that solution. As a refresher, we had data at a velocity that requires scaling, however, the partition keys of choice have a relatively low cardinality. This scenario may require precise scaling considerations.

Scaling

The issue identified above does have a couple of solutions. You could simply increase the number of shards by performing the UpdateShardCount API call. This function has all the requirements outlined in the previous blog post and allows you to let AWS split or merge shards based on their own algorithms. AWS will attempt to produce shards whose hash ranges are as even as possible. So in our case, if we decide to double our number of shards by performing the update shard count function to 4, we may have shards with the ranges [0,1], [2,3,4], [5,6], [7,8,9]. With this division, we immediately notice that shard 1 will have double the record count of shard 2 and 3 while shard 4 will contain absolutely no records. The image below illustrates this:

Splitting Shards

Another way of handling this issue is to manually split and merge shards yourself. If you first merge your shards into 1 shard or perform the UpdateShardCount API down to 1 shard, you can then perform the SplitShard API call. This API call allows you to provide your own minimum hash value for the shard in question. This operation can take one shard and produce two new shards whose hash ranges are [0,n-1] and [n,9] where n is your choice of minimum hash value. Looking at our example above, if we joined back to a single shard with the hash range of [0,9], we can then manually split with a minimum hash value of 3. This will create two shards with the following values: [0,2] and [3,9]. Even though new values will not be evenly distributed between these shards, we do know that our keys will be evenly distributed. If we have a new value of bam, this would have a 30% chance to be mapped to shard 1 and a 70% chance to be mapped to shard 2.

Some interesting things to consider when performing this operation is that Kinesis does not retroactively fix previous shards. If you took the time to review the DescribeStream API call, you will notice two values for ParentShardId and AdjacentParentShardId. These values are how Kinesis hides from you the fact that records for some period of time may exist across multiple shards. Referring back to our example, if we have our initial shards with IDs 0 and 1, when we perform the merge, we create a new shard with id 2 whose parent shard is 0 and adjacent parent shard id is 1. Once we perform the split, we create two more shards with id 3 and 4. Their parent shard ids are both 2 and there is no adjacent parent shard id.

If you now take time to read through some of the restrictions and considerations for scaling from the API documents for UpdateShardCount, they should hopefully begin to make some sense.

Although we don’t have the ability to alter the underlying hash function which AWS uses for kinesis, an alternative solution is to provide our own hash value as an explicit hash. When posting a record to Kinesis, you have the opportunity to override the generated hash with an explicit hash value. The primary concern with this approach is that you have to be able to consistently maintain that hash value indefinitely and make certain any changes are a well-documented event. Altering the manual hash value can result in records being published to a different shard and resulting in loss of order over a short period of time.

These are not the only options to further scale your solution. Kinesis has a fairly large feature set and other advanced features to help in further optimization. Join us in our next blog post in this series as we dive once more into some unique features such as Parallelization Factor and Enhanced Fan Out.

To continue reading this series, check out:

Exploring the Depths of Kinesis Data Streams - Part 3: Advanced Features

Data Modernization & Analytics

Managed Services

IoT

Trek10 Team

Founded in 2013, Trek10 helped organizations migrate to and maximize the value of AWS by designing, building, and supporting cloud-native workloads with deep technical expertise. In 2025, Trek10 joined Caylent, forming one of the most comprehensive AWS-only partners in the ecosystem, delivering end-to-end services across strategy, migration and modernization, product innovation, and managed services.

View Trek10's articles

Learn more about the services mentioned

Caylent Catalysts™

IoT

Connect, understand, and act on data from industrial devices at scale to improve uptime, efficiency, and reliability across manufacturing, energy, and utilities.

Caylent Services

Managed Services

Reliably Operate and Optimize Your AWS Environment

Caylent Services

Infrastructure & DevOps Modernization

Quickly establish an AWS presence that meets technical security framework guidance by establishing automated guardrails that ensure your environments remain compliant.

Accelerate your cloud native journey

Leveraging our deep AWS expertise

Get in touch

AWS Lambda Functions: Return Response and Continue Executing

Learn how to return an HTTP response from AWS Lambda immediately using response streaming while continuing background execution — ideal for Slack integrations with tight timeouts.

Managed Services

September 21, 2023

Amazon Kinesis Data Streams On-demand vs. Provisioned Billing Mode Cost Comparison

Compare Amazon Kinesis Data Streams on-demand vs. provisioned pricing. See at what utilization levels provisioned mode saves money based on payload size and throughput.

Data Modernization & Analytics

August 28, 2023

How and When to Use Amazon EventBridge Pipes

Learn when Amazon EventBridge Pipes can replace simple AWS Lambda connector functions and when they fall short. Includes practical guidance on InputTemplates and data transformation.

Managed Services

IoT

View all blog posts

Scaling

Splitting Shards

Trek10 Team

Learn more about the services mentioned

IoT

Managed Services

Infrastructure & DevOps Modernization

Accelerate your cloud native journey

Related Blog Posts

AWS Lambda Functions: Return Response and Continue Executing

Amazon Kinesis Data Streams On-demand vs. Provisioned Billing Mode Cost Comparison

How and When to Use Amazon EventBridge Pipes

Learn more about the services mentioned

IoT

Managed Services

Infrastructure & DevOps Modernization