Understand how Amazon Kinesis partition keys distribute records across shards. Learn why poor key selection creates traffic imbalances and how to design keys for even utilization.
This blog was originally written and published by Trek10, which is now part of Caylent.
At Trek10, we are heavy users of Amazon Kinesis Data Streams for building scalable, reliable, and cost-effective cloud-native data architectures. In a blog post from a couple of years ago, we discussed some high-level concepts of Kinesis Data Streams and several important things to keep in mind when working with the service. But to build truly enterprise-scale solutions, you need to go much deeper. In this blog series, we will expand on that baseline knowledge, dig into how Kinesis works under the hood, and explore some more complicated Kinesis concepts. At a glance, the advanced topics we will cover over three parts include:
Before we jump into any of these, it makes sense to talk a bit further about what exactly Kinesis is and how it works. It helps to consider Kinesis Data Streams as actual storage. This storage has special properties such as:
Some liberty is taken with the concept of a timestamp: there is no explicit timestamp value. Instead, the concept is handled behind the scenes by tracking sequence IDs. Every record within Kinesis consists of only a Sequence ID, a Partition Key, and a Data Blob. The Sequence ID is assigned to the record by Kinesis; the partition key and data blob are provided by the producer application. Understanding that Kinesis is durable storage rather than a simple queue goes a long way toward explaining the differences between it and other services that get used in similar scenarios.
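To make the storage framing concrete, here is a toy in-memory sketch; the `Record` and `Shard` classes are our own illustration, not the Kinesis API. The producer supplies only a partition key and a data blob, the sequence number is assigned by the store, and reading never removes anything:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Record:
    sequence_number: int  # assigned by the store, not the producer
    partition_key: str    # supplied by the producer
    data: bytes           # supplied by the producer

class Shard:
    """An append-only log: records are stored and retained, never consumed."""

    def __init__(self) -> None:
        self.records: list[Record] = []
        self._seq = count()

    def put_record(self, partition_key: str, data: bytes) -> Record:
        # The producer provides only the partition key and data blob;
        # the sequence number is generated for it, much as Kinesis does.
        record = Record(next(self._seq), partition_key, data)
        self.records.append(record)
        return record

shard = Shard()
first = shard.put_record("sensor-42", b"temp=21.5")
second = shard.put_record("sensor-42", b"temp=21.7")

# Reading is non-destructive: both records remain in the shard.
assert len(shard.records) == 2
assert first.sequence_number < second.sequence_number
```

This is what separates Kinesis from a queue like SQS: consuming a record does not delete it, so multiple consumers can independently replay the same stored data.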
Although partition keys were discussed in the previous post, they are such a crucial part of what makes Kinesis work that they deserve a deeper look. Implementations that do not require messages to be read in order can treat partition keys as nothing more than random values. For implementations where ordering is paramount to effective delivery, however, you must choose your keys carefully.
Each shard in a given stream is assigned a hash range, and the shard contains all records whose hashed partition key values fall within that range. To discover the hash ranges of your shards, you can call the DescribeStream API, which returns a payload including the minimum and maximum hash values for each shard currently provisioned. When you provide a partition key, that value is passed through a hash function, and the resulting hash is mapped to the appropriate shard.
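As a sketch, the hash ranges can be pulled out of a DescribeStream response like this. The payload below is a hand-written sample in the documented response shape (in a real application it would come from `boto3.client("kinesis").describe_stream(...)`); the stream name and the even two-way split of the 128-bit hash space are our own example values:

```python
# Hand-written sample in the shape of a DescribeStream response.
sample_response = {
    "StreamDescription": {
        "StreamName": "example-stream",
        "Shards": [
            {
                "ShardId": "shardId-000000000000",
                "HashKeyRange": {
                    "StartingHashKey": "0",
                    "EndingHashKey": "170141183460469231731687303715884105727",
                },
            },
            {
                "ShardId": "shardId-000000000001",
                "HashKeyRange": {
                    "StartingHashKey": "170141183460469231731687303715884105728",
                    "EndingHashKey": "340282366920938463463374607431768211455",
                },
            },
        ],
    }
}

def shard_hash_ranges(response: dict) -> dict:
    """Map each shard ID to its (start, end) hash range as integers."""
    return {
        shard["ShardId"]: (
            int(shard["HashKeyRange"]["StartingHashKey"]),
            int(shard["HashKeyRange"]["EndingHashKey"]),
        )
        for shard in response["StreamDescription"]["Shards"]
    }

ranges = shard_hash_ranges(sample_response)
```

Note that the hash ranges are returned as decimal strings because they span the full 128-bit space, from 0 up to 2**128 - 1.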
This level of detail helps explain why your choice of partition key deserves so much consideration. If you make the uninformed choice of supplying a single value as the partition key for every record, then every record will always produce the identical hash value. There is then no way for you to efficiently scale your workload, and any choice of shard count or parallelization factor becomes irrelevant. It also creates interesting opportunities for hot shards, described in the previous post, to occur in unexpected ways. Let's use an example to showcase this possibility.
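A quick sketch illustrates the single-key pitfall. Kinesis hashes partition keys with MD5 onto a 128-bit range; the `shard_index` helper below is our own simplification that assumes shards split that range into equal, contiguous pieces:

```python
import hashlib

MAX_HASH = 2**128 - 1  # Kinesis hash space: 128-bit MD5 values

def kinesis_hash(partition_key: str) -> int:
    # Kinesis maps a partition key into the hash space via MD5.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

def shard_index(partition_key: str, shard_count: int) -> int:
    # Simplifying assumption: shards divide [0, MAX_HASH] into equal,
    # contiguous ranges.
    return kinesis_hash(partition_key) * shard_count // (MAX_HASH + 1)

# With a constant partition key, every record hashes identically and lands
# on the same shard -- adding shards cannot spread this traffic.
indices = {shard_index("constant-key", 8) for _ in range(1000)}
assert len(indices) == 1

# Distinct keys, by contrast, spread across the shards.
spread = {shard_index(f"device-{i}", 8) for i in range(1000)}
assert len(spread) > 1
```

No matter how many shards you provision, the constant-key stream behaves like a single-shard stream.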
We will start with two shards and a simplified maximum hash range of [0,9]; that is, all records will be hashed to values between 0 and 9 inclusive. The first shard is assigned the values [0,4] and the second shard the values [5,9]. If we choose a partition key with a discrete number of values, an interesting problem can occur. Suppose the partition key for each record is one of the four following evenly distributed values: foo, bar, biz, and baz. These values will have unique and consistent hash values. By definition, a well-defined hash function distributes values evenly across its valid range: every output is equally likely, regardless of the characteristics of the input. Let's go ahead and perform our hash function on these values and see where we end up:
At this point, the issue should start to come into focus. Three of the four values have been assigned a hash value between 0 and 4. Assuming all partition keys have identical velocities, the first shard will receive three times the traffic of the second shard.
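The skew can be computed directly. In the sketch below, the hash values are illustrative numbers we chose to match the scenario above (three keys hashing into [0,4]); they are not real MD5 outputs:

```python
from collections import Counter

# Illustrative hash values in the simplified [0, 9] range, chosen so that
# three of the four keys fall into the first shard's range [0, 4].
hash_of = {"foo": 1, "bar": 3, "biz": 4, "baz": 7}

shards = {"shard-1": range(0, 5), "shard-2": range(5, 10)}

def shard_for(key: str) -> str:
    # Find the shard whose hash range contains this key's hash value.
    return next(name for name, r in shards.items() if hash_of[key] in r)

# Assume every key produces records at the same rate.
traffic = Counter(shard_for(key) for key in hash_of)
assert traffic == {"shard-1": 3, "shard-2": 1}
```

Even though the keys themselves are "evenly distributed", where their hashes happen to land determines the per-shard load, and with only four discrete keys a 3:1 imbalance is entirely possible.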
The distribution of records between shards is crucial to understand for performance, but most importantly, it informs efficient scaling techniques. Join us in the next post as we dive deeper into ways to scale our solution.
To continue reading this series, check out:
Founded in 2013, Trek10 helped organizations migrate to and maximize the value of AWS by designing, building, and supporting cloud-native workloads with deep technical expertise. In 2025, Trek10 joined Caylent, forming one of the most comprehensive AWS-only partners in the ecosystem, delivering end-to-end services across strategy, migration and modernization, product innovation, and managed services.