Traditional Hashing

Traditional hashing, for example using modulo of the number of nodes, maps data to a fixed set of nodes. Adding or removing a node may cause significant redistribution of keys, as they will be hashed to a different node

Consistent Hashing

Consistent Hashing solves this problem by using the hash function that is independent of the number of nodes. It uses a huge constant hash space (ring) e.g. [0, 2^128 - 1] and the nodes and items both map to one of the slots on the ring

Instead of directly associating items with nodes, it uses a relative association: an item is associated to the node immediately to its right in the ring

Consistent Hashing on average requires only $k / n$ units of data to be migrated during scale up and down; where $k$ is the total number of keys and $n$ is the number of nodes in the system

Implementation

We are using two sorted arrays:

A nodes array holds the nodes
A keys array holds the positions of the nodes in the hash space

Each node nodes[i] is present at position keys[i] in the hash space. Both arrays are sorted based on the keys array to enable binary search lookup

Hash Function

We define total_slots as the size of entire hash space (ring), typically of the order 2^256 and the hash function could be implemented by taking SHA-256 modulo total_slots

Adding a Node

Adding a new node affects only the items hashed to the position directly to its left, which were previously associated with the node to its right

To add a new node:

Calculate the key (position in the hash space) for the node using the hash function
Find the index of the smallest key in keys that is greater than the computed key using binary search
Insert the computed key at the found index in keys
Insert the new node at the same index in nodes

Removing a Node

When a node is removed, only its associated items are affected, minimizing data migration and changes to the mapping

To remove a node:

Calculate the key (position in the hash space) for the node using the hash function
Find the index of the key in keys using binary search
Remove the computed key at the found index in keys
Remove the node at the same index in nodes

Associating an Item to a Node

Calculate the key (position in the hash space) for the item using the hash function
Return the index of the smallest key in keys that is greater than the computed key using binary search. If no such key exists, wrap around and return the 0th index

Virtual Nodes

TODO

🪴 Quartz 4.0

Explorer

Consistent Hashing

Traditional Hashing

Consistent Hashing

Implementation

Hash Function

Adding a Node

Removing a Node

Associating an Item to a Node

Virtual Nodes

References

Graph View

Table of Contents

Backlinks