Skip to content
Dev Dump

Database Partitioning & Sharding

While often used interchangeably, they represent different architectural scopes.

The Library Analogy:

  • Partitioning: Organizing one large bookshelf into sections (History, Science, Fiction). It’s easier to find books, but it’s still one bookshelf.
  • Sharding: Splitting the books across three different library buildings. Now you have three times the space and three teams of librarians working simultaneously.

Partitioning vs Sharding Figure 1: Partitioning stays within one machine (logical split), while sharding spreads data across multiple nodes (physical scale-out).

FeaturePartitioningSharding
Physical NodesSingle server nodeMultiple distributed nodes
GoalManageability & Local PerfMassive scale & Throughput
ComplexityLow to ModerateHigh (requires routing & coordination)
Fault ToleranceLimited by single nodeHigh (one shard failing doesn’t kill all)

There are two fundamental ways to slice your data: Horizontal and Vertical.

Horizontal vs Vertical Partitioning Figure 2: Horizontal partitioning splits by rows; Vertical partitioning splits by columns.

Rows of a table are split into multiple partitions based on a Partition Key.

  • Pros: Scales linearly with the number of records.
  • Example: Putting users with ID 1–1000 in Partition A and 1001–2000 in Partition B.

The table is split by columns. Some columns go to one table, others to another, both sharing the same primary key.

  • Pros: Reduces I/O for tables with many columns; increases security (PII in separate tables).
  • Example: Splitting a User table into UserBasic (Name, Email) and UserBlobs (Profile Picture, Bio).

How do we decide which row goes where?

  1. Range Partitioning: Maps data based on continuous ranges (e.g., Dates: Jan, Feb, Mar).
    • Risk: Can lead to “Hotspots” if most activity happens in the latest month.
  2. List Partitioning: Maps data to explicit values (e.g., Country: ‘US’, ‘IN’, ‘UK’).
    • Use Case: Great for data residency compliance.
  3. Hash Partitioning: Applies a hash function to the key (e.g., hash(user_id) % 4) to determine the partition.
    • Pros: Excellent uniform distribution of data.
  4. Composite Partitioning: Combines techniques (e.g., Range by Year, then Hash by UserID within each year).

When moving to a sharded environment, you need a way to find the data.

Sharding Architecture Overview Figure 3: A typical sharding setup includes an application layer, a routing layer (proxy), and the individual shards.

The most critical decision in sharding. It is the attribute used to determine where data is stored.

  • Good Shard Key: High cardinality (many unique values) and uniform distribution.
  • Bad Shard Key: Causes all traffic to hit one shard (e.g., a Status field with only ‘Active/Inactive’).

Applications typically connect to a Router or Proxy (like Vitess for MySQL or Citus for Postgres). The router evaluates the Shard Key in the query and forwards the request to the correct physical node.


Traditional hash sharding (hash(key) % N) has a massive flaw: if you add a new node (N + 1), almost all keys will map to different locations, requiring a total data migration.

Consistent Hashing solves this by mapping both nodes and data onto a circular “Hash Ring.”

Consistent Hashing Ring Figure 4: In consistent hashing, data is mapped to the first node encountered clockwise on the ring. Adding a node only requires moving a small fraction of data.


Sharding provides “infinite” scale, but it comes at a steep price:

  • Cross-Shard Joins: Joining tables across different physical servers is extremely slow and often unsupported. Developers must perform joins in the application code.
  • Distributed Transactions: Ensuring ACID properties across shards requires complex protocols like Two-Phase Commit (2PC), which can hurt performance.
  • Resharding: As data grows, you eventually need to split an existing shard. This process is complex and can cause downtime if not handled by a sophisticated control plane.
  • Referential Integrity: Most sharded databases cannot enforce Foreign Key constraints across shards. This logic must move to the application layer.
  • Partitioning is logical; Sharding is physical.
  • Horizontal splits rows (scale volume); Vertical splits columns (scale access pattern).
  • Shard Keys must be chosen with extreme care to avoid hotspots.
  • Use Consistent Hashing to minimize data movement when scaling the cluster.
  • Sharding is a last resort—optimize your queries, indexes, and caching first!