
Database Partitioning and Sharding

Database partitioning is a fundamental technique in distributed systems: it divides a dataset into partitions that can be stored, managed, and queried independently. The usual motivations:

  • Capacity: Fit data when a single machine’s storage/compute is not enough.
  • Performance: Restrict queries to relevant partitions; reduce IO.
  • Maintainability: Operate on subsets (backup, reindex) without impacting entire dataset.
  • Cost and Compliance: Keep hot vs. cold data on appropriate hardware.

Partitioning vs sharding

Concept | Scope | Deployment
--------|-------|-----------
Partitioning | Splitting data within one logical database instance | Same server (logical splits)
Sharding | Distributing partitions across multiple servers | Scale-out (multi-node)

Horizontal partitioning divides rows across partitions based on a partition key.

  • Scales roughly linearly with data volume.
  • Choose keys that distribute load evenly; a skewed key creates hotspot partitions (see the sketch below).
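
To make the hotspot point concrete, here is a small Python sketch comparing an evenly distributed key with a skewed one; the four-partition count and hash choice are assumptions for illustration, not something prescribed by any particular database.

```python
from collections import Counter
import hashlib

def bucket(key: str, n_partitions: int = 4) -> int:
    # Stable hash (unlike Python's salted hash()) so the same key
    # always lands in the same partition across processes.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % n_partitions

# Evenly distributed key (per-user id): rows spread across partitions.
even = Counter(bucket(f"user-{i}") for i in range(10_000))

# Skewed key (the same date for every row): one hotspot partition.
skewed = Counter(bucket("2024-06-01") for _ in range(10_000))

print(even)    # roughly 2,500 rows in each of the 4 partitions
print(skewed)  # all 10,000 rows in a single partition
```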

Vertical partitioning splits columns across separate tables that share the same primary key.

  • Useful when some columns are cold or large (BLOBs) or require extra security.
  • Increases join complexity when the column groups need to be recombined (see the toy example below).
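
As a toy illustration, two in-memory dicts can stand in for the two column groups; the table and column names here are invented for the example.

```python
# Hot, frequently accessed columns.
users_core = {
    101: {"email": "a@example.com", "plan": "pro"},
}

# Cold or large columns split into a second table with the same primary key.
users_extra = {
    101: {"avatar_png": b"\x89PNG...", "bio": "long biography text"},
}

def load_full_user(user_id: int) -> dict:
    # Recombining the column groups is effectively an extra join.
    return {**users_core[user_id], **users_extra[user_id]}

print(users_core[101]["plan"])     # hot-path read touches only the core table
print(load_full_user(101)["bio"])  # the full view has to recombine both groups
```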

Technique | How it Works | Pros | Cons
----------|--------------|------|-----
Range | Split by continuous ranges (dates, IDs) | Easy to reason about | Risk of uneven load if ranges skew
List | Explicit values per partition (regions, categories) | Intuitive mapping | Rebalancing when categories change
Hash | Hash function on key distributes rows | Even distribution automatically | Harder to control locality
Composite | Combine strategies (e.g., range + hash) | Balances locality + distribution | More complex routing logic
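
As a rough sketch, each technique boils down to a different routing function from key to partition; the boundary dates, region list, and partition counts below are made-up assumptions.

```python
import bisect
import hashlib
from datetime import date

# Range: continuous boundaries on a date column.
RANGE_BOUNDS = [date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]

def range_partition(d: date) -> int:
    return bisect.bisect_right(RANGE_BOUNDS, d)

# List: explicit value-to-partition mapping.
LIST_MAP = {"eu": 0, "us": 1, "apac": 2}

def list_partition(region: str) -> int:
    return LIST_MAP[region]

# Hash: stable hash of the key modulo the partition count.
def hash_partition(key: str, n_partitions: int = 8) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % n_partitions

# Composite: range on the date first, then hash within that range.
def composite_partition(d: date, key: str, n_partitions: int = 4) -> tuple[int, int]:
    return range_partition(d), hash_partition(key, n_partitions)

print(range_partition(date(2024, 6, 1)))                 # 2
print(list_partition("us"))                              # 1
print(composite_partition(date(2024, 6, 1), "user-42"))  # (2, <hash bucket>)
```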

Sharding places partitions on separate servers (shards). Each shard is a self-contained DB node storing a subset of data.

Sharding layout

  1. Shard key: attribute(s) determining shard placement (user_id, tenant_id).
  2. Router/metadata: service that maps keys to shard addresses.
  3. Shard replicas: optional replication within each shard for HA.
  • Shard A: user_id 1-25M -> db-shard-a.company.com
  • Shard B: user_id 25M-50M -> db-shard-b.company.com
  • The router maps each user_id to its shard, so applications query the correct node (see the sketch below).
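
Under the layout above, the router reduces to a range lookup from user_id to shard address; here is a minimal sketch that treats each shard's upper bound as inclusive (the hostnames come from the example, everything else is assumed).

```python
import bisect

# Inclusive upper bound of each shard's user_id range, plus its address.
SHARD_UPPER_BOUNDS = [25_000_000, 50_000_000]
SHARD_ADDRESSES = ["db-shard-a.company.com", "db-shard-b.company.com"]

def shard_for(user_id: int) -> str:
    """Map a user_id to the address of the shard that owns its range."""
    idx = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if idx >= len(SHARD_ADDRESSES):
        raise ValueError(f"user_id {user_id} falls outside every known shard range")
    return SHARD_ADDRESSES[idx]

print(shard_for(12_345))       # db-shard-a.company.com
print(shard_for(30_000_000))   # db-shard-b.company.com
```

In practice this mapping would live in the router/metadata service rather than in application code, so it can change without redeploying every client.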

Challenges

  • Cross-shard joins: expensive; often replaced by denormalization or application-side joins.
  • Referential integrity: foreign keys across shards are not enforced by the database; enforce them in application code.
  • Rebalancing: adding shards requires moving data; plan for consistent hashing or a directory service (see the sketch below).
  • Hotspots: a poor shard key can overload a single shard; monitor and adjust.
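
For the rebalancing point, a toy consistent-hash ring shows the property that matters: adding a shard moves only a fraction of the keys instead of rehashing everything. The virtual-node count and shard names are illustrative.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring using virtual nodes per shard."""

    def __init__(self, shards: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []
        for shard in shards:
            self.add(shard, vnodes)

    def add(self, shard: str, vnodes: int = 100) -> None:
        # Each shard owns many points on the ring to smooth the distribution.
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash, wrapping around.
        points = [point for point, _ in self._ring]
        idx = bisect.bisect_right(points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["db-shard-a", "db-shard-b"])
keys = [f"user-{i}" for i in range(1_000)]
before = {key: ring.shard_for(key) for key in keys}

ring.add("db-shard-c")  # scale out by one shard
moved = sum(1 for key in keys if ring.shard_for(key) != before[key])
print(f"{moved} of {len(keys)} keys moved")  # roughly a third, not all of them
```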

Best practices

  • Automate shard metadata updates; avoid hand-maintained routing tables.
  • Monitor per-shard metrics (CPU, disk, query latency) to catch imbalances (see the sketch below).
  • Design the schema with sharding in mind; keep related data co-located by the shard key.
  • Consider geo-partitioning to keep data near users and satisfy data-residency laws.
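
As one way to act on the monitoring point, a periodic check could flag shards whose latency drifts far from the fleet median; the metric values, shard names, and threshold below are invented for the example.

```python
from statistics import median

# Hypothetical per-shard p99 query latency in milliseconds,
# e.g. collected from each shard's metrics endpoint.
p99_latency_ms = {
    "db-shard-a": 12.0,
    "db-shard-b": 14.5,
    "db-shard-c": 41.0,  # suspicious outlier
}

def imbalanced_shards(metrics: dict[str, float], factor: float = 2.0) -> list[str]:
    """Flag shards whose metric exceeds `factor` times the fleet median."""
    med = median(metrics.values())
    return [shard for shard, value in metrics.items() if value > factor * med]

print(imbalanced_shards(p99_latency_ms))  # ['db-shard-c']
```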

Takeaways

  • Partitioning is a logical split of tables within a database; sharding distributes those splits across nodes.
  • Horizontal partitioning supports scale-out; vertical partitioning optimizes column access and security.
  • Choose shard keys carefully, plan for rebalancing, and enforce cross-shard rules in application logic.