CAP Theorem
CAP Theorem: In a distributed data store, you can only simultaneously provide two out of three guarantees: Consistency, Availability, and Partition Tolerance. Because network partitions (P) are unavoidable in distributed systems, the real choice is what to give up when a partition occurs: C or A.
1. The Core Guarantees
| Guarantee | Meaning in CAP | The Quick Check |
|---|---|---|
| Consistency (C) | Every read receives the most recent write (or an error). | Did the user get fresh data? |
| Availability (A) | Every request receives a non-error response, without guarantee that it contains the most recent write. | Did the user get a response? |
| Partition Tolerance (P) | The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network. | Can the system survive a network cut? |
2. The Partition Decision
The CAP Theorem is essentially a statement about what happens when the network fails.
If node A cannot communicate with node B, the system has a network partition. You must make a choice:
- Choose Consistency (CP): Cancel the operation on the isolated node (decrease availability) so it doesn’t return stale data.
- Choose Availability (AP): Proceed with the operation on the isolated node, returning stale data (decrease consistency) but keeping the system alive.
During a partition, you are forced to choose what to sacrifice.
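The choice above can be sketched as a toy replica that behaves differently depending on its policy. This is a minimal illustration, not a real database client: the `Replica` class, its `mode` flag, and the `Unavailable` exception are all hypothetical names introduced here.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


@dataclass
class Replica:
    mode: str                  # "CP" or "AP" (hypothetical policy flag)
    value: str = "v1"          # last value this replica has seen
    partitioned: bool = False  # True when cut off from its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm the latest value")
        # AP (or no partition): answer from local state, possibly stale.
        return self.value


cp = Replica(mode="CP", partitioned=True)
ap = Replica(mode="AP", partitioned=True)

try:
    cp.read()
    cp_result = "ok"
except Unavailable:
    cp_result = "unavailable"

ap_result = ap.read()
print(cp_result, ap_result)
```

Both replicas hold the same local value; the only difference is whether an isolated node is allowed to answer from it.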
3. CP vs AP Systems
How do systems behave during an actual partition?
CP Systems (Consistency + Partition Tolerance)
- When to use: When atomic reads/writes are critical, and reading stale data is a fatal error.
- Behavior during cut: If a node is cut off from the primary/quorum, it will return an `Error 503 Service Unavailable` or time out to protect the data’s integrity.
- Examples: Banking systems, inventory locks, consensus databases (etcd, ZooKeeper, HBase).
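The "cut off from the quorum" behavior can be sketched as a majority check. The example assumes a simple strict-majority rule and uses hypothetical function names; real consensus systems such as etcd or ZooKeeper involve leader election and log replication that this sketch does not model.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True when the reachable nodes (including this one) form a strict majority."""
    return reachable > cluster_size // 2


def handle_write(reachable: int, cluster_size: int) -> int:
    """Return a hypothetical HTTP-style status code for a write request."""
    # Accept the write only on the majority side of the partition.
    return 200 if has_quorum(reachable, cluster_size) else 503


# A 5-node cluster split 3/2: only the larger side keeps accepting writes.
majority_side = handle_write(reachable=3, cluster_size=5)
minority_side = handle_write(reachable=2, cluster_size=5)
print(majority_side, minority_side)
```

With an even split (for example 2/2 in a 4-node cluster), neither side has a strict majority, so under this rule both sides reject writes.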
AP Systems (Availability + Partition Tolerance)
- When to use: When the system must remain operational despite network failures, and eventual consistency is acceptable.
- Behavior during cut: If a node is isolated, it will still serve read/write requests from the data it currently holds, which may be stale. Once the partition heals, the nodes sync up and reconcile any conflicting writes.
- Examples: Social media feeds, product reviews, caching layers, DynamoDB and Cassandra (both of which also offer stronger, tunable consistency options).
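How nodes "sync up" after a partition depends on the conflict-resolution strategy. The sketch below assumes last-write-wins (LWW), which is only one option among several (others include vector clocks and CRDTs). The `merge_lww` function, the store layout, and the integer timestamps are hypothetical.

```python
from typing import Dict, Tuple

# Each key maps to (timestamp, value). Timestamps are hypothetical logical clocks.
Store = Dict[str, Tuple[int, str]]


def merge_lww(a: Store, b: Store) -> Store:
    """Merge two diverged replicas, keeping the newer write for each key."""
    merged = dict(a)
    for key, (ts, val) in b.items():
        # Take b's entry if a never saw the key or b's write is newer.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged


# Two nodes accepted different writes while partitioned.
node_a: Store = {"bio": (5, "hello"), "name": (1, "Ann")}
node_b: Store = {"bio": (7, "hi there"), "city": (2, "Oslo")}

healed = merge_lww(node_a, node_b)
print(healed)
```

Note the cost of this strategy: the older write to `bio` is silently discarded, which is acceptable for some data and unacceptable for other data.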
4. Why “CA” is Not a Distributed Option
A “CA” system (Consistent and Available) implies a system that never drops a network packet.
This only exists in:
- A single-node database (like a standalone Postgres server). If the node goes down, it’s not partitioned, it’s just dead.
- A magically perfect network.
Interview Tip: Never propose a “CA” architecture for a multi-node distributed system. You must assume partitions (P) will happen. Your job is to explain why you chose CP or AP for specific features.
5. The Consistency Spectrum
In practice, consistency is not binary (Strong vs None). Systems offer a spectrum of consistency guarantees tailored to specific use cases.
| Level | Guarantee | Trade-offs | Typical Use Case |
|---|---|---|---|
| Strong | A read always returns the latest written value. | High latency, requires locks/quorum. | Financial ledgers, passwords. |
| Eventual | A read might return stale data, but all nodes will eventually converge to the same value. | High availability, lower latency. Requires conflict resolution. | Social feeds, shopping carts. |
| Weak | No guarantee that reads will see the latest writes. Data loss is acceptable. | Ultra-low latency, maximum throughput. | VoIP, real-time gaming telemetry. |
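One common way to reason about the quorum requirement in the Strong row is the overlap rule used by Dynamo-style stores: with N replicas, a write acknowledged by W of them and a read that consults R of them share at least one replica whenever R + W > N. The sketch below only checks that arithmetic. The function name is hypothetical, and real systems need more than overlap (for example, a way to order concurrent writes) to deliver strong consistency.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


# N=3 with W=2, R=2: any read set shares a replica with any write set.
strong_leaning = quorums_overlap(n=3, w=2, r=2)

# N=3 with W=1, R=1: a read may hit a replica the write never reached.
eventual_leaning = quorums_overlap(n=3, w=1, r=1)

print(strong_leaning, eventual_leaning)
```

Lowering W or R reduces latency because fewer replicas must respond, which is the trade-off the table describes between the Strong and Eventual rows.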
Designing with the Spectrum
Modern microservice architectures mix these.
- User Authentication: CP (Strong Consistency). Once a password changes, the old one must stop working immediately.
- Product Catalog: AP (Eventual Consistency). It’s okay if a price update takes 5 seconds to propagate to the edge cache.
- Video Streaming: AP/Weak. Dropping a frame of video to maintain playback speed is better than buffering.
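The per-feature choices above could be recorded as an explicit policy table that services consult. This is only an illustrative sketch: the feature keys, the `Consistency` enum, and `may_serve_stale` are hypothetical names, not part of any framework.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"
    WEAK = "weak"


# Hypothetical mapping of features to the levels chosen in the list above.
POLICY = {
    "user_authentication": Consistency.STRONG,
    "product_catalog": Consistency.EVENTUAL,
    "video_streaming": Consistency.WEAK,
}


def may_serve_stale(feature: str) -> bool:
    """True when the feature tolerates stale or missing data."""
    return POLICY[feature] is not Consistency.STRONG


print(may_serve_stale("user_authentication"), may_serve_stale("product_catalog"))
```

Keeping the decision in one place makes it easier to explain, feature by feature, why CP or AP behavior was chosen.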