System Requirements & Foundations

Why This Page Matters

Before designing databases, caches, and queues, define the core system requirements:

What scale should the system handle?
How reliable and available should it be?
What latency is acceptable?
How should traffic be routed globally?

These choices drive architecture decisions later.

1. Scalability

Vertical scaling (scale up): add CPU/RAM to one machine.
Horizontal scaling (scale out): add more machines behind a load balancer.

Scaling approaches

Factor	Vertical	Horizontal
Setup complexity	Low	Medium/High
Max capacity	Limited by the largest available machine	Grows with machine count
Single-point failure risk	Higher	Lower (with redundancy)
Cost at small scale	Usually lower	Usually higher
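
As a rough illustration of horizontal scaling, the sketch below spreads requests across a pool of machines in round-robin order. The class name `RoundRobinBalancer` and the backend names are hypothetical; a production load balancer would also track health and connection counts.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical load balancer that hands out backends in rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._rotation = cycle(self._backends)

    def next_backend(self):
        # Each call returns the next machine in the pool, wrapping around.
        return next(self._rotation)


# Scaling out means adding entries to this list (names are made up).
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_backend() for _ in range(6)]
print(assignments)
```

Vertical scaling would leave the list at one entry and make that single machine larger instead.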

2. Availability, Reliability, and Fault Tolerance

Availability: percentage of time the system is up and able to serve requests.
Reliability: the system performs correctly and consistently over time.
Fault tolerance: the system continues operating when individual components fail.

Useful terms:

SLO: internal target (for example, 99.9% monthly availability, p95 latency under 200 ms).
SLA: external promise/contract with customers, usually set looser than the SLO so there is a margin before it is breached.
MTTR: mean time to recover after a failure (lower is better).

Common availability targets:

Availability	Max downtime/year (approx.)
99%	3.65 days
99.9%	8.76 hours
99.99%	52.6 minutes
99.999%	5.26 minutes
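
The figures in this table follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. A minimal sketch, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def max_downtime_minutes(availability_percent):
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = max_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

The same function works for a monthly window if `MINUTES_PER_YEAR` is replaced with the minutes in that window.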

Figure: Fault tolerance with failover

Reliability patterns:

Redundant instances across zones/regions.
Health checks + auto failover.
Data replication and backups.
Graceful degradation (core features first).
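
Health checks plus automatic failover can be reduced to one idea: route to the highest-priority instance that currently passes its check. The sketch below assumes a hypothetical `is_healthy` callback and made-up instance names; a real check would probe each instance over the network.

```python
def pick_instance(instances, is_healthy):
    """Return the first healthy instance in priority order, or None."""
    for instance in instances:
        if is_healthy(instance):
            return instance
    return None


# Hypothetical health state; here the primary has failed its check.
health = {"db-primary": False, "db-replica-1": True, "db-replica-2": True}
active = pick_instance(["db-primary", "db-replica-1", "db-replica-2"], health.get)
print(active)
```

When nothing is healthy the function returns `None`, which is the point where graceful degradation (serving core features only) would take over.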

Practical reliability checklist:

Remove single points of failure at app, DB, and network layers.
Test failure paths regularly (zone loss, DB primary failover, cache outage).
Prefer fast recovery over perfect prevention.

3. Latency vs Throughput

Latency: time for one request/response.
Throughput: number of requests processed per second.

Figure: Latency and throughput under load

Both matter:

Low latency improves user experience.
High throughput supports traffic growth.

Measure latency with percentiles:

p50: median latency; the typical user experience.
p95: the slowest 5% of requests start here (common SLO metric).
p99: the slowest 1% of requests start here; shows tail behavior under load.
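
To make the percentiles concrete, the sketch below computes them with the nearest-rank method on made-up latency samples. Monitoring systems often use interpolation or histogram approximations instead, so treat this as one possible definition.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: most requests are fast, a few are slow.
latencies_ms = [20] * 90 + [80] * 8 + [400, 900]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

In this sample the median looks healthy while p99 is many times slower, which is why averages and medians alone can hide tail problems.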

How to improve both:

Caching (app cache + CDN).
Fewer network hops and optimized queries.
Async processing and batching.
Horizontal scaling.

Capacity relationship (simplified):

For a fixed worker count, higher latency reduces throughput, because each worker completes fewer requests per second.
As utilization approaches 100%, requests queue up and tail latency rises sharply.
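
Both relationships can be sketched numerically. The first function assumes each worker handles one request at a time; the second uses the textbook M/M/1 queueing formula for mean response time, which is an idealized single-queue model and not a measurement of any real system. Function names are illustrative.

```python
def max_throughput_rps(workers, latency_seconds):
    """Upper bound on requests/second when each worker handles one request at a time."""
    return workers / latency_seconds


def mean_response_time(service_time_seconds, utilization):
    """M/M/1 mean time in system: service time stretched by queueing delay."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_seconds / (1 - utilization)


# Doubling latency halves the throughput ceiling for the same 10 workers.
print(max_throughput_rps(10, 0.1), max_throughput_rps(10, 0.2))

# Mean response time for a 100 ms service time at increasing utilization.
for utilization in (0.5, 0.8, 0.9, 0.95):
    print(utilization, round(mean_response_time(0.1, utilization), 3))
```

Even in this simplified model the mean response time grows without bound as utilization nears 1, and tail percentiles degrade earlier than the mean.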

Performance budgeting helps:

Set per-hop budgets (API gateway, service, DB).
Track p95/p99 in production and alert on regressions.
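
A per-hop budget check can be as small as a dictionary comparison. The hop names and millisecond values below are hypothetical. The budgets happen to sum to 200 ms, but percentiles do not add exactly across hops, so the sum is only a rough guide to end-to-end latency.

```python
# Hypothetical p95 budgets per hop, in milliseconds.
BUDGET_MS = {"api_gateway": 20, "service": 80, "db": 100}


def over_budget(measured_ms, budget_ms=BUDGET_MS):
    """Return {hop: overshoot_ms} for hops whose measured latency exceeds the budget."""
    return {
        hop: measured_ms[hop] - limit
        for hop, limit in budget_ms.items()
        if measured_ms.get(hop, 0) > limit
    }


# Made-up production measurements: only the service hop has regressed.
regressions = over_budget({"api_gateway": 12, "service": 95, "db": 70})
print(regressions)
```

An alert rule could fire whenever this dictionary is non-empty for several consecutive measurement windows.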

4. DNS, CDN, and Proxies

DNS

DNS maps domain names to IP addresses.

Useful commands:

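dig: query DNS records for a name (for example, dig example.com A).
nslookup: look up the addresses for a name (for example, nslookup example.com).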
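
The same lookup can be done from code through the operating system's resolver. This sketch uses Python's standard `socket.getaddrinfo`; the helper name `resolve` is made up, and returning an empty list on failure is a simplification chosen for the example.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the system resolver reports."""
    try:
        results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # the name did not resolve, or no resolver was reachable
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for *_, sockaddr in results})


addresses = resolve("localhost")
print(addresses)
```

Unlike `dig`, this goes through the local resolver configuration (hosts file, caches), so results can differ from a direct query to an authoritative server.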

CDN

CDN = globally distributed cache for static content:

lower latency,
reduced origin load,
better spike handling.
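
The effect on origin load can be shown with a toy time-to-live cache standing in for a single CDN edge node. Everything here (`TtlCache`, the `origin` function, the 60-second TTL) is hypothetical; real CDNs also honor cache-control headers, support invalidation, and run across many locations.

```python
import time


class TtlCache:
    """Hypothetical edge cache: serve stored copies until they expire."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (expires_at, body)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self._fetch(path)  # cache miss or expired entry: go to the origin
        self._entries[path] = (now + self._ttl, body)
        return body


origin_hits = []


def origin(path):
    # Stand-in for the origin server; records every request it receives.
    origin_hits.append(path)
    return f"contents of {path}"


cache = TtlCache(origin, ttl_seconds=60)
for _ in range(3):
    cache.get("/logo.png")
print(len(origin_hits))
```

Three client requests reach the origin only once within the TTL, which is the mechanism behind reduced origin load and better spike handling.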

Proxy types

Type	Represents	Typical use
Forward proxy	Client	Filtering, privacy, egress control
Reverse proxy	Server	Load balancing, TLS termination, caching

Figure: Request path with DNS, CDN, and reverse proxy