DBMS Fundamentals

DBMS Architecture

A DBMS provides query parsing, optimization, storage management, concurrency control, and recovery—all abstracted from the application layer.

DBMS Architecture Overview

Key responsibilities: Query optimization, buffer management, lock management, WAL/recovery, access control.

Data Models

Model	Structure	Examples	Trade-offs
Relational	Tables + SQL	PostgreSQL, MySQL, Oracle, SQL Server	ACID guarantees, mature tooling, vertical scaling limits
Document	JSON/BSON docs	MongoDB, CouchDB, Firestore	Schema flexibility, denormalized reads, weaker consistency
Key-Value	Hash map	Redis, DynamoDB, etcd, Memcached	O(1) lookups, no range queries, limited querying
Wide-Column	Column families	Cassandra, HBase, ScyllaDB	Write-optimized, time-series friendly, complex data modeling
Graph	Nodes + Edges	Neo4j, Amazon Neptune, ArangoDB	Relationship traversal, poor for non-graph workloads

ER Modeling

ER diagrams translate business requirements into schema design. Key mappings:

ER Concept	RDBMS Implementation
Entity	Table
Attribute	Column
Relationship	FK or junction table

Cardinality Patterns

Type	Implementation	Watch out for
1:1	FK with `UNIQUE` on either side	Often indicates table can be merged
1:N	FK on the “many” side	Index the FK column
M:N	Junction table with composite PK	Can become a bottleneck; consider denormalization

Attribute Types

Derived: Compute at read-time or materialize with triggers/generated columns
Multivalued: Normalize into separate table or use array types (Postgres [])
Composite: Flatten into columns or use structured types (JSONB)

Data Integrity

Constraint	Enforcement	Operational Impact
`PRIMARY KEY`	Unique + NOT NULL	Clustered index in most engines
`FOREIGN KEY`	Referential checks	Write overhead; consider `DEFERRABLE` for bulk loads
`CHECK`	Domain validation	Executed per-row; keep simple
`UNIQUE`	Prevents duplicates	Creates index; NULLs typically allowed
Triggers	Custom logic	Hidden complexity; prefer app-layer when possible

Keys

Key Type	Definition	Usage Notes
Primary	Unique row identifier	Immutable, indexed, one per table
Candidate	Columns that could be PK	Minimal set guaranteeing uniqueness
Composite	Multi-column PK	Common in junction tables
Foreign	Reference to another table’s PK	Index for join performance
Unique	Uniqueness without PK semantics	Allows NULLs (behavior varies by engine)

Surrogate vs. Natural Keys

Aspect	Surrogate (UUID/Auto-inc)	Natural (email, SKU)
Stability	Immutable	May change → cascading updates
Size	Fixed (8B int, 16B UUID)	Variable
Indexing	Predictable	Can be suboptimal
Debugging	Opaque	Human-readable

Recommendation: Use surrogates for PKs; enforce natural keys via UNIQUE constraints.

Transactions & ACID

Property	Guarantee	Implementation
Atomicity	All-or-nothing	WAL + rollback segments
Consistency	Constraints honored	Constraint checks at commit
Isolation	Concurrent txn separation	Locking or MVCC
Durability	Survives crashes	WAL fsync before commit

Isolation Levels (Weakest → Strongest)

Level	Phenomena Allowed	Use Case
Read Uncommitted	Dirty reads	Almost never appropriate
Read Committed	Non-repeatable reads	Default in Postgres, Oracle
Repeatable Read	Phantom reads	Default in MySQL InnoDB
Serializable	None	Financial transactions, inventory

Transaction Lifecycle States

Performance vs. Correctness

Higher isolation = more locking/aborts. Profile your workload. Many “Serializable” requirements can be relaxed with careful schema design or optimistic locking at the app layer.