Skip to content
Dev Dump

Normalization

Normalization is not just about making tables smaller; it’s about ensuring every piece of data is stored exactly where it belongs logically.

Normalization Overview Figure 1: The progression of Normal Forms. Each higher level of normalization includes all the requirements of the levels below it.

Without normalization, databases suffer from Anomalies:

  • Insertion Anomaly: Unable to add data because some other unrelated data is missing (e.g., can’t add a new course until a student enrolls).
  • Update Anomaly: Updating a single value requires changing multiple rows, leading to potential inconsistencies.
  • Deletion Anomaly: Deleting a record unintentionally removes unrelated but vital information (e.g., deleting the last student in a course also deletes the course details).

A relation is in 1NF if it contains only atomic (indivisible) values. This means no cell should contain a list, array, or multiple pieces of data.

  • Rule 1: Each column must contain atomic values.
  • Rule 2: A column should contain values of the same type.
  • Rule 3: Each column should have a unique name.

1NF Transformation Figure 2: Transforming a non-atomic table into 1NF by flattening multi-valued cells into individual rows.


A relation is in 2NF if it is in 1NF and has no partial functional dependencies.

Partial Dependency: When a non-prime attribute (a column not part of the primary key) depends on only a portion of a composite primary key.

  • Condition: If the Primary Key is a single column, the table is automatically in 2NF (provided it’s in 1NF).

2NF Transformation Figure 3: Resolving partial dependency by splitting the table so that every non-key attribute depends on the ENTIRE primary key.


A relation is in 3NF if it is in 2NF and has no transitive dependencies.

Transitive Dependency: When a non-prime attribute depends on another non-prime attribute, rather than directly on the primary key. (A → B and B → C, therefore A → C).

3NF Transformation Figure 4: Removing transitive dependencies by moving the intermediate non-key attributes into their own dedicated table.


BCNF is a stricter version of 3NF (often called 3.5NF). It addresses cases where a table has multiple overlapping candidate keys.

  • The Rule: For every functional dependency X -> Y, X must be a Superkey.
  • In simpler terms: The determinant must always be a candidate key.

Consider a university system where:

  • Students enroll in subjects
  • Each subject is taught by exactly one professor
  • A professor can teach multiple subjects
Student_IDSubjectProfessor
S1MathProf. Smith
S2MathProf. Smith
S1PhysicsProf. Jones
S2PhysicsProf. Jones

Analysis:

  • Candidate Key: (Student_ID, Subject) — uniquely identifies each enrollment
  • Functional Dependency: Subject → Professor (each subject has one professor)

The Problem: Subject determines Professor, but Subject alone is not a superkey. This violates BCNF even though the table satisfies 3NF.

Solution: Decompose into two tables:

Enrollment Table:

Student_IDSubject
S1Math
S2Math
S1Physics

Teaching Table:

SubjectProfessor
MathProf. Smith
PhysicsProf. Jones

Now in the Teaching table, Subject → Professor is valid because Subject is the primary key (a superkey).

BCNF Transformation Figure 5: BCNF decomposition ensures every determinant in a functional dependency is a superkey.


A relation is in 4NF if it is in BCNF and has no multi-valued dependencies.

Multi-valued Dependency (MVD): Occurs when one attribute determines multiple independent values of other attributes. The notation A →→ B means “A multi-determines B.”

Consider a table tracking teachers’ subjects and hobbies:

TeacherSubjectHobby
Prof. SmithMathChess
Prof. SmithMathReading
Prof. SmithPhysicsChess
Prof. SmithPhysicsReading
Prof. JonesChemistryHiking
Prof. JonesChemistryPainting

The Problem:

  • A teacher’s subjects are independent of their hobbies
  • Storing them together creates a Cartesian product (2 subjects × 2 hobbies = 4 rows)
  • Adding one new hobby requires adding N new rows (where N = number of subjects)

MVD Notation: Teacher →→ Subject | Hobby

This reads as: “Teacher multi-determines Subject and Hobby independently.”

Solution: Decompose into two independent tables:

TeacherSubject Table:

TeacherSubject
Prof. SmithMath
Prof. SmithPhysics
Prof. JonesChemistry

TeacherHobby Table:

TeacherHobby
Prof. SmithChess
Prof. SmithReading
Prof. JonesHiking
Prof. JonesPainting

Benefits:

  • Adding a new hobby = 1 row (not N rows)
  • No redundant combinations
  • Each independent fact is stored exactly once

4NF Transformation Figure 6: 4NF eliminates multi-valued dependencies by separating independent facts into their own tables.


Also known as Project-Join Normal Form (PJNF). A table is in 5NF if it cannot be decomposed into any number of smaller tables without losing data or creating “spurious” (fake) records when joined back together.

  • It deals with Join Dependencies.
  • In practice, 5NF is rarely required in standard business applications unless dealing with extremely complex ternary relationships.

Consider a manufacturing scenario with a ternary relationship:

SupplierPartProject
S1P1J1
S1P2J1
S2P1J1
S1P1J2
S2P1J2

This table records: “Supplier S supplies Part P to Project J”

Business Rule (Cyclic Constraint):
If a supplier supplies a part, that part is used in a project, AND the supplier works on that project, then the supplier supplies that part to that project.

5NF decomposition is valid when the ternary relationship can be inferred from three binary relationships:

SupplierPart: (which suppliers supply which parts)

SupplierPart
S1P1
S1P2
S2P1

PartProject: (which parts are used in which projects)

PartProject
P1J1
P2J1
P1J2

SupplierProject: (which suppliers work on which projects)

SupplierProject
S1J1
S2J1
S1J2
S2J2

Example of Spurious Data:

If we incorrectly decompose and join, we might get:

  • (S2, P2, J1) — But S2 never supplied P2!

This happens when the business rule does not support inference from binary facts.

When 5NF Decomposition is SAFE:

IF Supplier supplies Part
AND Part is used in Project
AND Supplier works on Project
THEN Supplier supplies Part to Project

Only decompose to 5NF when this cyclic implication holds true for your business domain.

5NF Transformation Figure 7: 5NF handles join dependencies in complex ternary relationships. Decomposition is only valid if the join reconstructs the original data without spurious tuples.


Normal FormKey RequirementEliminates…
1NFAtomic valuesRepeating groups & multi-valued attributes
2NFNo partial dependencyRedundancy from composite keys
3NFNo transitive dependencyRedundancy from non-key dependencies
BCNFEvery determinant is a superkeyOverlapping candidate key anomalies
4NFNo multi-valued dependencyMulti-valued redundancy
5NFNo join dependencyRedundancy from complex join relationships