Normalization
Normalization is not just about making tables smaller; it’s about ensuring every piece of data is stored exactly where it belongs logically.
Figure 1: The progression of Normal Forms. Each higher level of normalization includes all the requirements of the levels below it.
Why Normalize? (The Anomalies)
Section titled “Why Normalize? (The Anomalies)”Without normalization, databases suffer from Anomalies:
- Insertion Anomaly: Unable to add data because some other unrelated data is missing (e.g., can’t add a new course until a student enrolls).
- Update Anomaly: Updating a single value requires changing multiple rows, leading to potential inconsistencies.
- Deletion Anomaly: Deleting a record unintentionally removes unrelated but vital information (e.g., deleting the last student in a course also deletes the course details).
1. First Normal Form (1NF)
Section titled “1. First Normal Form (1NF)”A relation is in 1NF if it contains only atomic (indivisible) values. This means no cell should contain a list, array, or multiple pieces of data.
- Rule 1: Each column must contain atomic values.
- Rule 2: A column should contain values of the same type.
- Rule 3: Each column should have a unique name.
Figure 2: Transforming a non-atomic table into 1NF by flattening multi-valued cells into individual rows.
2. Second Normal Form (2NF)
Section titled “2. Second Normal Form (2NF)”A relation is in 2NF if it is in 1NF and has no partial functional dependencies.
Partial Dependency: When a non-prime attribute (a column not part of the primary key) depends on only a portion of a composite primary key.
- Condition: If the Primary Key is a single column, the table is automatically in 2NF (provided it’s in 1NF).
Figure 3: Resolving partial dependency by splitting the table so that every non-key attribute depends on the ENTIRE primary key.
3. Third Normal Form (3NF)
Section titled “3. Third Normal Form (3NF)”A relation is in 3NF if it is in 2NF and has no transitive dependencies.
Transitive Dependency: When a non-prime attribute depends on another non-prime attribute, rather than directly on the primary key. (A → B and B → C, therefore A → C).
Figure 4: Removing transitive dependencies by moving the intermediate non-key attributes into their own dedicated table.
4. Boyce-Codd Normal Form (BCNF)
Section titled “4. Boyce-Codd Normal Form (BCNF)”BCNF is a stricter version of 3NF (often called 3.5NF). It addresses cases where a table has multiple overlapping candidate keys.
- The Rule: For every functional dependency
X -> Y, X must be a Superkey. - In simpler terms: The determinant must always be a candidate key.
Example: Student-Subject-Professor
Section titled “Example: Student-Subject-Professor”Consider a university system where:
- Students enroll in subjects
- Each subject is taught by exactly one professor
- A professor can teach multiple subjects
| Student_ID | Subject | Professor |
|---|---|---|
| S1 | Math | Prof. Smith |
| S2 | Math | Prof. Smith |
| S1 | Physics | Prof. Jones |
| S2 | Physics | Prof. Jones |
Analysis:
- Candidate Key:
(Student_ID, Subject)— uniquely identifies each enrollment - Functional Dependency:
Subject → Professor(each subject has one professor)
The Problem: Subject determines Professor, but Subject alone is not a superkey. This violates BCNF even though the table satisfies 3NF.
Solution: Decompose into two tables:
Enrollment Table:
| Student_ID | Subject |
|---|---|
| S1 | Math |
| S2 | Math |
| S1 | Physics |
Teaching Table:
| Subject | Professor |
|---|---|
| Math | Prof. Smith |
| Physics | Prof. Jones |
Now in the Teaching table, Subject → Professor is valid because Subject is the primary key (a superkey).
Figure 5: BCNF decomposition ensures every determinant in a functional dependency is a superkey.
5. Fourth Normal Form (4NF)
Section titled “5. Fourth Normal Form (4NF)”A relation is in 4NF if it is in BCNF and has no multi-valued dependencies.
Multi-valued Dependency (MVD): Occurs when one attribute determines multiple independent values of other attributes. The notation
A →→ Bmeans “A multi-determines B.”
Example: Teacher Skills and Hobbies
Section titled “Example: Teacher Skills and Hobbies”Consider a table tracking teachers’ subjects and hobbies:
| Teacher | Subject | Hobby |
|---|---|---|
| Prof. Smith | Math | Chess |
| Prof. Smith | Math | Reading |
| Prof. Smith | Physics | Chess |
| Prof. Smith | Physics | Reading |
| Prof. Jones | Chemistry | Hiking |
| Prof. Jones | Chemistry | Painting |
The Problem:
- A teacher’s subjects are independent of their hobbies
- Storing them together creates a Cartesian product (2 subjects × 2 hobbies = 4 rows)
- Adding one new hobby requires adding N new rows (where N = number of subjects)
MVD Notation: Teacher →→ Subject | Hobby
This reads as: “Teacher multi-determines Subject and Hobby independently.”
Solution: Decompose into two independent tables:
TeacherSubject Table:
| Teacher | Subject |
|---|---|
| Prof. Smith | Math |
| Prof. Smith | Physics |
| Prof. Jones | Chemistry |
TeacherHobby Table:
| Teacher | Hobby |
|---|---|
| Prof. Smith | Chess |
| Prof. Smith | Reading |
| Prof. Jones | Hiking |
| Prof. Jones | Painting |
Benefits:
- Adding a new hobby = 1 row (not N rows)
- No redundant combinations
- Each independent fact is stored exactly once
Figure 6: 4NF eliminates multi-valued dependencies by separating independent facts into their own tables.
6. Fifth Normal Form (5NF)
Section titled “6. Fifth Normal Form (5NF)”Also known as Project-Join Normal Form (PJNF). A table is in 5NF if it cannot be decomposed into any number of smaller tables without losing data or creating “spurious” (fake) records when joined back together.
- It deals with Join Dependencies.
- In practice, 5NF is rarely required in standard business applications unless dealing with extremely complex ternary relationships.
Example: Supplier-Part-Project
Section titled “Example: Supplier-Part-Project”Consider a manufacturing scenario with a ternary relationship:
| Supplier | Part | Project |
|---|---|---|
| S1 | P1 | J1 |
| S1 | P2 | J1 |
| S2 | P1 | J1 |
| S1 | P1 | J2 |
| S2 | P1 | J2 |
This table records: “Supplier S supplies Part P to Project J”
Business Rule (Cyclic Constraint):
If a supplier supplies a part, that part is used in a project, AND the supplier works on that project, then the supplier supplies that part to that project.
When Can We Decompose to 5NF?
Section titled “When Can We Decompose to 5NF?”5NF decomposition is valid when the ternary relationship can be inferred from three binary relationships:
SupplierPart: (which suppliers supply which parts)
| Supplier | Part |
|---|---|
| S1 | P1 |
| S1 | P2 |
| S2 | P1 |
PartProject: (which parts are used in which projects)
| Part | Project |
|---|---|
| P1 | J1 |
| P2 | J1 |
| P1 | J2 |
SupplierProject: (which suppliers work on which projects)
| Supplier | Project |
|---|---|
| S1 | J1 |
| S2 | J1 |
| S1 | J2 |
| S2 | J2 |
The Spurious Tuple Problem
Section titled “The Spurious Tuple Problem”Example of Spurious Data:
If we incorrectly decompose and join, we might get:
(S2, P2, J1)— But S2 never supplied P2!
This happens when the business rule does not support inference from binary facts.
When 5NF Decomposition is SAFE:
IF Supplier supplies Part AND Part is used in Project AND Supplier works on ProjectTHEN Supplier supplies Part to ProjectOnly decompose to 5NF when this cyclic implication holds true for your business domain.
Figure 7: 5NF handles join dependencies in complex ternary relationships. Decomposition is only valid if the join reconstructs the original data without spurious tuples.
Summary of Normal Forms
Section titled “Summary of Normal Forms”| Normal Form | Key Requirement | Eliminates… |
|---|---|---|
| 1NF | Atomic values | Repeating groups & multi-valued attributes |
| 2NF | No partial dependency | Redundancy from composite keys |
| 3NF | No transitive dependency | Redundancy from non-key dependencies |
| BCNF | Every determinant is a superkey | Overlapping candidate key anomalies |
| 4NF | No multi-valued dependency | Multi-valued redundancy |
| 5NF | No join dependency | Redundancy from complex join relationships |