A simplified explanation of normal forms.
by Dale R. Rollins
Fourth Normal Form (4NF)
4NF is concerned with data redundancy that arises when facts about unrelated many-to-many relationships are recorded in the same table. 4NF prohibits a table from representing more than one many-to-many relationship unless those relationships are interdependent.
The following example involves two many-to-many relationships:
| 1. |
An employee can fill many roles, and
many employees can fill the same role. |
| 2. |
An employee can be skilled in many languages, and
many employees can be skilled in the same language. |
The Employee Capability table shows both of these abilities for each employee. However, it is not in 4NF because (for the purpose of this example) language skills have no bearing on role abilities.
Employee
Emp Nbr |
Emp Name |
72956 |
Sally Smith |
39479 |
Bob Burns |
|
Role
Role Nbr |
Role Name |
1 |
Analyst |
2 |
Researcher |
3 |
Accountant |
|
Language
Language Id |
Language Name |
Eng |
English |
Fre |
French |
Ger |
German |
|
Employee Capability
Emp Nbr |
Role Nbr |
Language Id |
72956 |
1 |
Eng |
72956 |
1 |
Ger |
72956 |
2 |
Eng |
72956 |
2 |
Ger |
39479 |
1 |
Fre |
39479 |
1 |
Eng |
39479 |
3 |
Fre |
39479 |
3 |
Eng |
|
Sally Smith can fill two roles and is skilled in two languages. However, each of those facts is recorded twice because unrelated facts are stored in the same table. If Sally acquires a third language, that fact will have to be recorded in two new rows, one for each of her roles. If she later becomes able to fill a third role, that fact will have to be recorded in three new rows, one for each of her languages.
The table can be put into 4NF by splitting it into two tables, one for each of the many-to-many relationships. The Employee Role table shows which roles each employee can fill. The Employee Language table shows which languages each employee is skilled in. Now recording a new role or language for an employee involves adding only one new row to the appropriate table.
Employee Role
Emp Nbr |
Role Nbr |
72956 |
1 |
72956 |
2 |
39479 |
1 |
39479 |
3 |
|
Employee Language
Emp Nbr |
Language Id |
72956 |
Eng |
72956 |
Ger |
39479 |
Fre |
39479 |
Eng |
|
A table with only two identifying parents (i.e., one whose primary key includes foreign keys from only two parent tables) represents a single many-to-many relationship, and so is always in 4NF. Only a table with three or more identifying parents can violate 4NF. These are usually a small minority of the tables in a database.
However, a table can have three identifying parents (or more) and still be in 4NF. Consider the following, somewhat similar example. Here the Employee Language Skill table shows which skills an employee has in each language.
Employee
Emp Nbr |
Emp Name |
72956 |
Sally Smith |
39479 |
Bob Burns |
|
Language Skill
Skill Nbr |
Skill Name |
1 |
Reading |
2 |
Writing |
3 |
Speaking |
|
Language
Language Id |
Language Name |
Eng |
English |
Fre |
French |
Ger |
German |
|
Employee Language Skill
Emp Nbr |
Skill Nbr |
Language Id |
72956 |
1 |
Eng |
72956 |
2 |
Eng |
72956 |
3 |
Eng |
72956 |
1 |
Ger |
39479 |
3 |
Fre |
39479 |
1 |
Eng |
39479 |
2 |
Eng |
39479 |
3 |
Eng |
|
The Employee Language Skill table is in 4NF because the relationship between Employee and Language Skill depends on the relationship between Employee and Language. Neither relationship is complete without the other. Here each row represents a three-part fact (e.g., Sally Smith1 reads2 English3).
Fifth Normal Form (5NF)
5NF is similar to 4NF because it too is concerned with data redundancy in tables that represent multiple many-to-many relationships, i.e., those with three or more identifying parents. The difference between the two is 4NF applies only when some of the parents are not related to one another (e.g., Role and Language in the 4NF example), while 5NF applies when there is a many-to-many relationship between each pair of parents. 5NF prohibits a table from representing more than one many-to-many relationship unless those relationships are interdependent.
In the following example each of the parent tables (Dealer, Manufacturer, and Vehicle Class) is related to each of the others:
| 1. |
A manufacturer produces many vehicle classes, and
many manufacturers produce the same vehicle class. |
| 2. |
A dealer sells for many manufacturers, and
many dealers sell for the same manufacturer. |
| 3. |
A dealer sells many vehicle classes, and
many dealers sell the same vehicle class. |
The Dealer Product table shows which dealers sell which vehicle classes from which manufacturers, so it represents all three relationships. It is not in 5NF because the first relationship does not depend on either of the other two. The fact that Dodge produces cars is complete without knowing which dealers sell for Dodge (relationship 2) or which dealers sell cars (relationship 3).
Manufacturer
Mfgr Id |
Mfgr Name |
D |
Dodge |
F |
Ford |
H |
Honda |
|
Dealer
Dealer Nbr |
Dealer Name |
1429 |
ABC Motors |
8254 |
Cars R Us |
5017 |
Vroom |
|
Vehicle Class
Class Id |
Class Name |
C |
Car |
T |
Truck |
M |
Motorcycle |
|
Dealer Product
Dealer Nbr |
Mfgr Id |
Class Id |
1429 |
D |
C |
1429 |
D |
T |
8254 |
H |
C |
8254 |
H |
M |
|
The first step in achieving 5NF is to move the first relationship into a separate table, such as the following Manufacturer Vehicle Class table. It shows which manufacturers produce which vehicle classes.
Manufacturer Vehicle Class
Mfgr Id |
Class Id |
D |
C |
D |
T |
F |
C |
F |
T |
H |
C |
H |
T |
H |
M |
|
The next step in achieving 5NF involves the second and third relationships: the Dealer: Manufacturer relationship and the Dealer: Vehicle Class relationship. These two relationships might depend on each other, or they might not.
They do depend on each other if:
A dealer can sell one vehicle class from one manufacturer and a different vehicle class from another manufacturer even though both manufacturers produce both vehicle classes.
For example, Vroom sells:
|
Dodge cars |
(but not Dodge trucks) |
|
Ford trucks |
(but not Ford cars) |
|
Honda motorcycles |
(but not Honda cars or trucks) |
In this case each fact has three parts: dealer, manufacturer, and vehicle class. A fact is complete only if all three parts are present. Since both relationships are part of the same fact, they must be represented in the same table. Notice that this is the original Dealer Product table. Moving the first relationship into the Manufacturer Vehicle Class table also put the Dealer Product table into 5NF, even though it still contains all the same columns.
Dealer Product
Dealer Nbr |
Mfgr Id |
Class Id |
5017 |
D |
C |
5017 |
F |
T |
5017 |
H |
M |
|
On the other hand, the second and third relationships do not depend on each other if:
A dealer cannot sell one vehicle class from one manufacturer and a different vehicle class from another manufacturer if both manufacturers produce both vehicle classes. The dealer must sell the vehicle class from both manufacturers or from neither.
For example, if ABC Motors sells for Dodge and Honda, and if ABC Motors sells cars, trucks, and motorcycles, then it sells Dodge cars and trucks, and Honda cars, trucks, and motorcycles. If ABC Motors starts selling for Ford, it will sell Ford cars and Ford trucks, not just one or the other. If ABC Motors stops selling trucks, it will stop selling both Dodge and Honda trucks, not just one or the other. If Dodge starts producing motorcycles, ABC Motors will sell them.
In this case the Dealer Product table is not in 5NF because the two relationships do not depend on each other. The table represents facts about two unrelated things (the two different relationships).
Dealer Product
Dealer Nbr |
Mfgr Id |
Class Id |
1429 |
D |
C |
1429 |
D |
T |
1429 |
H |
C |
1429 |
H |
T |
1429 |
H |
M |
|
The table can be put into 5NF by splitting it into two tables, one for each relationship, such as those below. The Dealer Manufacturer table shows which dealers sell for which manufacturers, and the Dealer Vehicle Class table shows which dealers sell which vehicle classes.
Dealer Manufacturer
Dealer Nbr |
Mfgr Id |
1429 |
D |
1429 |
H |
|
Dealer Vehicle Class
Dealer Nbr |
Class Id |
1429 |
C |
1429 |
T |
1429 |
M |
|
Now if ABC Motors starts selling for Ford, that fact is recorded by adding one row to the Dealer Manufacturer table rather than two rows to the Dealer Product table (one for cars and another for trucks). If it stops selling trucks (for both Dodge and Honda), a single row (1429, T) is removed from the Dealer Vehicle Class table rather than removing two rows from the Dealer Product table. If Dodge starts producing motorcycles, that fact is recorded by adding one row to the Manufacturer Vehicle Class table. That new fact together with the fact that ABC Motors sells for Dodge (first row in the Dealer Manufacturer table) and the fact the ABC Motors sells motorcycles (third row in the Dealer Vehicle Class table) generates the compound fact the ABC Motors now sells Dodge motorcycles.
From tables to a single view of the business
These are some simplified examples of how to apply normal forms to relational databases. Standardizing a database—which requires in-depth thought and enterprise-wide collaboration—will simplify information, reduce redundancies and avoid data inconsistencies. By administering these rules, an organization can get closer to its ultimate goal of reaching a single view of the business.
Appendix: 1NF, a broader view
The 1NF section in body of the paper explained that 1NF prohibits recording more than one fact in any column in any row of a table. Some practitioners (data modelers, database designers, etc.) use an expanded definition of 1NF that also prohibits duplicate columns in a table, i.e., two or more columns that contain the same type of facts.
The 1NF section includes the following example.
Employee
Emp Nbr |
Emp Name |
Pastime |
32954 |
Jason Jones |
Fishing, Sailing, Hiking |
18433 |
Susan Smith |
Hiking |
52853 |
Diane Downs |
Cooking, Fishing |
|
An alternate design that violates the spirit of 1NF is shown below.
Employee
Emp Nbr |
Emp Name |
Pastime 1 |
Pastime 2 |
Pastime 3 |
32954 |
Jason Jones |
Fishing |
Sailing |
Hiking |
18433 |
Susan Smith |
Hiking |
|
|
52853 |
Diane Downs |
Cooking |
Fishing |
|
|
All the columns in this table design contain single-valued facts, so it conforms to the classic definition of 1NF. As such, it avoids the update problems that are present in the first table design. A pastime can be updated without having to reenter the others.
However, this alternate design introduces new problems. No more than three pastimes can be recorded for an employee. If someone has more than three, the additional ones are not in the database, so the pastime information is incomplete. Also, a query to find out which employees enjoy a particular pastime (e.g., hiking) involves looking in three columns, not just one.
These problems can be avoided by splitting the table into the following pair of tables. These are the same tables that were described in the 1NF section in the body of the paper.
Employee
Emp Nbr |
Emp Name |
32954 |
Jason Jones |
18433 |
Susan Smith |
52853 |
Diane Downs |
|
Employee Pastime
Emp Nbr |
Pastime |
32954 |
Fishing |
32954 |
Sailing |
32954 |
Hiking |
18433 |
Hiking |
52853 |
Cooking |
52853 |
Fishing |
|
Return to First Normal Form (1NF). T
References
E. F. Codd, �A Relational Model of data for Large Shared Data Banks,� Communications of the ACM, Volume 13, Number 6, June 1970.
E. F. Codd, �Extending the Database Relational Model to Capture More Meaning,� ACM Transactions on Database Systems, Volume 4, Number 4, December 1979.
C. J. Date, An Introduction to Database Systems, Volume 1, Third Edition, Addison Wesley, 1982.
William Kent, �A Simple Guide to Five Normal Forms in Relational Database Theory,� Communications of the ACM, Volume 26, Number 2, February 1983.
Mary E. S. Loomis, The Database Book, Macmillan, 1987.
Dale R. Rollins is a senior consultant in Teradata's Professional Services organization. Dale has more than 25 years of experience in developing logical data models (LDMs) based on the business rules and data requirements of major corporations and government organizations. He is also experienced in transforming logical models into physical models, which serve as the designs for data warehouses or other shared databases. In addition, he participated in the development of the IDEF1X Data Modeling Technique and its adoption as a Federal Information Processing Standard.
Teradata Magazine-December 2006
|