Register | Log in


Subscribe Now>>
ARCHIVE: Vol. 6, No. 4
Home News Tech2Tech Features Viewpoints Facts & Fun Teradata.com
Features
Send to Colleague

Normalizing the database cont.

A simplified explanation of normal forms.

by Dale R. Rollins

Fourth Normal Form (4NF)
4NF is concerned with data redundancy that arises when facts about unrelated many-to-many relationships are recorded in the same table. 4NF prohibits a table from representing more than one many-to-many relationship unless those relationships are interdependent.

The following example involves two many-to-many relationships:
1. An employee can fill many roles, and
many employees can fill the same role.
2. An employee can be skilled in many languages, and
many employees can be skilled in the same language.

The Employee Capability table shows both of these abilities for each employee. However, it is not in 4NF because (for the purpose of this example) language skills have no bearing on role abilities.

Employee
Emp Nbr
Emp Name
72956
Sally Smith
39479
Bob Burns

Role
Role Nbr
Role Name
1
Analyst
2
Researcher
3
Accountant

Language
Language Id
Language Name
Eng
English
Fre
French
Ger
German

Employee Capability
Emp Nbr
Role Nbr
Language Id
72956
1
Eng
72956
1
Ger
72956
2
Eng
72956
2
Ger
39479
1
Fre
39479
1
Eng
39479
3
Fre
39479
3
Eng

Sally Smith can fill two roles and is skilled in two languages. However, each of those facts is recorded twice because unrelated facts are stored in the same table. If Sally acquires a third language, that fact will have to be recorded in two new rows, one for each of her roles. If she later becomes able to fill a third role, that fact will have to be recorded in three new rows, one for each of her languages.

The table can be put into 4NF by splitting it into two tables, one for each of the many-to-many relationships. The Employee Role table shows which roles each employee can fill. The Employee Language table shows which languages each employee is skilled in. Now recording a new role or language for an employee involves adding only one new row to the appropriate table.

Employee Role
Emp Nbr
Role Nbr
72956
1
72956
2
39479
1
39479
3

Employee Language
Emp Nbr
Language Id
72956
Eng
72956
Ger
39479
Fre
39479
Eng

A table with only two identifying parents (i.e., one whose primary key includes foreign keys from only two parent tables) represents a single many-to-many relationship, and so is always in 4NF. Only a table with three or more identifying parents can violate 4NF. These are usually a small minority of the tables in a database.

However, a table can have three identifying parents (or more) and still be in 4NF. Consider the following, somewhat similar example. Here the Employee Language Skill table shows which skills an employee has in each language.

Employee
Emp Nbr
Emp Name
72956
Sally Smith
39479
Bob Burns

Language Skill
Skill Nbr
Skill Name
1
Reading
2
Writing
3
Speaking

Language
Language Id
Language Name
Eng
English
Fre
French
Ger
German

Employee Language Skill
Emp Nbr
Skill Nbr
Language Id
72956
1
Eng
72956
2
Eng
72956
3
Eng
72956
1
Ger
39479
3
Fre
39479
1
Eng
39479
2
Eng
39479
3
Eng

The Employee Language Skill table is in 4NF because the relationship between Employee and Language Skill depends on the relationship between Employee and Language. Neither relationship is complete without the other. Here each row represents a three-part fact (e.g., Sally Smith1 reads2 English3).

Fifth Normal Form (5NF)
5NF is similar to 4NF because it too is concerned with data redundancy in tables that represent multiple many-to-many relationships, i.e., those with three or more identifying parents. The difference between the two is 4NF applies only when some of the parents are not related to one another (e.g., Role and Language in the 4NF example), while 5NF applies when there is a many-to-many relationship between each pair of parents. 5NF prohibits a table from representing more than one many-to-many relationship unless those relationships are interdependent.

In the following example each of the parent tables (Dealer, Manufacturer, and Vehicle Class) is related to each of the others:
1. A manufacturer produces many vehicle classes, and
many manufacturers produce the same vehicle class.
2. A dealer sells for many manufacturers, and
many dealers sell for the same manufacturer.
3. A dealer sells many vehicle classes, and
many dealers sell the same vehicle class.

The Dealer Product table shows which dealers sell which vehicle classes from which manufacturers, so it represents all three relationships. It is not in 5NF because the first relationship does not depend on either of the other two. The fact that Dodge produces cars is complete without knowing which dealers sell for Dodge (relationship 2) or which dealers sell cars (relationship 3).

Manufacturer
Mfgr Id
Mfgr Name
D
Dodge
F
Ford
H
Honda

Dealer
Dealer Nbr
Dealer Name
1429
ABC Motors
8254
Cars R Us
5017
Vroom

Vehicle Class
Class Id
Class Name
C
Car
T
Truck
M
Motorcycle

Dealer Product
Dealer Nbr
Mfgr Id
Class Id
1429
D
C
1429
D
T
8254
H
C
8254
H
M

The first step in achieving 5NF is to move the first relationship into a separate table, such as the following Manufacturer Vehicle Class table. It shows which manufacturers produce which vehicle classes.

Manufacturer Vehicle Class
Mfgr Id
Class Id
D
C
D
T
F
C
F
T
H
C
H
T
H
M

The next step in achieving 5NF involves the second and third relationships: the Dealer: Manufacturer relationship and the Dealer: Vehicle Class relationship. These two relationships might depend on each other, or they might not.

They do depend on each other if:
A dealer can sell one vehicle class from one manufacturer and a different vehicle class from another manufacturer even though both manufacturers produce both vehicle classes.

For example, Vroom sells:
Dodge cars (but not Dodge trucks)
Ford trucks (but not Ford cars)
Honda motorcycles (but not Honda cars or trucks)

In this case each fact has three parts: dealer, manufacturer, and vehicle class. A fact is complete only if all three parts are present. Since both relationships are part of the same fact, they must be represented in the same table. Notice that this is the original Dealer Product table. Moving the first relationship into the Manufacturer Vehicle Class table also put the Dealer Product table into 5NF, even though it still contains all the same columns.

Dealer Product
Dealer Nbr
Mfgr Id
Class Id
5017
D
C
5017
F
T
5017
H
M

On the other hand, the second and third relationships do not depend on each other if:
A dealer cannot sell one vehicle class from one manufacturer and a different vehicle class from another manufacturer if both manufacturers produce both vehicle classes. The dealer must sell the vehicle class from both manufacturers or from neither.

For example, if ABC Motors sells for Dodge and Honda, and if ABC Motors sells cars, trucks, and motorcycles, then it sells Dodge cars and trucks, and Honda cars, trucks, and motorcycles. If ABC Motors starts selling for Ford, it will sell Ford cars and Ford trucks, not just one or the other. If ABC Motors stops selling trucks, it will stop selling both Dodge and Honda trucks, not just one or the other. If Dodge starts producing motorcycles, ABC Motors will sell them.

In this case the Dealer Product table is not in 5NF because the two relationships do not depend on each other. The table represents facts about two unrelated things (the two different relationships).

Dealer Product
Dealer Nbr
Mfgr Id
Class Id
1429
D
C
1429
D
T
1429
H
C
1429
H
T
1429
H
M

The table can be put into 5NF by splitting it into two tables, one for each relationship, such as those below. The Dealer Manufacturer table shows which dealers sell for which manufacturers, and the Dealer Vehicle Class table shows which dealers sell which vehicle classes.

Dealer Manufacturer
Dealer Nbr
Mfgr Id
1429
D
1429
H

Dealer Vehicle Class
Dealer Nbr
Class Id
1429
C
1429
T
1429
M

Now if ABC Motors starts selling for Ford, that fact is recorded by adding one row to the Dealer Manufacturer table rather than two rows to the Dealer Product table (one for cars and another for trucks). If it stops selling trucks (for both Dodge and Honda), a single row (1429, T) is removed from the Dealer Vehicle Class table rather than removing two rows from the Dealer Product table. If Dodge starts producing motorcycles, that fact is recorded by adding one row to the Manufacturer Vehicle Class table. That new fact together with the fact that ABC Motors sells for Dodge (first row in the Dealer Manufacturer table) and the fact the ABC Motors sells motorcycles (third row in the Dealer Vehicle Class table) generates the compound fact the ABC Motors now sells Dodge motorcycles.

From tables to a single view of the business
These are some simplified examples of how to apply normal forms to relational databases. Standardizing a database—which requires in-depth thought and enterprise-wide collaboration—will simplify information, reduce redundancies and avoid data inconsistencies. By administering these rules, an organization can get closer to its ultimate goal of reaching a single view of the business.

Appendix: 1NF, a broader view
The 1NF section in body of the paper explained that 1NF prohibits recording more than one fact in any column in any row of a table. Some practitioners (data modelers, database designers, etc.) use an expanded definition of 1NF that also prohibits duplicate columns in a table, i.e., two or more columns that contain the same type of facts.

The 1NF section includes the following example.

Employee
Emp Nbr
Emp Name
Pastime
32954
Jason Jones Fishing, Sailing, Hiking
18433
Susan Smith Hiking
52853
Diane Downs Cooking, Fishing

An alternate design that violates the spirit of 1NF is shown below.

Employee
Emp Nbr
Emp Name
Pastime 1
Pastime 2
Pastime 3
32954
Jason Jones
Fishing
Sailing
Hiking
18433
Susan Smith
Hiking
52853
Diane Downs
Cooking
Fishing

All the columns in this table design contain single-valued facts, so it conforms to the classic definition of 1NF. As such, it avoids the update problems that are present in the first table design. A pastime can be updated without having to reenter the others.

However, this alternate design introduces new problems. No more than three pastimes can be recorded for an employee. If someone has more than three, the additional ones are not in the database, so the pastime information is incomplete. Also, a query to find out which employees enjoy a particular pastime (e.g., hiking) involves looking in three columns, not just one.

These problems can be avoided by splitting the table into the following pair of tables. These are the same tables that were described in the 1NF section in the body of the paper.

Employee
Emp Nbr
Emp Name
32954
Jason Jones
18433
Susan Smith
52853
Diane Downs

Employee Pastime
Emp Nbr
Pastime
32954
Fishing
32954
Sailing
32954
Hiking
18433
Hiking
52853
Cooking
52853
Fishing

Return to First Normal Form (1NF). T

References
E. F. Codd, �A Relational Model of data for Large Shared Data Banks,� Communications of the ACM, Volume 13, Number 6, June 1970.

E. F. Codd, �Extending the Database Relational Model to Capture More Meaning,� ACM Transactions on Database Systems, Volume 4, Number 4, December 1979.

C. J. Date, An Introduction to Database Systems, Volume 1, Third Edition, Addison Wesley, 1982.

William Kent, �A Simple Guide to Five Normal Forms in Relational Database Theory,� Communications of the ACM, Volume 26, Number 2, February 1983.

Mary E. S. Loomis, The Database Book, Macmillan, 1987.

Dale R. Rollins is a senior consultant in Teradata's Professional Services organization. Dale has more than 25 years of experience in developing logical data models (LDMs) based on the business rules and data requirements of major corporations and government organizations. He is also experienced in transforming logical models into physical models, which serve as the designs for data warehouses or other shared databases. In addition, he participated in the development of the IDEF1X Data Modeling Technique and its adoption as a Federal Information Processing Standard.

Teradata Magazine-December 2006

More Features

Related Links

Reference Library

Get complete access to Teradata articles and white papers specific to your area of interest by selecting a category below. Reference Library
Search our library:

Teradata.com | About Us | Contact Us | Media Kit | Subscribe | Privacy/Legal | RSS
Copyright © 2008 Teradata Corporation. All rights reserved.