Databases with Mike Chapple, about.com, 2003.
http://databases.about.com/library/glossary/bldef-normalization.htm
Normalization is the process of structuring relational database schema such that most ambiguity is removed. The stages of normalization are referred to as normal forms and progress from the least restrictive (First Normal Form) through the most restrictive (Fifth Normal Form). Generally, most database designers do not attempt to implement anything higher than Third Normal Form or Boyce-Codd Normal Form.
A relation is said to be in First Normal Form (1NF) if and only if each attribute of the relation is atomic. More simply, to be in 1NF, each column must contain only a single value and each row must contain the same columns.
In order to be in Second Normal Form, a relation must first fulfill the requirements to be in First Normal Form. Additionally, each nonkey attribute in the relation must be functionally dependent upon the primary key.
Example: The following relation is in First Normal Form, but not Second Normal Form:
Order # | Customer | Contact Person | Total |
1 | Acme Widgets | John Doe | $134.23 |
2 | ABC Corporation | Fred Flintstone | $521.24 |
3 | Acme Widgets | John Doe | $1042.42 |
4 | Acme Widgets | John Doe | $928.53 |
In the table above, the order number serves as the primary key. Notice that the customer and total amount are dependent upon the order number -- this data is specific to each order. However, the contact person is dependent upon the customer. An alternative way to accomplish this would be to create two tables:
Customer | Contact Person |
Acme Widgets | John Doe |
ABC Corporation | Fred Flintstone |
Order # | Customer | Total |
1 | Acme Widgets | $134.23 |
2 | ABC Corporation | $521.24 |
3 | Acme Widgets | $1042.42 |
4 | Acme Widgets | $928.53 |
The creation of two separate tables eliminates the dependency problem experienced in the previous case. In the first table, contact person is dependent upon the primary key -- customer name. The second table only includes the information unique to each order. Someone interested in the contact person for each order could obtain this information by performing a JOIN operation.
In order to be in Third Normal Form, a relation must first fulfill the requirements to be in Second Normal Form. Additionally, all attributes that are not dependent upon the primary key must be eliminated.
Examples: The following table is NOT in Third Normal Form:
Company | City | State | ZIP |
Acme Widgets | New York | NY | 10169 |
ABC Corporation | Miami | FL | 33196 |
XYZ, Inc. | Columbia | MD | 21046 |
In this example, the city and state are dependent upon the ZIP code. To place this table in 3NF, two separate tables would be created -- one containing the company name and ZIP code and the other containing city, state, ZIP code pairings.
This may seem overly complex for daily applications and indeed it may be. Database designers should always keep in mind the tradeoffs between higher level normal forms and the resource issues that complexity creates.