Please send questions to
st10@humboldt.edu .
* slight review of functional dependency...
employee(EMPL_ID, empl_lname, empl_str_addr, empl_salary)
empl_id -> empl_salary
* empl_id is the determinant in the above functional
dependency
NORMALIZATION....
* [Kroenke] "For some relations, changing the data
can have undesirable consequences, called
modification anomalies"
We'll see that such anomalies can be eliminated
by carefully redefining the relation into 2 or
more relations;
* In operational settings, these redefined,
or NORMALIZED, relations are preferred;
* The process of getting rid of the possibility
of modification anomalies is normalization;
MODIFICATION ANOMALIES
Deletion anomaly:
* when deleting a row causes the loss of
additional information which it is not reasonable
to have completely disappear
* another take: these can occur when facts about
two or more "themes" are lost with one deletion;
Insertion anomaly:
* when we find that we cannot insert a fact about
one "thing" until we have another fact about
another "thing" -- and that restriction seems
unreasonable;
SO -- normalization's goal is to reduce the incidence of
these kinds of anomalies;
NORMAL FORMS
* relations can be classified based on the types of
modification anomalies to which they are
vulnerable;
these classifications are called NORMAL FORMS
* Some "classic" normal forms include:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
* NOTE: these are superset/subset --
a set of relations in 2NF is also in 1NF,
a set of relations in 3NF is also in 1NF and 2NF,
etc.
* You could think of normalization as a process
of working through a series of stages
involving these normal forms...
FIRST NORMAL FORM (1NF)
Any ("true") relation is, by definition, in 1NF.
* SO -- do you ever have to do extra work to get
into 1NF?
Usually, you need to when somebody "suddenly"
notices that an attribute is really multi-valued
(one instance can have multiple values for
that attribute)
...since you can't have cells with multiple values
in a true relation, that is no longer truly a
relation;
* HOW TO FIX THIS (and end up with a set of relations
in 1NF):
Say you had:
Car(CAR_ID, car_name, car_model)
uh oh, car_model can be multi-valued for some cars!
* create a NEW relation with the primary key
of the original relation, and the multi-valued
attribute, as its attributes -- and ALL
make up a composite primary key:
Car_models(CAR_ID, CAR_MODEL)
...and the primary key of the original relation
in this new relation is also a foreign key
back to the original relation:
foreign key(car_id) references Car
* ...and remove the multi-valued attribute from
the original relation:
Car(CAR_ID, car_name)
* resulting in the pair of relations:
Car(CAR_ID, car_name)
Car_models(CAR_ID, CAR_MODEL)
foreign key(car_id) references Car
* relations in 1NF can be shown to avoid modification
anomalies due to not really being relations;
SECOND NORMAL FORM (2NF)
* a relation is in 2NF if all of its non-primary-key
attributes are functionally dependent on all
of its primary key;
* alternate description:
A relation is in 2NF if:
1) it is in 1NF, and
2) it includes no PARTIAL DEPENDENCIES
(where a partial dependency is a dependency in
which a NON-primary-key attribute is functionally
dependent on just part of the primary key)
* or: all nonkey attributes are dependent on the
entire primary key...
* HOW TO CONVERT from 1NF to 2NF:
1. Write each possible subset of the primary
key on its own line
(each of these subsets MAY eventually be
the primary key of a new relation)
2. For each non-primary-key attribute in the
table, write it on the line of step 1
where it depends on the *entire* primary
key subset on that line
3. Get rid of any lines that just contain a
primary key subset, and no non-key attributes...
Turn the rest into relations,
...and (typically!) add foreign keys to
the multi-primary-key tables, referencing
the "base" table(s) for the pieces of the
primary key
DO YOU SEE ...that being in 2NF prevents modification
anomalies due to having attribute that are dependent
on just part of the primary key?
THIRD NORMAL FORM
...are all modification anomalies gone, then?
No -- for example, there are those modification
anomalies that are due to TRANSITIVE dependencies;
* remember the transitive property?
if A op B and B op C implies A op C
functional dependency is transitive...
* a transitive dependency is a functional dependency
where the determinant is a non-primary-key attribute
* SO -- if a 2NF relation has a transitive
dependency, you convert it to 3NF by:
1. Make a new relation with the determinant
of the transitive dependency as the
primary key
2. remove the dependent attributes in that
dependency from the base/original table,
3. the determinant in the base/original table
is now a foreign key referencing the new
table
...and when you are done, you have a set of 3NF
relations that have no modification anomalies
due to transitive dependencies...
Boyce-Codd Normal Form:
* a relation in which every determinant in that
table is a candidate key
4th Normal Form:
* a relation in BCNF that has no multiple sets of
multivalued dependencies
5th Normal Form:
* has to do with relations in 4NF that can be divided
into subrelations, but then cannot be reconstructed;
of mostly theoretical interest...