Please send questions to st10@humboldt.edu .

*   slight review of functional dependency...

employee(EMPL_ID, empl_lname, empl_str_addr, empl_salary)

empl_id -> empl_salary

*   empl_id is the determinant in the above functional
    dependency

NORMALIZATION....

*   [Kroenke] "For some relations, changing the data
    can have undesirable consequences, called
    modification anomalies"

    We'll see that such anomalies can be eliminated
    by carefully redefining the relation into 2 or
    more relations;

    *   In operational settings, these redefined,
        or NORMALIZED, relations are preferred;

*   The process of getting rid of the possibility
    of modification anomalies is normalization;

MODIFICATION ANOMALIES

Deletion anomaly:
*   when deleting a row causes the loss of
    additional information which it is not reasonable
    to have completely disappear

*   another take: these can occur when facts about
    two or more "themes" are lost with one deletion;

Insertion anomaly:
*   when we find that we cannot insert a fact about
    one "thing" until we have another fact about
    another "thing" -- and that restriction seems
    unreasonable;

SO -- normalization's goal is to reduce the incidence of
these kinds of anomalies;

NORMAL FORMS

*   relations can be classified based on the types of
    modification anomalies to which they are
    vulnerable;
    these classifications are called NORMAL FORMS

*   Some "classic" normal forms include:

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

    *   NOTE: these are superset/subset --
        a set of relations in 2NF is also in 1NF,
	a set of relations in 3NF is also in 1NF and 2NF,
	etc.

    *   You could think of normalization as a process
        of working through a series of stages
	involving these normal forms...

FIRST NORMAL FORM (1NF)

Any ("true") relation is, by definition, in 1NF.

*   SO -- do you ever have to do extra work to get
    into 1NF?

    Usually, you need to when somebody "suddenly"
    notices that an attribute is really multi-valued
    (one instance can have multiple values for
    that attribute)

    ...since you can't have cells with multiple values
    in a true relation, that is no longer truly a
    relation;

*   HOW TO FIX THIS (and end up with a set of relations
    in 1NF):

    Say you had:
    Car(CAR_ID, car_name, car_model)
    uh oh, car_model can be multi-valued for some cars!

    *   create a NEW relation with the primary key
        of the original relation, and the multi-valued
	attribute, as its attributes -- and ALL
	make up a composite primary key:

        Car_models(CAR_ID, CAR_MODEL)

        ...and the primary key of the original relation
	   in this new relation is also a foreign key
	   back to the original relation:

        foreign key(car_id) references Car

    *   ...and remove the multi-valued attribute from
        the original relation:

        Car(CAR_ID, car_name)

    *   resulting in the pair of relations:

        Car(CAR_ID, car_name)

        Car_models(CAR_ID, CAR_MODEL)
           foreign key(car_id) references Car

*   relations in 1NF can be shown to avoid modification
    anomalies due to not really being relations;

SECOND NORMAL FORM (2NF)

*   a relation is in 2NF if all of its non-primary-key
    attributes are functionally dependent on all
    of its primary key;

*   alternate description:
    A relation is in 2NF if:
    1) it is in 1NF, and
    2) it includes no PARTIAL DEPENDENCIES

    (where a partial dependency is a dependency in
    which a NON-primary-key attribute is functionally
    dependent on just part of the primary key)

*   or: all nonkey attributes are dependent on the
    entire primary key...

*   HOW TO CONVERT from 1NF to 2NF:

    1. Write each possible subset of the primary
       key on its own line
       (each of these subsets MAY eventually be
       the primary key of a new relation)

    2. For each non-primary-key attribute in the
       table, write it on the line of step 1
       where it depends on the *entire* primary
       key subset on that line

    3. Get rid of any lines that just contain a
       primary key subset, and no non-key attributes...
       Turn the rest into relations,

       ...and (typically!) add foreign keys to
       the multi-primary-key tables, referencing
       the "base" table(s) for the pieces of the
       primary key

DO YOU SEE ...that being in 2NF prevents modification
   anomalies due to having attribute that are dependent
   on just part of the primary key?

THIRD NORMAL FORM

...are all modification anomalies gone, then?
No -- for example, there are those modification
   anomalies that are due to TRANSITIVE dependencies;

*   remember the transitive property?

    if A op B and B op C implies A op C

    functional dependency is transitive...

*   a transitive dependency is a functional dependency
    where the determinant is a non-primary-key attribute

*   SO -- if a 2NF relation has a transitive
    dependency, you convert it to 3NF by:

    1. Make a new relation with the determinant
       of the transitive dependency as the
       primary key

    2. remove the dependent attributes in that
       dependency from the base/original table,

    3. the determinant in the base/original table
       is now a foreign key referencing the new
       table

...and when you are done, you have a set of 3NF
relations that have no modification anomalies
due to transitive dependencies...

Boyce-Codd Normal Form:
*   a relation in which every determinant in that
    table is a candidate key

4th Normal Form:
*   a relation in BCNF that has no multiple sets of
    multivalued dependencies

5th Normal Form:
*   has to do with relations in 4NF that can be divided
    into subrelations, but then cannot be reconstructed;
    of mostly theoretical interest...