=====
CS 325 - Week 9 Asynchronous Material
=====

=====
"TODAY" WE WILL
=====
*   announcements
*   intro to Normalization
*   SQL - intro to order by clause
*   SQL - intro to group by and having clauses
*   prep for next clause

*   Reading
    *   DB Reading Packet 6: Normalization
    *   SQL Reading Packet 5: Order by, Group by, and Having

*   You should be working on Homework 6! its problems are due by 11:59 pm
    this Friday, October 22

*   You should be working on the CS 325 Project - ER Model DRAFT milestone,
    due 11:59 pm this SUNDAY, October 24

=====
*   continue our INTRO to NORMALIZATION
=====
*   last time we talked about modification anomalies;

*   we also talked about normal forms, and the sequence of normal forms
    1NF, 2NF, 3NF, BCNF, 4NF, 5NF, and 6NF

    ...and the "higher" you are here, the fewer categories of
       reasons for modification anomalies a set of relations might be
       prone to (although it is usually considered that,
       for an operational database (used for day-to-day operations
       in a setting), normalizing to 3NF is sufficient)

    *   note that these are defined such that:
    	a relation in 2NF is also in 1NF,
	a relation in 3NF is also in 1NF and 2NF,
	a relation in BCNF is also in 1NF and 2NF and 3NF,
	and so on;

*   we mentioned that:
    *   a relation is in 1NF if it is indeed a true relation

    *   a relation is in 2NF if it is in 1NF and has no partial dependencies
        *   partial dependency: one in which a non-primary-key attrbute
	    is dependent on just PART of a relation's primary key

    *   a reclation in 3NF if it is in 2NF and has no transitive dependencies
        *   transitive dependency: one in which a non-primary-key attribute
	    is dependent on another non-primary-key attribute

======
*   normalizing to 1NF:
    *   well, if it is NOT a true relation after all, modify it so it is!

    *   the most common issue I'm aware of:
        *   oops, an attribute is multi-valued for a tuple in a relation

    *   the way you handle this is:
        *   each multi-valued attribute is "broken out" into a new
	    relation as follows:

            *   make a new relation whose attributes are:
	        *   the primary key of the existing relation
		*   the multi-valued attribute

                *   the primary key of the NEW relation is the
		    set of the primary key of the existing relation
		    AND the multi-valued attribute!

                *   the new relation has as a foreign key
		    the primary key from the existing relation
		    referencing the existing relation

            *   and, in the existing relation, you REMOVE the
	        multi-valued attribute

    *   EXAMPLE:
        Shirt_style(STYLE_ID, shirt_name, shirt_fabric, color_avlbl)

        *   OH NO! for a given shirt instance, color_avlbl is multi-valued!
	    ...so Shirt_style is not a true relation!!

        *   so, let's NORMALIZE Shirt_style to 1NF by breaking out
	    the multi-valued attribute color_avlbl to its own relation:

        Shirt_style(STYLE_ID, shirt_name, shirt_fabric)

        Shirt_style_color(STYLE_ID, COLOR_AVLBL)
	    foreign key (style_id) references Shirt_style
    
        for example:
        Shirt_style(1231, 'Godiva', 'Organic cotton')

        Shirt_style_color(1231, 'Puce')
	Shirt_style_color(1231, 'Magenta')

======
converting a relation into 2NF
======
*   check for any PARTIAL dependencies!
    *   note: a relation can only possibly have a partial dependency
        if it has a multi-attribute primary keys!

*   SO: what if it turns out a relation DOES a partial dependency?
    Here's the "algorithm" for breaking up such a relation
    into relations that will be in 2NF

    1. write each of the primary key "parts" on separate lines,
       and then each combination of primary key "parts" on separate lines;

    2. then, after each of these, consider each non-primary key attribute
       in that relation, and put it on the line where it is functionally
       dependent on ALL of the primary key attribute(s) in that line

    3. for lines with no non-key attributes added, ignore them/throw them out

       for each line WITH a non-key attribute added, that's a new relation!

       ...name it appropriately for its "theme",
          make its primary key the combo of attribute/attributes from
	      the original relation;

          where does the foreign key go here?
	  ...put the foreign key with the table (usually) that still
	     has a multi-attribute primary key,
	     because it needs to reference the "home" info for
	         each attribute within that primary key;

     *   example: consider the "ugly" relation we used last time for
         discussing modification anomalies:

         written in relation structure form as:
	 Work_on_project(PROJ_NUM, proj_name, EMPL_NUM, empl_name,
	                 job_class, chg_per_hr, hrs)

         ...this is NOT in 2NF,
	    because it has more than one partial dependency --
	        note that proj_num -> proj_name,
		and so proj_name is functionally determined by just
		    PART of Work_on_project's primary key of
		    {PROJ_NUM, EMPL_NUM}
    
    *   writing each primary key attribute and combo of attributes
        on their own lines:

    PROJ_NUM - proj_name

    EMPL_NUM - empl_name, job_class, chg_per_hr

    PROJ_NUM, EMPL_NUM - hrs

    *   now, for each line with non-key attributes to it,
        make that a new relation, give it an appropriate name

        Project(PROJ_NUM, proj_name)

        Employee(EMPL_NUM, empl_name, job_class, chg_per_hr)

        Work_Done(PROJ_NUM, EMPL_NUM, hrs)

    *   and, (usually) add foreign keys for the table that still
        has a multi-attribute primary key, referencing the table with
	the "home" for that;

        Project(PROJ_NUM, proj_name)

        Employee(EMPL_NUM, empl_name, job_class, chg_per_hr)

        Work_Done(PROJ_NUM, EMPL_NUM, hrs)
	    foreign key(proj_num) references project,
	    foreign key(empl_num) references employee

        ^ these relations are now in 2NF, as they are in 1NF
	  and now have NO partial dependencies!

=====
normalizing to 3NF
=====
*   how do I get rid of any transitive dependencies?

    *   make a NEW relation whose primary key is the determinant
        of the transitive dependency

        *   it contains the non-key attribute(s) that it determines
	    (that are functionally dependent of it)

    *   REMOVE the NON-KEY attributes of the transitive dependency
        FROM the not-3NF-relation

        AND make the previous-determinant-of-the-transitive-dependency
	in the original relation a FOREIGN KEY referencing the new
	relation!

     *   example:

        Project(PROJ_NUM, proj_name)         -- already in 3NF  

        Employee(EMPL_NUM, empl_name, job_class, chg_per_hr)

        Work_Done(PROJ_NUM, EMPL_NUM, hrs)   -- already in 3NF
	    foreign key(proj_num) references project,
	    foreign key(empl_num) references employee        

        *   look! Employee has a transitive dependency,
	    because job_class -> chg_per_hr and job_class and
	    chg_per_hr are both non-key attributes!

        Job_Info(JOB_CLASS, chg_per_hr)

        Employee(EMPL_NUM, empl_name, job_class)
	    foreign key(job_class) references Job_Info

    SO, this set of relations is now in 3NF:
    
        Project(PROJ_NUM, proj_name)

        Job_Info(JOB_CLASS, chg_per_hr)

        Employee(EMPL_NUM, empl_name, job_class)
	    foreign key(job_class) references Job_Info

        Work_Done(PROJ_NUM, EMPL_NUM, hrs)   
	    foreign key(proj_num) references project,
            foreign key(empl_num) references employee            

========
SQL stuff/demo time!
...see 325lect09-2.sql!