===== CS 325 - Week 9 Asynchronous Material ===== ===== "TODAY" WE WILL ===== * announcements * intro to Normalization * SQL - intro to order by clause * SQL - intro to group by and having clauses * prep for next clause * Reading * DB Reading Packet 6: Normalization * SQL Reading Packet 5: Order by, Group by, and Having * You should be working on Homework 6! its problems are due by 11:59 pm this Friday, October 22 * You should be working on the CS 325 Project - ER Model DRAFT milestone, due 11:59 pm this SUNDAY, October 24 ===== * continue our INTRO to NORMALIZATION ===== * last time we talked about modification anomalies; * we also talked about normal forms, and the sequence of normal forms 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, and 6NF ...and the "higher" you are here, the fewer categories of reasons for modification anomalies a set of relations might be prone to (although it is usually considered that, for an operational database (used for day-to-day operations in a setting), normalizing to 3NF is sufficient) * note that these are defined such that: a relation in 2NF is also in 1NF, a relation in 3NF is also in 1NF and 2NF, a relation in BCNF is also in 1NF and 2NF and 3NF, and so on; * we mentioned that: * a relation is in 1NF if it is indeed a true relation * a relation is in 2NF if it is in 1NF and has no partial dependencies * partial dependency: one in which a non-primary-key attrbute is dependent on just PART of a relation's primary key * a reclation in 3NF if it is in 2NF and has no transitive dependencies * transitive dependency: one in which a non-primary-key attribute is dependent on another non-primary-key attribute ====== * normalizing to 1NF: * well, if it is NOT a true relation after all, modify it so it is! * the most common issue I'm aware of: * oops, an attribute is multi-valued for a tuple in a relation * the way you handle this is: * each multi-valued attribute is "broken out" into a new relation as follows: * make a new relation whose attributes are: * the primary key of the existing relation * the multi-valued attribute * the primary key of the NEW relation is the set of the primary key of the existing relation AND the multi-valued attribute! * the new relation has as a foreign key the primary key from the existing relation referencing the existing relation * and, in the existing relation, you REMOVE the multi-valued attribute * EXAMPLE: Shirt_style(STYLE_ID, shirt_name, shirt_fabric, color_avlbl) * OH NO! for a given shirt instance, color_avlbl is multi-valued! ...so Shirt_style is not a true relation!! * so, let's NORMALIZE Shirt_style to 1NF by breaking out the multi-valued attribute color_avlbl to its own relation: Shirt_style(STYLE_ID, shirt_name, shirt_fabric) Shirt_style_color(STYLE_ID, COLOR_AVLBL) foreign key (style_id) references Shirt_style for example: Shirt_style(1231, 'Godiva', 'Organic cotton') Shirt_style_color(1231, 'Puce') Shirt_style_color(1231, 'Magenta') ====== converting a relation into 2NF ====== * check for any PARTIAL dependencies! * note: a relation can only possibly have a partial dependency if it has a multi-attribute primary keys! * SO: what if it turns out a relation DOES a partial dependency? Here's the "algorithm" for breaking up such a relation into relations that will be in 2NF 1. write each of the primary key "parts" on separate lines, and then each combination of primary key "parts" on separate lines; 2. then, after each of these, consider each non-primary key attribute in that relation, and put it on the line where it is functionally dependent on ALL of the primary key attribute(s) in that line 3. for lines with no non-key attributes added, ignore them/throw them out for each line WITH a non-key attribute added, that's a new relation! ...name it appropriately for its "theme", make its primary key the combo of attribute/attributes from the original relation; where does the foreign key go here? ...put the foreign key with the table (usually) that still has a multi-attribute primary key, because it needs to reference the "home" info for each attribute within that primary key; * example: consider the "ugly" relation we used last time for discussing modification anomalies: written in relation structure form as: Work_on_project(PROJ_NUM, proj_name, EMPL_NUM, empl_name, job_class, chg_per_hr, hrs) ...this is NOT in 2NF, because it has more than one partial dependency -- note that proj_num -> proj_name, and so proj_name is functionally determined by just PART of Work_on_project's primary key of {PROJ_NUM, EMPL_NUM} * writing each primary key attribute and combo of attributes on their own lines: PROJ_NUM - proj_name EMPL_NUM - empl_name, job_class, chg_per_hr PROJ_NUM, EMPL_NUM - hrs * now, for each line with non-key attributes to it, make that a new relation, give it an appropriate name Project(PROJ_NUM, proj_name) Employee(EMPL_NUM, empl_name, job_class, chg_per_hr) Work_Done(PROJ_NUM, EMPL_NUM, hrs) * and, (usually) add foreign keys for the table that still has a multi-attribute primary key, referencing the table with the "home" for that; Project(PROJ_NUM, proj_name) Employee(EMPL_NUM, empl_name, job_class, chg_per_hr) Work_Done(PROJ_NUM, EMPL_NUM, hrs) foreign key(proj_num) references project, foreign key(empl_num) references employee ^ these relations are now in 2NF, as they are in 1NF and now have NO partial dependencies! ===== normalizing to 3NF ===== * how do I get rid of any transitive dependencies? * make a NEW relation whose primary key is the determinant of the transitive dependency * it contains the non-key attribute(s) that it determines (that are functionally dependent of it) * REMOVE the NON-KEY attributes of the transitive dependency FROM the not-3NF-relation AND make the previous-determinant-of-the-transitive-dependency in the original relation a FOREIGN KEY referencing the new relation! * example: Project(PROJ_NUM, proj_name) -- already in 3NF Employee(EMPL_NUM, empl_name, job_class, chg_per_hr) Work_Done(PROJ_NUM, EMPL_NUM, hrs) -- already in 3NF foreign key(proj_num) references project, foreign key(empl_num) references employee * look! Employee has a transitive dependency, because job_class -> chg_per_hr and job_class and chg_per_hr are both non-key attributes! Job_Info(JOB_CLASS, chg_per_hr) Employee(EMPL_NUM, empl_name, job_class) foreign key(job_class) references Job_Info SO, this set of relations is now in 3NF: Project(PROJ_NUM, proj_name) Job_Info(JOB_CLASS, chg_per_hr) Employee(EMPL_NUM, empl_name, job_class) foreign key(job_class) references Job_Info Work_Done(PROJ_NUM, EMPL_NUM, hrs) foreign key(proj_num) references project, foreign key(empl_num) references employee ======== SQL stuff/demo time! ...see 325lect09-2.sql!