===== CS 325 - Week 9 Lecture 1 - 2021-10-18 ===== ===== TODAY WE WILL ===== * announcements * intro to Normalization * prep for next class * current reading: * DB Reading Packet 6 - Normalization * SQL Reading Packet 5 - "order by, group by, and having" * you should start Homework 6 - due Friday, 11:59 pm * should be working on your CS 325 Project - Database Model Draft milestone - due Sunday, October 24, 11:59 pm ===== Intro to Normalization and Normal Forms ===== * very related to functional dependencies; * we're assuming (mostly) so-called operational database, where we are updating it based on day-to-day operations; * For some relations, changing the data can have undesirable consequences -- these are called modification anomalies * and it will turn out that we can eliminate certain classes of these by redefining relations to be in particular normal forms * and "normalization" is the process where you redefine relations to get them to a particular normal form to prevent certain kinds of modification anomalies; * OK, let's start with modification anomalies! * deletion anomalies - when deleting a row results in the loss of "additional" information it does not seem reasonable to lose * deleting june arbro causes the loss of the info that job_class BioE is supposed to have a charge_per_hr of $55 * deleting anne ramoras causes the loss of the info that proj_num 3 has the name satellite * insertion anomalies - when you would like to store some information but you cannot without some other information, such that that restriction seems unreasonable * some also include update anomalies -- but that could also be seen as a combo of the above; ====== * Normalization is seeking to create/adapt sets of relations that will not have these modification anomalies (or that will have fewer of them, at least) We'd also like to reduce *unnecessary* data redundancy (in the original current relation we're discussing, if you change a project name, you have to change it in *all* the rows of employees who have done any work on that project) * it turns out that the careful "breaking up" of a single relation into one or more other relations, including appropriate foreign keys, can sometimes get rid of a modification anomaly * So, normalization is a process of carefully "breaking" a relation into multiple relations, each containing a separate "them", to reduce modification anomalies; (and adding in the appropriate referential integrity constraints -- foreign keys! 8-) 0-- to still let us relate what we want to relate) ===== Normal Forms ===== * some people came up with a way to classify relations based on the types of modification anomalies to which they are vulnerable; these classifications are called NORMAL FORMS Here is that classification: * first normal form (1NF) * second normal form (2NF) * third normal form (3NF) * Boyce-Codd normal form (BCNF) * fourth normal form (4NF) * fifth normal form (5NF) * sixth normal form (6NF) * these are superset/subset -- that is, if your relations are in 2NF, they are also (by definition) in 1NF if your relations are in 3NF, they are also (by definition) in 1NF and 2NF if your relations are in BCNF, they are also (by definition) in 1NF, 2NF and 3NF if your relations are in 4NF, they are also (by definition) in 1NF, 2NF, 3NF, and BCNF and so on * you COULD think of normalization as working through a series of stages involving these normal forms; * and happily, usually it is considered sufficient for operational databases if they are in 3NF (or close enough to 3NF for that scenario's purposes...) ----- * 1NF: any (true) relation is, by definition, in 1NF ...the most common issue to get a set of relations to 1NF is if a multivalued attribute has slipped in there, and a relation is NOT to have multiple values in a single cell; * 2NF: a relation is in 2NF if it is in 1NF and it has no PARTIAL dependencies * a partial dependency is when a non-primary-key attribute is dependent on just PART of the primary key * 3NF: a relation is in 3NF if it is in 2NF and has no TRANSITIVE dependencies * a transitive dependency is when a non-primary-key attribute NPK1 functionally determines another non-primary-key attribute NPK2 (so the primary key functionally determines NPK1 and so transitively also determines NPK2) * BCNF - a relation is in BCNF if it is in 3NF and every determinant in the table is a candidate key * 4NF - a relation is in 4NF if it is in BCNF and includes no multiple sets of multivalued dependencies * ...we're stopping there!