325lect04-2-projected


=====
CS 325 - Week 4 Asynchronous Materials
=====

=====
"TODAY" WE WILL
=====
*   announcements
*   continue intro to the relational model
*   intro to writing relational operations in SQL
*   prep for next class

-----
Announcements
-----
*   Current reading:
    *   DB Reading Packet 3, introducing the relational model
    *   SQL Reading Packet 2, writing relational operations using SQL

*   Next homework -- Homework 3 -- will come out this weekend

=====
*   relation scheme - a finite sequence of unique attribute names

    *    e.g.,
         employee = (empl_id, empl_lname, empl_str_addr, empl_salary)

         ...this is not to far from our relation structure form...

         employee(EMPL_ID, empl_lname, empl_str_addr, empl_salary)

*   each attribute name A is associated with a domain, dom(A), a set of
    value, which MAY include a special value null

    *   dom(empl_salary) could be said to be the set of real numbers and
            the special value null,
	    could be said to be the set of possible salaries an employee
	    might be paid, or null for, say, volunteer employees who
	    must never be paid...?

*   given a relation scheme

    R = (A1, A2, ... An)

    a relation r is defined as any finite subset of the Cartesian product

        dom(A1) x dom(A2) ... dom(An)

    *   a sample relation for the employee relation scheme earlier could be

        { (1111, 'Jones', '111 Ash Str', 20000),
	  (2222, 'Nguyen', '123 Elm Str', 25000) }

        *   each element of this set is alled a tuple (row)

    *   a collection of such relations is a relational database...!

=====
*   FUNCTIONAL DEPENDENCY
=====

*   functional dependency - a relationship between or among attributes,
    SUCH THAT, given the value of one attribute, A, you can look up/obtain
    the value of another attribute, B

    *   if you can do that, then we say that B is functionally dependent
        on A

        ...that A determines B

    *   CONSIDER the relation:

        advisor = (advisor_id, advisor_lname, advisor_fname, advisor_phone)

        *   IF, say, given the value of advisor_id,
	    I can look up/obtain the value of advisor_phone,
	    then I'd say that advisor_phone is FUNCTIONALLY DEPENDENT on
	    advisor_id

        *   We also may say that advisor_id DETERMINES advisor_phone

*   I've seen it argued that the storage and retrieval of functional
    dependencies is arguably the only reason for having a database?!

*   a little notation:

    If I say that B is functionally dependent on A
        (or that A determines B) then that can be written:

    A -> B

    So, I could write:

    advisor_id -> advisor_phone

*   and, when you use the notation

    A -> B

    ...we say that A is the DETERMINANT in that functional dependency

    *   So, in

        advisor_id -> advisor_phone

        ...we say that advisor_id is the determinant in that functional
	dependency.

    *   note: you can have sets of attributes on either side or both sides of
        a functional dependency!

        (stu_id, class_id) -> class_grade

        advisor_id -> (advisor_lname, advisor_fname)

        (stu_id, class_id) -> (class_grade, completion_date)

    *   make sure this makes sense:

        if
	A -> (B, C),    it is also the case that A -> B and A -> C

        BUT!
	if
	(D, E) -> F,    it is NOT necessarily the case that
	                D -> F or that E -> F by itself

=====
some KEY definitions
=====
*   a SUPERKEY or LOGICAL KEY is any set of one or more attributes
    that uniquely determine a tuple in a relation

    *   a relation can have multiple superkeys!

        (it MUST have at least ONE such SET of attributes!)

    *   consider this relation:

        course_stats = (stu_id, class_id, class_grade, completion_date)

        it is the case that (stu_id, class_id) is a superkey of course_stats

        it is also the case the (stu_id, class_id, class_grade) is a superkey
	    of course_stats

    *   PLEASE NOTE: the functional dependencies and superkeys of
        relations representing some setting or scenario are determined
	by that scenario, and the users' model of that scenario;

        business rules - day-to-day rules in some setting or scenario

        ...such business rules can affect functional dependencies and
	superkeys;

        *   so if you have a question about whether some relationship
	    between attributes IS a functional dependency --
	    you should ask the users!!!

*   SO -- a superkey may be a bit "more" than is needed to determine a
    tuple in a relation;

    *   another definition:
        a MINIMAL key is a superkey in which NO (proper) SUBSET of the
	attributes is ALSO a superkey

        for advisor,
	(advisor_id, advisor_lname, advisor_fname) is a superkey but
	    not a minimal key, because subset (advisor_id) is also a superkey

        BUT (advisor_id) IS a minimal key, because no subset of
	    it is ALSO a superkey.

        for course_stats,
	(stu_id, class_id, class_grade) is a superkey but not a minimal
	    key, because subset (stu_id, class_id) is also a superkey

        BUT (stu_id, class_id) IS a minimal key, because no subset of
	    it is ALSO a superkey.

    *   a minimal key is also called a CANDIDATE key

*   well...
    a relation CAN have multiple minimal/candidate keys...!

    student = (stu_id, stu_ssn, stu_lname, stu_fname, stu_email)

    ...then (stu_id) is a minimal key
       but (stu_ssn) is also a minimal key

    *   the primary key for a relation is the minimal/candidate key
        that we CHOOSE (from amongst its possible minimal/candidate keys)
	to serve as its primary key;

*   one more point:

    do you see that, given these definitions, it MUST be the case that,
    for a relation scheme  (e.g.,  R(A1, A2, a3, a4, a5) )(

    the primary key attributes MUST functionally determine the remaining
    attributes?

    (that is,  (A1, A2) -> (a3, a4, a5) )

=====
ASIDE: the concept of physical keys/indexes
=====
*   some DBMSs support identifying certain attributes in a table
    and it will "add" support to more quickly access those attributes;

    these are NOT the same as primary keys! they are just for efficiency
    of certain searches...

=====
ONE more important point... for future use:
=====
*   we've just said that every primary key must be the determinant
    of a functional dependency involving that relation,
    such that the primary key determines the rest of the attributes in that
    relation;

    *   BUT -- that is NOT to say that EVERY determinant in a relation
        is a primary key!

    *   consider:
            student(STU_ID, stu_ssn, stu_lname, stu_fname, stu_email)

        (stu_ssn) -> (stu_id, stu_lname, stu_fname, stu_mail)

        ...but stu_ssn is not the candidate key I chose for student's
	   primary key.

    *   consider:
           ClassTaker(CLASSTAKER_ID, ctaker_lname, ctaker_fname,
	              ctaker_phone, advisor_id, advisor_lname, advisor_phone)

        Here, it IS the case that
	(classtaker_id) -> (ctaker_lname, ctaker_fname,
		            ctaker_phone, advisor_id,
			    advisor_lname, advisor_phone)

        BUT it is also the case that
	(advisor_id) -> (advisor_lname, advisor_phone)

        ...so in this case, advisor_id IS a determinant of a
	   functional dependency, BUT it is not the primary key
	   or even a candidate key for the relation ClassTaker

=====
*   what, then, is a FOREIGN key?

    (broader/theoretical definition)
    *   a foreign key is a set of attributes within one relation
        that is also a CANDIDATE key of another relation

    BUT that's hard for DBMSs to determine! SO in practice, many/most
    DBMSs require a foreign key to be a set of attributes within one
        relation that is also a PRIMARY key of another relation

    *   these let us relate the tuples/rows in different relations
        in USEFUL ways... 8-)

=====
*   some INTEGRITY definitions:

    *   referential integrity: when the foreign key attributes in
        a table are either null or also values in the referenced table

        ...then your data has referential integrity;

        ...DBMSs frequently do support referential integrity! Oracle does!

        ...you make a set of attributes/attributes a foreign key,
	   Oracle enforces referential integrity

    *   entity integrity: in a relation, no attribute of the primary key
        can be null
	AND no two rows can have the SAME value for the primary key

       ...then your data has entity integrity;

        ...DBMSs frequently do support entity integrity! Oracle does!

        ...you make a set of attributes/attributes a primary key,
	   Oracle enforces entity integrity

=====
some CLASSIC relational algebra operations
=====

*   in relational algebra, the relational operations manipulate
    relations to form new relations!

    and, the results of a relational operation on relations are also
    relations

    (and because relations are sets, set operations work on them also,
    but we'll get to most of those later)

*   these are just a useful subset of the relational algebra operations
    that there are:
    *   selection
    *   projection
    *   (Cartesian) product
    *   natural join
    *   equi-join

*   selection
    *   when you select JUST specified tuples/rows from a relation

*   projection
    *   when you select JUST specified attributes from a relation
        (and make sure the result has no duplicate rows)

*   (Cartesian) product
    *   the set of ALL pairs of ONE element from one set and ONE element from
        another set

        ...but when relations are involved, the sets are sets of tuples!

*   equi-join, natural join
    *    equi-join: when you take the Cartesian product of two
         relations,

         THEN you SELECT just the tuples in that product that meet
	 some condition (usually an attribute value in
	 one tuple being the same as an attribute value in another tuple from
	 another relation)

        *   MOST typically: you select the tuples in the Cartesian product
	    that have common values for the attribute(s) serving as the
	    foreign key relating those two tables

    *   natural join: the above, THEN you project ALL but one of the attributes/
        columns you joined on;

*   examples of styles/uses of SQL select statements that perform each
    of these basic relational operations are in 325lect04-2.sql,

    and discussed in SQL Reading Packet 2 - Writing Relational Operations
    using SQL