=====
CS 325 - Week 4 Asynchronous Materials
=====
=====
"TODAY" WE WILL
=====
* announcements
* continue intro to the relational model
* intro to writing relational operations in SQL
* prep for next class
-----
Announcements
-----
* Current reading:
* DB Reading Packet 3, introducing the relational model
* SQL Reading Packet 2, writing relational operations using SQL
* Next homework -- Homework 3 -- will come out this weekend
=====
* relation scheme - a finite sequence of unique attribute names
* e.g.,
employee = (empl_id, empl_lname, empl_str_addr, empl_salary)
...this is not to far from our relation structure form...
employee(EMPL_ID, empl_lname, empl_str_addr, empl_salary)
* each attribute name A is associated with a domain, dom(A), a set of
value, which MAY include a special value null
* dom(empl_salary) could be said to be the set of real numbers and
the special value null,
could be said to be the set of possible salaries an employee
might be paid, or null for, say, volunteer employees who
must never be paid...?
* given a relation scheme
R = (A1, A2, ... An)
a relation r is defined as any finite subset of the Cartesian product
dom(A1) x dom(A2) ... dom(An)
* a sample relation for the employee relation scheme earlier could be
{ (1111, 'Jones', '111 Ash Str', 20000),
(2222, 'Nguyen', '123 Elm Str', 25000) }
* each element of this set is alled a tuple (row)
* a collection of such relations is a relational database...!
=====
* FUNCTIONAL DEPENDENCY
=====
* functional dependency - a relationship between or among attributes,
SUCH THAT, given the value of one attribute, A, you can look up/obtain
the value of another attribute, B
* if you can do that, then we say that B is functionally dependent
on A
...that A determines B
* CONSIDER the relation:
advisor = (advisor_id, advisor_lname, advisor_fname, advisor_phone)
* IF, say, given the value of advisor_id,
I can look up/obtain the value of advisor_phone,
then I'd say that advisor_phone is FUNCTIONALLY DEPENDENT on
advisor_id
* We also may say that advisor_id DETERMINES advisor_phone
* I've seen it argued that the storage and retrieval of functional
dependencies is arguably the only reason for having a database?!
* a little notation:
If I say that B is functionally dependent on A
(or that A determines B) then that can be written:
A -> B
So, I could write:
advisor_id -> advisor_phone
* and, when you use the notation
A -> B
...we say that A is the DETERMINANT in that functional dependency
* So, in
advisor_id -> advisor_phone
...we say that advisor_id is the determinant in that functional
dependency.
* note: you can have sets of attributes on either side or both sides of
a functional dependency!
(stu_id, class_id) -> class_grade
advisor_id -> (advisor_lname, advisor_fname)
(stu_id, class_id) -> (class_grade, completion_date)
* make sure this makes sense:
if
A -> (B, C), it is also the case that A -> B and A -> C
BUT!
if
(D, E) -> F, it is NOT necessarily the case that
D -> F or that E -> F by itself
=====
some KEY definitions
=====
* a SUPERKEY or LOGICAL KEY is any set of one or more attributes
that uniquely determine a tuple in a relation
* a relation can have multiple superkeys!
(it MUST have at least ONE such SET of attributes!)
* consider this relation:
course_stats = (stu_id, class_id, class_grade, completion_date)
it is the case that (stu_id, class_id) is a superkey of course_stats
it is also the case the (stu_id, class_id, class_grade) is a superkey
of course_stats
* PLEASE NOTE: the functional dependencies and superkeys of
relations representing some setting or scenario are determined
by that scenario, and the users' model of that scenario;
business rules - day-to-day rules in some setting or scenario
...such business rules can affect functional dependencies and
superkeys;
* so if you have a question about whether some relationship
between attributes IS a functional dependency --
you should ask the users!!!
* SO -- a superkey may be a bit "more" than is needed to determine a
tuple in a relation;
* another definition:
a MINIMAL key is a superkey in which NO (proper) SUBSET of the
attributes is ALSO a superkey
for advisor,
(advisor_id, advisor_lname, advisor_fname) is a superkey but
not a minimal key, because subset (advisor_id) is also a superkey
BUT (advisor_id) IS a minimal key, because no subset of
it is ALSO a superkey.
for course_stats,
(stu_id, class_id, class_grade) is a superkey but not a minimal
key, because subset (stu_id, class_id) is also a superkey
BUT (stu_id, class_id) IS a minimal key, because no subset of
it is ALSO a superkey.
* a minimal key is also called a CANDIDATE key
* well...
a relation CAN have multiple minimal/candidate keys...!
student = (stu_id, stu_ssn, stu_lname, stu_fname, stu_email)
...then (stu_id) is a minimal key
but (stu_ssn) is also a minimal key
* the primary key for a relation is the minimal/candidate key
that we CHOOSE (from amongst its possible minimal/candidate keys)
to serve as its primary key;
* one more point:
do you see that, given these definitions, it MUST be the case that,
for a relation scheme (e.g., R(A1, A2, a3, a4, a5) )(
the primary key attributes MUST functionally determine the remaining
attributes?
(that is, (A1, A2) -> (a3, a4, a5) )
=====
ASIDE: the concept of physical keys/indexes
=====
* some DBMSs support identifying certain attributes in a table
and it will "add" support to more quickly access those attributes;
these are NOT the same as primary keys! they are just for efficiency
of certain searches...
=====
ONE more important point... for future use:
=====
* we've just said that every primary key must be the determinant
of a functional dependency involving that relation,
such that the primary key determines the rest of the attributes in that
relation;
* BUT -- that is NOT to say that EVERY determinant in a relation
is a primary key!
* consider:
student(STU_ID, stu_ssn, stu_lname, stu_fname, stu_email)
(stu_ssn) -> (stu_id, stu_lname, stu_fname, stu_mail)
...but stu_ssn is not the candidate key I chose for student's
primary key.
* consider:
ClassTaker(CLASSTAKER_ID, ctaker_lname, ctaker_fname,
ctaker_phone, advisor_id, advisor_lname, advisor_phone)
Here, it IS the case that
(classtaker_id) -> (ctaker_lname, ctaker_fname,
ctaker_phone, advisor_id,
advisor_lname, advisor_phone)
BUT it is also the case that
(advisor_id) -> (advisor_lname, advisor_phone)
...so in this case, advisor_id IS a determinant of a
functional dependency, BUT it is not the primary key
or even a candidate key for the relation ClassTaker
=====
* what, then, is a FOREIGN key?
(broader/theoretical definition)
* a foreign key is a set of attributes within one relation
that is also a CANDIDATE key of another relation
BUT that's hard for DBMSs to determine! SO in practice, many/most
DBMSs require a foreign key to be a set of attributes within one
relation that is also a PRIMARY key of another relation
* these let us relate the tuples/rows in different relations
in USEFUL ways... 8-)
=====
* some INTEGRITY definitions:
* referential integrity: when the foreign key attributes in
a table are either null or also values in the referenced table
...then your data has referential integrity;
...DBMSs frequently do support referential integrity! Oracle does!
...you make a set of attributes/attributes a foreign key,
Oracle enforces referential integrity
* entity integrity: in a relation, no attribute of the primary key
can be null
AND no two rows can have the SAME value for the primary key
...then your data has entity integrity;
...DBMSs frequently do support entity integrity! Oracle does!
...you make a set of attributes/attributes a primary key,
Oracle enforces entity integrity
=====
some CLASSIC relational algebra operations
=====
* in relational algebra, the relational operations manipulate
relations to form new relations!
and, the results of a relational operation on relations are also
relations
(and because relations are sets, set operations work on them also,
but we'll get to most of those later)
* these are just a useful subset of the relational algebra operations
that there are:
* selection
* projection
* (Cartesian) product
* natural join
* equi-join
* selection
* when you select JUST specified tuples/rows from a relation
* projection
* when you select JUST specified attributes from a relation
(and make sure the result has no duplicate rows)
* (Cartesian) product
* the set of ALL pairs of ONE element from one set and ONE element from
another set
...but when relations are involved, the sets are sets of tuples!
* equi-join, natural join
* equi-join: when you take the Cartesian product of two
relations,
THEN you SELECT just the tuples in that product that meet
some condition (usually an attribute value in
one tuple being the same as an attribute value in another tuple from
another relation)
* MOST typically: you select the tuples in the Cartesian product
that have common values for the attribute(s) serving as the
foreign key relating those two tables
* natural join: the above, THEN you project ALL but one of the attributes/
columns you joined on;
* examples of styles/uses of SQL select statements that perform each
of these basic relational operations are in 325lect04-2.sql,
and discussed in SQL Reading Packet 2 - Writing Relational Operations
using SQL