315lect04-projected

Please send questions to st10@humboldt.edu .

CIS 315 - Week 4 Lecture
Intro to the Relational Model and Relational Operations

*   several reasons the relational model is important:
    *   it is the basis for an important category
        of DBMS products;
    *   can use it to express DBMS-independent database
        designs;
        ...because the constructs of the relational model
	   are broad and general;

Relational model:
*   a relational database is a collection of relations;

*   what is a relation?
    the more formal definition:
    a relation is a subset of the Cartesian product
    of a list of domains (Ulmann, 2nd edition, p. 19)

*   (section 1.1 - Sunderraman)
    a relation scheme (or schema) is a finite sequence
    of unique attribute names

    employees = (empl_id, empl_name, empl_addr, empl_salary)

    each attribute A is associated with a domain, dom(A),
    a set of values, which includes the special value
    null 
    *   for example: dom(empl_id) could be the set of
        integers between 1000 and 9999 and the special
	value null

    given a relation scheme R = (A1, A2, ... An)
    then a relation r on the scheme R is defined
    as any finite subset of the Cartesian product
    dom(A1) x dom(A2) x ... x dom(An)

    *   for example, for the employees scheme,
        any subset of combinations of an
        empl_id, an empl_name, and empl_addr, and an
        empl_salary is a relation ---
	
	so, for example:
	{(1111, 'Jones', '111 Ash St', 20000),
         (2222, 'Smith', '123 Elm St.', 25000)}

*   a relational *database* scheme D is
    a finite set of relation schemes {R1, ... Rm}

    ...and a relational database on a scheme D
       is a set of relations {r1, ... rm} where each
       ri is a relation on the corresponding scheme Ri

*   A relation, then, is not any old table --
    it is a table with restrictions that naturally
    "flow" from these definitions:
    *   each "cell" in a relation/table MUST be single-valued;
   
    *   duplicate rows are not allowed! (something --
        in this case, a row/tuple/record -- is either
	in this Cartesian produce subset, or it isn't)

    *   the order of the rows/tuples/records is not
        significant (because there isn't really a
	concept of ordering in a set)
 
    *   (the order of the attributes doesn't matter, either)

    *   each column/attribute/field must have a unique name;

    *   all of the entries in any column/attribute/field
        must be from the domain for that column/attribute/field
	(must be of the same "kind")

FUNCTIONAL DEPENDENCIES
*   a relationship between or among attributes

*   if, given the value of one attribute, A,
    you can look up/determine/obtain the value
    of another, B, then we say:

    B is functionally dependent on A

    *  example:
       for Advisor(ADVISOR_NUM, Advisor_lname, advisor_phone)
       given a particular Advisor_num, 
       you can look up/obtain the value of advisor_phone --

       advisor_phone is functionally-dependent on advisor_num

       The converse is not necessarily true --
       advisors may share a phone, so knowing a particular
       advisor_phone doesn't allow you to determine a
       unique advisor_num;

       That's another way to say this:
       if B is functionally dependent on A,
       then A determines B

*   functional dependencies are written using the following
    notation:

    A -> B
    Advisor_num -> Advisor_phone

    *   the attribute (or set of attributes) on
        the LHS of the arrow is called the DETERMINANT 

    *   in the above functional dependencies,
        A and Advisor_num are determinants;

*   Yes, functional dependencies can involve sets of
    attributes:

    (student_id, crn) -> sem_grade

*   self-check:

    If X -> (Y, Z), then X -> Y and X -> Z

    BUT! if (X, Y) -> Z, it is usually not true
        that X -> Z or Y -> Z

Keys!
*   a superkey (sometimes called a logical key)
    is a set of one or more attributes that uniquely identifies
    a tuple;

    ...ANY set of attributes that uniquely determines
    a row/tuple/record is a superkey (even it is has
    "more" than you need...)

    Consider:
    Course_Grades(COURSE_ID, STUDENT_ID, course_grade)

    *   (course_id, student_id) is a superkey for
        course_grades;
    *   (course_id) is NOT a superkey for course_grades;
        neither is (student_id)

    *   (course_id, student_id, course_grade)
        is also a superkey for course_grades;

*   note: EVERY relation MUST have a superkey!
    ... because no relation can have duplicate rows;
    (so, in the most extreme case,
    every relation has at least the set of all of
    its attributes as a superkey...)

*   getting more specific:
    a MINIMAL superkey is one in which NO subset
    of attributes is also a superkey;

    Such a minimal superkey is also called
    a CANDIDATE key;

*   for each relation, you SHOULD select (you typically
    have to identify) a PRIMARY key --
    the primary key should be chosen from amongst
    the candidate keys for a relation.
        
*  a FOREIGN KEY is a set of one or more attributes
   within one relation that is a candidate key
   of another relation
   (but note: some RDBMS's require that this
   be a primary key...)

   *  we'll discuss the particular reasons when
      foreign keys are inserted during our
      database design discussion;

   *  referential integrity:
      if the RDBMS ensures that if a foreign key has
      a value, it refers to any existing tuple in
      another relation, that RDBMS enforces
      referential integrity

entity integrity: 
...in a relation, no attribute of a primary key can be null,
   (nor, of course, can two tuples in a single relation
    can have the same primary key)

Relational Operations!

*  In relational algebra, the variables represent
   RELATIONS, not numbers;

*  there are relational operators that, applied
   to relations, result in relations;
   (these operators are CLOSED over the set of
   relations -- they result in relations);

   and this means you can reasonably COMPOSE
   these operators (use the result of one
   operation in another operation)

*  we'll discuss the most important subset of
   these operators today;
   a few more (the set-theoretic operations) later in
       the semester;
   and a few I'll leave to your reading if you want;

*  if you understand these, it will help you to
   write a broader set of queries...

*  set-theoretic (because relations are sets of tuples):
   union
   difference
   intersection
   Cartesian product

*  relation-theoretic 
   rename
   selection
   projection
   natural join/equi-join/other joins
   division

*   acc. to Kroenke (or was it Rob&Coronel?)
    for a DBMS to be considered even minimally
    relational, it must support at least
    selection, projection, and join

SELECTION
*   note: the SQL select statement is used
    to specify selection AND/OR projection
    AND/OR joins AND/OR etc.!

*   selection selects specified TUPLES/ROWS from
    relation;
    "horizontal" filter;

*   see selection example from posted relational intro
    examples;

PROJECTION
*   projection selects specified ATTRIBUTES/COLUMNS
    of a relation;
    "vertical" filter;

    BUT -- be careful;
    in "true" relational projection,
    the result is the set of tuples that result
    (e.g. NO duplicates are allowed in the result);

*   see projection example from posted relational intro
    examples;

CARTESIAN PRODUCT
*   ...needed to explain JOINs...!

*   concatenates every tuple from 1 relation with every
    tuple from another relation to forma third relation;

    *   a relation consisting of ALL possible pairs of
        rows from two relations;

*   see Cartesian product example from posted relational intro
    examples;

JOIN (right now, just EQUI-JOIN and NATURAL JOIN)
*  basically, you combine relations based on common
   attributes (for natural join and equi-join);

   *   combination of product, selection, and (for natural
       join) projection operations

   *   to take the EQUI-JOIN of two tables based on
       some common attribute:
       1. take the CARTESIAN PRODUCT of both tables
       2. take the selection from the Cartesian product
          of JUST those rows in which the common attributes
	  have the SAME value;
       then, for a natural join,
       3. take the projection of all of the attributes
          except for one instance of the common attribute
       
*   see equi-join/natural join example from posted 
    relational intro
    examples;