Please send questions to
st10@humboldt.edu .
CIS 315 - Week 4 Lecture
Intro to the Relational Model and Relational Operations
* several reasons the relational model is important:
* it is the basis for an important category
of DBMS products;
* can use it to express DBMS-independent database
designs;
...because the constructs of the relational model
are broad and general;
Relational model:
* a relational database is a collection of relations;
* what is a relation?
the more formal definition:
a relation is a subset of the Cartesian product
of a list of domains (Ulmann, 2nd edition, p. 19)
* (section 1.1 - Sunderraman)
a relation scheme (or schema) is a finite sequence
of unique attribute names
employees = (empl_id, empl_name, empl_addr, empl_salary)
each attribute A is associated with a domain, dom(A),
a set of values, which includes the special value
null
* for example: dom(empl_id) could be the set of
integers between 1000 and 9999 and the special
value null
given a relation scheme R = (A1, A2, ... An)
then a relation r on the scheme R is defined
as any finite subset of the Cartesian product
dom(A1) x dom(A2) x ... x dom(An)
* for example, for the employees scheme,
any subset of combinations of an
empl_id, an empl_name, and empl_addr, and an
empl_salary is a relation ---
so, for example:
{(1111, 'Jones', '111 Ash St', 20000),
(2222, 'Smith', '123 Elm St.', 25000)}
* a relational *database* scheme D is
a finite set of relation schemes {R1, ... Rm}
...and a relational database on a scheme D
is a set of relations {r1, ... rm} where each
ri is a relation on the corresponding scheme Ri
* A relation, then, is not any old table --
it is a table with restrictions that naturally
"flow" from these definitions:
* each "cell" in a relation/table MUST be single-valued;
* duplicate rows are not allowed! (something --
in this case, a row/tuple/record -- is either
in this Cartesian produce subset, or it isn't)
* the order of the rows/tuples/records is not
significant (because there isn't really a
concept of ordering in a set)
* (the order of the attributes doesn't matter, either)
* each column/attribute/field must have a unique name;
* all of the entries in any column/attribute/field
must be from the domain for that column/attribute/field
(must be of the same "kind")
FUNCTIONAL DEPENDENCIES
* a relationship between or among attributes
* if, given the value of one attribute, A,
you can look up/determine/obtain the value
of another, B, then we say:
B is functionally dependent on A
* example:
for Advisor(ADVISOR_NUM, Advisor_lname, advisor_phone)
given a particular Advisor_num,
you can look up/obtain the value of advisor_phone --
advisor_phone is functionally-dependent on advisor_num
The converse is not necessarily true --
advisors may share a phone, so knowing a particular
advisor_phone doesn't allow you to determine a
unique advisor_num;
That's another way to say this:
if B is functionally dependent on A,
then A determines B
* functional dependencies are written using the following
notation:
A -> B
Advisor_num -> Advisor_phone
* the attribute (or set of attributes) on
the LHS of the arrow is called the DETERMINANT
* in the above functional dependencies,
A and Advisor_num are determinants;
* Yes, functional dependencies can involve sets of
attributes:
(student_id, crn) -> sem_grade
* self-check:
If X -> (Y, Z), then X -> Y and X -> Z
BUT! if (X, Y) -> Z, it is usually not true
that X -> Z or Y -> Z
Keys!
* a superkey (sometimes called a logical key)
is a set of one or more attributes that uniquely identifies
a tuple;
...ANY set of attributes that uniquely determines
a row/tuple/record is a superkey (even it is has
"more" than you need...)
Consider:
Course_Grades(COURSE_ID, STUDENT_ID, course_grade)
* (course_id, student_id) is a superkey for
course_grades;
* (course_id) is NOT a superkey for course_grades;
neither is (student_id)
* (course_id, student_id, course_grade)
is also a superkey for course_grades;
* note: EVERY relation MUST have a superkey!
... because no relation can have duplicate rows;
(so, in the most extreme case,
every relation has at least the set of all of
its attributes as a superkey...)
* getting more specific:
a MINIMAL superkey is one in which NO subset
of attributes is also a superkey;
Such a minimal superkey is also called
a CANDIDATE key;
* for each relation, you SHOULD select (you typically
have to identify) a PRIMARY key --
the primary key should be chosen from amongst
the candidate keys for a relation.
* a FOREIGN KEY is a set of one or more attributes
within one relation that is a candidate key
of another relation
(but note: some RDBMS's require that this
be a primary key...)
* we'll discuss the particular reasons when
foreign keys are inserted during our
database design discussion;
* referential integrity:
if the RDBMS ensures that if a foreign key has
a value, it refers to any existing tuple in
another relation, that RDBMS enforces
referential integrity
entity integrity:
...in a relation, no attribute of a primary key can be null,
(nor, of course, can two tuples in a single relation
can have the same primary key)
Relational Operations!
* In relational algebra, the variables represent
RELATIONS, not numbers;
* there are relational operators that, applied
to relations, result in relations;
(these operators are CLOSED over the set of
relations -- they result in relations);
and this means you can reasonably COMPOSE
these operators (use the result of one
operation in another operation)
* we'll discuss the most important subset of
these operators today;
a few more (the set-theoretic operations) later in
the semester;
and a few I'll leave to your reading if you want;
* if you understand these, it will help you to
write a broader set of queries...
* set-theoretic (because relations are sets of tuples):
union
difference
intersection
Cartesian product
* relation-theoretic
rename
selection
projection
natural join/equi-join/other joins
division
* acc. to Kroenke (or was it Rob&Coronel?)
for a DBMS to be considered even minimally
relational, it must support at least
selection, projection, and join
SELECTION
* note: the SQL select statement is used
to specify selection AND/OR projection
AND/OR joins AND/OR etc.!
* selection selects specified TUPLES/ROWS from
relation;
"horizontal" filter;
* see selection example from posted relational intro
examples;
PROJECTION
* projection selects specified ATTRIBUTES/COLUMNS
of a relation;
"vertical" filter;
BUT -- be careful;
in "true" relational projection,
the result is the set of tuples that result
(e.g. NO duplicates are allowed in the result);
* see projection example from posted relational intro
examples;
CARTESIAN PRODUCT
* ...needed to explain JOINs...!
* concatenates every tuple from 1 relation with every
tuple from another relation to forma third relation;
* a relation consisting of ALL possible pairs of
rows from two relations;
* see Cartesian product example from posted relational intro
examples;
JOIN (right now, just EQUI-JOIN and NATURAL JOIN)
* basically, you combine relations based on common
attributes (for natural join and equi-join);
* combination of product, selection, and (for natural
join) projection operations
* to take the EQUI-JOIN of two tables based on
some common attribute:
1. take the CARTESIAN PRODUCT of both tables
2. take the selection from the Cartesian product
of JUST those rows in which the common attributes
have the SAME value;
then, for a natural join,
3. take the projection of all of the attributes
except for one instance of the common attribute
* see equi-join/natural join example from posted
relational intro
examples;