=====
CS 325 - Week 3 Asynchronous material
=====
=====
"TODAY" WE WILL
=====
* announcements
* More database fundamentals (DB Reading Packet 2)
* (and a little beyond)
* prep for next class
Reading for this week:
* DB Reading Packet 2 - More database fundamentals
* Should be working on Homework 2, due 11:59 pm on Friday;
=====
* from Kroenke:
a slightly-more-formal database definition:
"a database is a self-describing collection of integrated records"
* self-describing: means the database contains METADATA as well as
data --
that is, it also contains information about the structure
of the data in that database;
* this description of the data structure is called a
data dictionary or a data directory
* why would we want it to be self-describing?
...for one thing, it can be used by a program to determine
what a database contains;
...that is, the self-describing aspect can help promote
application program/data INdependence
* integrated: you not only have data that's self-describing,
you also have a description of RELATIONSHIPS among the data
records;
* foreign keys are one way to describe such relationships!
=====
* NOTE that a database is a model, not of reality, but of the
user's model or view of what's important in some scenario;
* model - "simplified abstraction of real-world events or
conditions"
* the degree of detail provided by a database should be based
on the needs and desires of the users involved in the
scenario that database is modeling or representing;
=====
* a LITTLE more database history:
* those early 1960's databases tended to be "organizational" in scope --
large companies, an important subset of such a large company;
* and these often were VERY oriented towards TRANSACTIONS -
it can be as simple as a representation of a class of
significant events
* these early organizational databases were VERY good at
keeping track of regular (predicatable, regularly-occurring)
transactions
and also good at creating regularly scheduled reports;
* BUT! they were not very flexible,
and application programs tended to need to be written
in procedural languages such as COBOL and PL/I
...users might think of a "new" question the data COULD
conceivably answer -- but they might not be able or
willing to wait for a programmer to get around to
writing a procedural program to answer their questions;
...enter THE RELATIONAL MODEL!
-----
* E.F. Codd developed this in 1970;
* based on relational algebra
* instead of organizing a database using a hierarchical or
network structure,
this considers data to be stored in the form of RELATIONS,
table-like structures that meet certain mathematical requirements;
relations have rows, and relations have columns,
and relationship between rows of tables are visible in the
data
* it turns out that relational algebra's relation operations
provide a very useful way to express operations on relational
tables that can map very nicely to QUESTIONS user might want
to ask about the data;
and asking and answering questions over time remains very
reasonable! And you can do so with relatively short SQL select
statements (compared to procuedural programs written in
COBOL or PL/I or C++ or Java or JavaScript or Python, etc.
* potential benefits of Codd's relational model:
* in a well-deigned relational database, data are stored
in a way that minimizes unnecessarily duplicated data
and eliminates certain kinds of day-to-day processing
errors that can occur compared to some other data-storage
approaches;
* columns can be used to contain data that relate one row
to another row
* it is easier for people to think about relational/tabular
data than that is earlier hierarchical/network database
approaches;
* much better support for ad-hoc queries, spur-of-the-moment
questions over time;
* can encourage more-creative use of one's data;
* there was initial resistance to the relational model
(and there is a little now, also)
* there IS overhead here! providing this relational abstraction
and performing these relational operations;
* fortunately, computer hardware and memory speed and power
increased, and prices for these decreased, to where
relational implementations could be practical;
======
Four main elements of a (relational) database:
* user data
* metadata
* indexes of various kinds - especially foreign keys
* application metadata
* user data - the most obvious element!
* in a relational database, user data is represented as
relational tables;
* we can depict these tables and/or their structure
in quite a few ways -- here are THREE such ways:
* tabular: actually list a table's rows and columns!
* straightfoward!
* can see contents!
* you can see the basic table structure,
BUT
* you can infer domain information for column,
but you don't know its physical domain
* you might be able to infer how tables are related,
but not explicit;
* can be a pain to type;
* there are times when you care more about the table
structure and less about the current rows in a table;
relation structure form can be useful!
name_of_relation(PRIM_KEY_COL1, PRIM_KEY_COL2, another_col2,
another_col2, ...)
foreign key(another_col2) references another_relation
* still pretty straightforward
* nice and concise
* pretty easy to type
* CAN see the primary key
* can often see the foreign key
* don't see the rows in this form
* don't see information about the column domains!
* create-table-statement form:
* represent a relation as the SQL create table statement
you could use to create that table or relation!
* definitely shows the columns
* gives SOME physical domain information about each
column
* OUGHT to give primary key and foreign keys
explicitly
* can even create actual tables from them!
* don't see rows in this form
* this is a LITTLE more a pain to type...
* maybe a little less easy for people to read
than relation structure form...
=====
aside: an example of poorly-structures vs. "better"-structured
relations;
* because some database structures will be harder to live with
over time, or will be easier to maintain data integrity over time,
or will make representing some information easier over time,
etc.
=====
going BEYOND the DB Reading Packet 2 a little bit:
* a bit more about some capabilities a DBMS should provide --
these can vary a lot, BUT:
if it is going to provide an interface between users/applications
and the data,
abstracting out physical storage details,
then at the very least, it needs to provide:
* some kind of data DEFINITION language (DDL)
* some kind of data MANIPULATION language (DML)
* some kind of data CONTROL language (DCL)
* many DBMSs implement these by providing an implementation
of SQL! (some also provide various GRAPHICAL DDL, DML and/or DCL)
DDL - let users define the database!
* for a relational database, this lets you define tables!
* e.g. SQL's create table statement
DML - lets users insert, update, delete, and retrieve data
* e.g. SQL's insert statement
and its update statement
and its delete statement
and its select statement
DCL - lets users control access to the database
* security features, data integrity features,
concurrency control, recovery system, and more
* SQL provides *some* DCL support -
creation of views,
ability to GRANT or REVOKE privileges on tables
GRANT lets you say particular users can be GRANTed select,
insert, update, and/or delete access to a particular
table
-- says users fg3, gh5, st10 can select, insert, and update,
-- (but NOT delete) rows from user abc123's stuff table
GRANT select, insert, update
ON abc123.stuff
TO fg3, gh5, st10;
REVOKE lets you remove previously-GRANTed access
-- says user gh5 can NO LONGER select rows from user abc123's
-- stuff table (but they can still insert rows and update
-- them...!)
REVOKE select
ON abc124.stuff
FROM gh5;