325lect03-2-projected


=====
CS 325 - Week 3 Asynchronous material
=====

=====
"TODAY" WE WILL
=====
*   announcements
*   More database fundamentals (DB Reading Packet 2)
*   (and a little beyond)
*   prep for next class

Reading for this week:
*   DB Reading Packet 2 - More database fundamentals

*   Should be working on Homework 2, due 11:59 pm on Friday;

=====
*   from Kroenke:
    a slightly-more-formal database definition:

    "a database is a self-describing collection of integrated records"

    *   self-describing: means the database contains METADATA as well as
        data --

	that is, it also contains information about the structure
	of the data in that database;

        *   this description of the data structure is called a
	    data dictionary or a data directory

        *   why would we want it to be self-describing?
	    ...for one thing, it can be used by a program to determine
	       what a database contains;

            ...that is, the self-describing aspect can help promote
	       application program/data INdependence

    *   integrated: you not only have data that's self-describing,
        you also have a description of RELATIONSHIPS among the data
	records;

        *   foreign keys are one way to describe such relationships!

=====
*   NOTE that a database is a model, not of reality, but of the
    user's model or view of what's important in some scenario;

    *   model - "simplified abstraction of real-world events or
        conditions"

    *   the degree of detail provided by a database should be based
        on the needs and desires of the users involved in the
	scenario that database is modeling or representing;

=====
*   a LITTLE more database history:

*   those early 1960's databases tended to be "organizational" in scope --
    large companies, an important subset of such a large company;

    *   and these often were VERY oriented towards TRANSACTIONS -
        it can be as simple as a representation of a class of
	significant events

    *   these early organizational databases were VERY good at
        keeping track of regular (predicatable, regularly-occurring)
	transactions
	and also good at creating regularly scheduled reports;

        *   BUT! they were not very flexible,
	    and application programs tended to need to be written
	       in procedural languages such as COBOL and PL/I

            ...users might think of a "new" question the data COULD
	       conceivably answer -- but they might not be able or
	       willing to wait for a programmer to get around to
	       writing a procedural program to answer their questions;

...enter THE RELATIONAL MODEL!
-----
*   E.F. Codd developed this in 1970;

    *   based on relational algebra

    *   instead of organizing a database using a hierarchical or
        network structure,

        this considers data to be stored in the form of RELATIONS,
	table-like structures that meet certain mathematical requirements;

        relations have rows, and relations have columns,
	and relationship between rows of tables are visible in the
	data

    *   it turns out that relational algebra's relation operations
        provide a very useful way to express operations on relational
	tables that can map very nicely to QUESTIONS user might want
	to ask about the data;

        and asking and answering questions over time remains very
	reasonable! And you can do so with relatively short SQL select
	statements (compared to procuedural programs written in
	COBOL or PL/I or C++ or Java or JavaScript or Python, etc.

*   potential benefits of Codd's relational model:
    *   in a well-deigned relational database, data are stored
        in a way that minimizes unnecessarily duplicated data
	and eliminates certain kinds of day-to-day processing
	errors that can occur compared to some other data-storage
	approaches;

    *   columns can be used to contain data that relate one row
        to another row

    *   it is easier for people to think about relational/tabular
        data than that is earlier hierarchical/network database
	approaches;

    *   much better support for ad-hoc queries, spur-of-the-moment
        questions over time;
	*   can encourage more-creative use of one's data;

*   there was initial resistance to the relational model
    (and there is a little now, also)
    *   there IS overhead here! providing this relational abstraction
        and performing these relational operations;

    *   fortunately, computer hardware and memory speed and power
        increased, and prices for these decreased, to where
	relational implementations could be practical;

======
Four main elements of a (relational) database:

*   user data
*   metadata
*   indexes of various kinds - especially foreign keys
*   application metadata

*   user data - the most obvious element!

    *   in a relational database, user data is represented as
        relational tables;

    *   we can depict these tables and/or their structure
        in quite a few ways -- here are THREE such ways:

        *   tabular: actually list a table's rows and columns!

            *   straightfoward!
	    *   can see contents!
	    *   you can see the basic table structure,
	    BUT
	    *   you can infer domain information for column,
	        but you don't know its physical domain

            *   you might be able to infer how tables are related,
	        but not explicit;

            *   can be a pain to type;

       *    there are times when you care more about the table
            structure and less about the current rows in a table;

	    relation structure form can be useful!

            name_of_relation(PRIM_KEY_COL1, PRIM_KEY_COL2, another_col2,
	        another_col2, ...)
		foreign key(another_col2) references another_relation

            *   still pretty straightforward
	    *   nice and concise
	    *   pretty easy to type
	    *   CAN see the primary key
	    *   can often see the foreign key

            *   don't see the rows in this form
	    *   don't see information about the column domains!

        *   create-table-statement form:
	    *   represent a relation as the SQL create table statement
	        you could use to create that table or relation!

            *   definitely shows the columns
	    *   gives SOME physical domain information about each
	        column
	    *   OUGHT to give primary key and foreign keys
	        explicitly
            *   can even create actual tables from them!

            *   don't see rows in this form
	    *   this is a LITTLE more a pain to type...
	    *   maybe a little less easy for people to read
	        than relation structure form...

=====
aside: an example of poorly-structures vs. "better"-structured
   relations;

*   because some database structures will be harder to live with
    over time, or will be easier to maintain data integrity over time,
    or will make representing some information easier over time,
    etc.

=====
going BEYOND the DB Reading Packet 2 a little bit:

*   a bit more about some capabilities a DBMS should provide --
    these can vary a lot, BUT:

    if it is going to provide an interface between users/applications
    and the data,
    abstracting out physical storage details,

    then at the very least, it needs to provide:
    *   some kind of data DEFINITION language (DDL)
    *   some kind of data MANIPULATION language (DML)
    *   some kind of data CONTROL language (DCL)

    *   many DBMSs implement these by providing an implementation
        of SQL! (some also provide various GRAPHICAL DDL, DML and/or DCL)

    DDL - let users define the database!
    *   for a relational database, this lets you define tables!

    *   e.g. SQL's create table statement

    DML - lets users insert, update, delete, and retrieve data
    *   e.g. SQL's insert statement
        and its update statement
	and its delete statement
	and its select statement

    DCL - lets users control access to the database
    *   security features, data integrity features,
        concurrency control, recovery system, and more

    *   SQL provides *some* DCL support -
        creation of views,
	ability to GRANT or REVOKE privileges on tables

        GRANT lets you say particular users can be GRANTed select,
	    insert, update, and/or delete access to a particular
	    table

        -- says users fg3, gh5, st10 can select, insert, and update,
	--     (but NOT delete) rows from user abc123's stuff table

        GRANT select, insert, update
	ON abc123.stuff
	TO fg3, gh5, st10;

        REVOKE lets you remove previously-GRANTed access

        -- says user gh5 can NO LONGER select rows from user abc123's
	--     stuff table (but they can still insert rows and update
	--     them...!)

        REVOKE select
	ON abc124.stuff
	FROM gh5;