===== CS 325 - Week 3 Asynchronous material ===== ===== "TODAY" WE WILL ===== * announcements * More database fundamentals (DB Reading Packet 2) * (and a little beyond) * prep for next class Reading for this week: * DB Reading Packet 2 - More database fundamentals * Should be working on Homework 2, due 11:59 pm on Friday; ===== * from Kroenke: a slightly-more-formal database definition: "a database is a self-describing collection of integrated records" * self-describing: means the database contains METADATA as well as data -- that is, it also contains information about the structure of the data in that database; * this description of the data structure is called a data dictionary or a data directory * why would we want it to be self-describing? ...for one thing, it can be used by a program to determine what a database contains; ...that is, the self-describing aspect can help promote application program/data INdependence * integrated: you not only have data that's self-describing, you also have a description of RELATIONSHIPS among the data records; * foreign keys are one way to describe such relationships! ===== * NOTE that a database is a model, not of reality, but of the user's model or view of what's important in some scenario; * model - "simplified abstraction of real-world events or conditions" * the degree of detail provided by a database should be based on the needs and desires of the users involved in the scenario that database is modeling or representing; ===== * a LITTLE more database history: * those early 1960's databases tended to be "organizational" in scope -- large companies, an important subset of such a large company; * and these often were VERY oriented towards TRANSACTIONS - it can be as simple as a representation of a class of significant events * these early organizational databases were VERY good at keeping track of regular (predicatable, regularly-occurring) transactions and also good at creating regularly scheduled reports; * BUT! they were not very flexible, and application programs tended to need to be written in procedural languages such as COBOL and PL/I ...users might think of a "new" question the data COULD conceivably answer -- but they might not be able or willing to wait for a programmer to get around to writing a procedural program to answer their questions; ...enter THE RELATIONAL MODEL! ----- * E.F. Codd developed this in 1970; * based on relational algebra * instead of organizing a database using a hierarchical or network structure, this considers data to be stored in the form of RELATIONS, table-like structures that meet certain mathematical requirements; relations have rows, and relations have columns, and relationship between rows of tables are visible in the data * it turns out that relational algebra's relation operations provide a very useful way to express operations on relational tables that can map very nicely to QUESTIONS user might want to ask about the data; and asking and answering questions over time remains very reasonable! And you can do so with relatively short SQL select statements (compared to procuedural programs written in COBOL or PL/I or C++ or Java or JavaScript or Python, etc. * potential benefits of Codd's relational model: * in a well-deigned relational database, data are stored in a way that minimizes unnecessarily duplicated data and eliminates certain kinds of day-to-day processing errors that can occur compared to some other data-storage approaches; * columns can be used to contain data that relate one row to another row * it is easier for people to think about relational/tabular data than that is earlier hierarchical/network database approaches; * much better support for ad-hoc queries, spur-of-the-moment questions over time; * can encourage more-creative use of one's data; * there was initial resistance to the relational model (and there is a little now, also) * there IS overhead here! providing this relational abstraction and performing these relational operations; * fortunately, computer hardware and memory speed and power increased, and prices for these decreased, to where relational implementations could be practical; ====== Four main elements of a (relational) database: * user data * metadata * indexes of various kinds - especially foreign keys * application metadata * user data - the most obvious element! * in a relational database, user data is represented as relational tables; * we can depict these tables and/or their structure in quite a few ways -- here are THREE such ways: * tabular: actually list a table's rows and columns! * straightfoward! * can see contents! * you can see the basic table structure, BUT * you can infer domain information for column, but you don't know its physical domain * you might be able to infer how tables are related, but not explicit; * can be a pain to type; * there are times when you care more about the table structure and less about the current rows in a table; relation structure form can be useful! name_of_relation(PRIM_KEY_COL1, PRIM_KEY_COL2, another_col2, another_col2, ...) foreign key(another_col2) references another_relation * still pretty straightforward * nice and concise * pretty easy to type * CAN see the primary key * can often see the foreign key * don't see the rows in this form * don't see information about the column domains! * create-table-statement form: * represent a relation as the SQL create table statement you could use to create that table or relation! * definitely shows the columns * gives SOME physical domain information about each column * OUGHT to give primary key and foreign keys explicitly * can even create actual tables from them! * don't see rows in this form * this is a LITTLE more a pain to type... * maybe a little less easy for people to read than relation structure form... ===== aside: an example of poorly-structures vs. "better"-structured relations; * because some database structures will be harder to live with over time, or will be easier to maintain data integrity over time, or will make representing some information easier over time, etc. ===== going BEYOND the DB Reading Packet 2 a little bit: * a bit more about some capabilities a DBMS should provide -- these can vary a lot, BUT: if it is going to provide an interface between users/applications and the data, abstracting out physical storage details, then at the very least, it needs to provide: * some kind of data DEFINITION language (DDL) * some kind of data MANIPULATION language (DML) * some kind of data CONTROL language (DCL) * many DBMSs implement these by providing an implementation of SQL! (some also provide various GRAPHICAL DDL, DML and/or DCL) DDL - let users define the database! * for a relational database, this lets you define tables! * e.g. SQL's create table statement DML - lets users insert, update, delete, and retrieve data * e.g. SQL's insert statement and its update statement and its delete statement and its select statement DCL - lets users control access to the database * security features, data integrity features, concurrency control, recovery system, and more * SQL provides *some* DCL support - creation of views, ability to GRANT or REVOKE privileges on tables GRANT lets you say particular users can be GRANTed select, insert, update, and/or delete access to a particular table -- says users fg3, gh5, st10 can select, insert, and update, -- (but NOT delete) rows from user abc123's stuff table GRANT select, insert, update ON abc123.stuff TO fg3, gh5, st10; REVOKE lets you remove previously-GRANTed access -- says user gh5 can NO LONGER select rows from user abc123's -- stuff table (but they can still insert rows and update -- them...!) REVOKE select ON abc124.stuff FROM gh5;