===== CS 325 - Week 13 Asynchronous Material - Part 1 ===== ===== In this part we will: ===== * announcements * start discussion Transaction Management - Part 1 * prep for next class? (or maybe that will be in a later part) * There ARE CS 325 labs on Friday, November 19th! * Remember that you Project DESIGN Draft Milestone is due by 11:59 pm on Friday, November 19th! * (Homework 10 will come out this weekend, and be due 11:59 pm on Friday, DECEMBER 3rd.) * PLEASE remember to register for Spring 2022 when registration window/enrollment appointment opens! ===== * Transaction Management and Concurrency Control - part 1 ===== * concept of a TRANSACTION * in database terms, we define this more specifically than you may have seen before: * starts with the concept of a transaction being a LOGICAL UNIT of WORK (luw) * the idea here: we'd like a transaction NOT to be "partially" done; logically, we would like it to be as IF it was one, indivisible unit, either completely completed or as if it were never started; * logical unit? ...the transaction might actually consist of numerous separate steps, but we want it in effect to be as if it were one single step; * a transaction can be a real-world event (significant in some scenario) that we want to be treated as a "single" logical piece; * a sale of an item * a deposit into an account * a registration for a course * a reservation for a plane seat * a sale of a concert ticket * etc.! * consider a transfer of $100 from a bank customer's savings accounts to their checking account: * might actually consist of a set of steps 1. read savings account balance (into a local variable or register) 2. reduce savings balance by 100 (in that local variable or register) 3. write the new savings balance back into the database 4. read the checking account balance (into a local variable or register) 5. increase the checking balance by 100 (in that local variable or register) 6. write the new checking balance back into the database What if, after the first 4 steps, something went wrong? (power outage, hard drive failure, network issue, etc.) ...customer would NOT be happy! They've lost $100 from their savings account! * SO: we'd like a transaction to be either COMPLETED (all its steps done), or completely not done (at least, be as if NONE of its steps were even started) * so, being more specific, a transaction could be defined as a logical unit of work that must be either entirely completed or entirely aborted; no intermediate/partial states are acceptable. or if you prefer: A transaction, also known as an atomic transaction or logical unit of work, is a series of actions (representing a single "logical" unit of work) taken on the database such that either ALL the actions are done successfully, or NONE are done and the database remains UNCHANGED. * Defining a series of stesp as a transaction is an important of concurrency control. * Also as it relates to transactions, we want to revisit another concept: CONSISTENCY * A consistent database state is one in which all data integrity constraints are satisfied. So, a transaction should change the contents of the database from one consistent state to another consistent state; The DBMS needs to do all it can to help support this. ===== ASIDE - some of the levels of DBMS support for data integrity ===== * remember at least some of these? * (and how DBMSs may support some of these more than others...) * entity integrity * if the DBMS requires unique primary keys for every relation (and enforces them, does not allow two rows to have the same value of that table's primary key, THAT is an example of enforcing entity integrity * interestingly, NOT permitting null values in any attribute making up a primary key is also considered as enforcing entity integrity! * domain integrity * if a DBMS allows you to specify that a column be from a particular set of values (from a domain, even if just a physical domain), and the DBMS won't allow a value for that column that is not from that domain, then it is considered as enforcing at least some degree of domain integrity * DBMSs can vary quite a bit in their support for domain integrity; * referential integrity * if a DBMS allows you specify that column in a table is a foreign key referencing the primary key in another table, and it enforces that, that DBMS is providing support for referential integrity e.g., you can't put in a row with a foreign key NOT in the referenced table; e.g., you can't update a row to have a foreign key that is not in the referenced table; e.g., you can't delete a row in the referenced table if there are rows in other tables referencing that row; * transaction integrity * a DBMS supports this if it supports enforcing atomicity of transactions -- it supports defining a logical set of steps as a transaction, and supports those steps as being ALL completed or as if they had never been started * We'll be discussiong some ways that SQL and DBMSs might help enforce transaction integrity ===== The five main database transaction properties: ACIDS ===== * there are four (or five, depending on the textbook...!) main database transaction properties that we'd like transactions to meet: * *A*tomicity * *C*onsistency * *I*solation * *D*urability * *S*erializability * Atomicity - what you think it is! requires that all parts of a transaction be completed; and if not, any partial actions be able to be "rolled back" such that the result is as if NONE of those steps were done * Consistency - a transaction must transform the database from one consistent state to another consistent state * so, before the transaction, the database is in a consistent state; * AFTER the transaction, the database has been transformed to a NEW consistent state (or the transaction's actions have to be rolled back... so the database is back to its previous consistent state) * Isolation - means the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed (at least, the EFFECT must be as if this was the case) That is, transactions should execute independently of one another -- partial effects of incomplete transactions should not be visible to other transactions. ^ it is the responsibility of the concurrency contorl subsystem to ensure isolation * Durability - indicates the permanence of the database' consistent state. don't confuse this with the ACIDS property of consistency -- that consistency property means that a *transaction* *transforms* the database FROM one consistent state to ANOTHER state; DURABILITY means the consistent state of the database is maintained BETWEEN transactions; That is, the effects of committed transactions are permanently recorded so that between transactions they are not lost, even if power failures, fires, equipment malfunctions, etc. cause memory failures; If durability is supported by a DBMS, the DBMS provides facilities to recover the changes from committed transactions as needed in' the face of such issues * Serializability - you have this property when the concurrent execution of transactions is EQUIVALENT to any of the cases where the transactions executed SERIALLY in some arbitrary order. If I have transactions A, B, and C all done at the SAME time, interleaving their steps doesn't mess up their results - the results when completed are as if you had done ONE of the serial orderings of these (as if they had been done one after the other, rather than at the same time) the results are as if you did A then B then C OR B then A then C OR C then A then B OR C then B then A OR B then C then A OR A then C then B ===== how CAN SQL support transaction atomicity? ===== * ...well, the two statements: commit; rollback; commit; <-- intended to permanently record all changes since the previous commit of the database rollback; <-- "roll" back, or undo, all changes made since the previous commit (so the database is back at the state of the previous commit) * so the idea is: do a commit, do the actions of the transaction, ONLY when all the actions are successfully done do you do a final commit (to complete that transaction); if anything goes wrong along the way, a rollback should be done so it is as if that transaction had never even been started; ===== ONE example of a way to support DURABILITY: transaction logs ===== * a transaction log keeps track of the effects of all transactions that change database since the previous complete **backup** * before each change to the database, record that change to the log (in chronological order) IF a failure occurs, ... in a ROLLFORWARD approach, * you'd restore the database to its PREVIOUS backup state, then go through the transaction log, ONLY redoing, in order, the steps from transactions that had been COMMITTED at the time of the failure. (you DO NOT redo steps from ROLLED BACK or NOT-YET-COMMITTED transactions!) (after the next complete backup, you start a new transaction log) (BY the way, there can be all sorts of approaches here -- rollback: you instead, after a failure, just ROLL BACK the changes from transactions that had not yet been committed at the time of the failure.) ==== in part 2, we'll talk about just a few examples of algorithms for CONCURRENCY CONTROL from three of the common classes of such algorithms: * locks * time stamping * optimistic methods