Please send questions to st10@humboldt.edu.
* what can go wrong when two concurrent transactions
try to access the same database data at the same time?
* (not a comprehensive list...)
* lost updates (sketched below)
* uncommitted data (dirty read, uncommitted dependency)
* inconsistent retrieval
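* to make the first of these concrete, here is a small Python sketch
  (not DBMS code -- the names and values are made up) of a lost update:
  two interleaved "transactions" both read the same balance before
  either writes, so the second write silently wipes out the first;

      db = {"balance": 100}

      def compute_new_balance(amount):
          local = db["balance"]      # read the current value
          local = local + amount     # work on a local copy
          return local

      # interleaved schedule: both read BEFORE either writes
      t1_value = compute_new_balance(50)   # T1 reads 100, computes 150
      t2_value = compute_new_balance(20)   # T2 reads 100, computes 120

      db["balance"] = t1_value       # T1 writes 150
      db["balance"] = t2_value       # T2 writes 120 -- T1's update is lost!

      print(db["balance"])           # 120, not the expected 170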
* typically, the DBMS is in charge of controlling concurrency
when multiple transactions can be executed concurrently;
* three classic classes of algorithms for concurrency
control are:
* locks
* time stamping/timestamps
* optimistic methods
* Locks
* basic idea: prevent those inconsistencies just mentioned
by preventing multiple transactions from having copies
of the same data when that data is about to be
changed;
a transaction must request and obtain a lock on
the data, before being able to read or write it;
once obtained, and the action done, the transaction
releases the lock...
* implicit locks: placed by the DBMS
explicit locks: issued by the application program
or transaction;
...in this class, we're assuming implicit locks;
* granularity: how much is locked by a lock?
* can you see that the larger the granularity,
the less the overhead for locking (easier for
the DBMS), but also the less potential concurrency;
* and the smaller the granularity, the more the
overhead for locking, but also the more potential
concurrency;
* first classic kind of lock: binary locks
* an object is locked or unlocked (1 or 0, true or false)
* if an object is locked, only the transaction with the
lock can do anything with that object; (reading, writing,
etc.)
* if the object is unlocked, a transaction can obtain
a lock on that object;
* if the object is locked, a transaction wanting to
obtain a lock on it must wait;
* reasonably simple to implement -- but also
pretty restrictive (concurrent reads aren't allowed)
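* here is a rough Python sketch of a binary-lock table following the
  rules above (a hypothetical illustration, not how a real DBMS does it --
  in particular, "must wait" is just reported as False rather than
  actually blocking the transaction):

      class BinaryLockManager:
          def __init__(self):
              self.locked = {}                 # object name -> transaction holding its lock

          def lock(self, obj, txn):
              """True means txn obtains the lock; False means it must wait."""
              if obj not in self.locked:       # object is unlocked
                  self.locked[obj] = txn
                  return True
              return self.locked[obj] == txn   # already held by this txn? ok; else wait

          def unlock(self, obj, txn):
              if self.locked.get(obj) == txn:
                  del self.locked[obj]

      # usage: T1 locks X, so T2's request must wait
      mgr = BinaryLockManager()
      print(mgr.lock("X", "T1"))   # True  -- T1 gets the lock
      print(mgr.lock("X", "T2"))   # False -- T2 has to wait
      mgr.unlock("X", "T1")
      print(mgr.lock("X", "T2"))   # True  -- now T2 can get it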
* a second classic locking approach:
read/write, or exclusive/shared, locks;
* a lock has 3 states: unlocked, read (share) locked,
write (exclusive) locked;
* if an object is read-locked by a transaction,
other transactions can also obtain read-locks on
that object;
* if an object is write-locked by a transaction,
other transactions may NOT obtain ANY lock
(read or write) on that object;
transaction T wants a read-lock on object O
* IF object O is not locked: T gets the read-lock
* else if object O has a read-lock on it currently:
T can also get a read-lock
* else if object O has a write-lock on it currently:
T has to wait;
transaction T wants a write-lock on object O
* IF object O is not locked: T gets the write-lock
* else if object O has a read-lock:
T has to wait
* else if object O has a write-lock:
T has to wait
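* and a similar rough Python sketch of the shared/exclusive compatibility
  rules above (again hypothetical -- a real lock manager would queue the
  waiting transactions instead of just answering "wait"):

      class RWLockManager:
          def __init__(self):
              self.readers = {}    # object -> set of txns holding read-locks
              self.writer = {}     # object -> txn holding the write-lock, if any

          def read_lock(self, obj, txn):
              if obj in self.writer and self.writer[obj] != txn:
                  return False                          # write-locked by someone else: wait
              self.readers.setdefault(obj, set()).add(txn)
              return True                               # unlocked or read-locked: share it

          def write_lock(self, obj, txn):
              if obj in self.writer and self.writer[obj] != txn:
                  return False                          # someone else holds the write-lock: wait
              other_readers = self.readers.get(obj, set()) - {txn}
              if other_readers:
                  return False                          # someone else holds a read-lock: wait
              self.writer[obj] = txn
              return True

      mgr = RWLockManager()
      print(mgr.read_lock("Q", "T1"))    # True  -- T1 read-locks Q
      print(mgr.read_lock("Q", "T2"))    # True  -- read-locks are shared
      print(mgr.write_lock("Q", "T3"))   # False -- T3 must wait for the readers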
* one issue: locking, as described thus far,
helps to achieve isolation,
but does not guarantee serializability;
* there are several ways to achieve this -- one
way is to add a protocol such as
two-phased locking
atop the locking approach;
* basic idea: a transaction can request and obtain
whatever locks it needs, but once it releases
a lock, it can obtain NO MORE locks;
...why two-phased? because it results
in a transaction having a so-called "growing
phase", in which locks can be obtained,
and a so-called shrinking phase, in which
locks are released;
* it can be shown that this can help ensure
serializability with locking;
(IBM's DB2 had (maybe still has?) a more-restrictive
(easy to implement!) version of this:
...it just doesn't release any of a transaction's
locks until commit or rollback is done...!)
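* a tiny Python sketch of just the two-phase rule itself (not a full
  lock manager): the transaction remembers whether it has released
  anything yet, and refuses any further lock request after that point;
  the class and names are made up for illustration:

      class TwoPhaseTransaction:
          def __init__(self, name):
              self.name = name
              self.held = set()
              self.shrinking = False    # False = growing phase, True = shrinking phase

          def acquire(self, obj):
              if self.shrinking:
                  raise RuntimeError(self.name + ": 2PL violation -- no locks after a release")
              self.held.add(obj)        # (a real DBMS would go through the lock manager here)

          def release(self, obj):
              self.held.discard(obj)
              self.shrinking = True     # growing phase is over; only releases from now on

      t = TwoPhaseTransaction("T1")
      t.acquire("A")
      t.acquire("B")      # fine -- still in the growing phase
      t.release("A")      # shrinking phase begins
      try:
          t.acquire("C")  # would violate two-phase locking
      except RuntimeError as e:
          print(e)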
* with locks, you can have DEADLOCK --
for example, a pair of transactions each waiting for
something
locked by the other;
* MANY approaches to deal with this:
timeouts,
detection (let it occur, detect it, and break it),
prevention (prevent it from occurring at all)
ignore it...!
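* as a sketch of the detection approach: build a "wait-for" graph
  (an edge Ti -> Tj means Ti is waiting for something Tj has locked)
  and look for a cycle; the little graph below is a made-up Python example:

      wait_for = {
          "T1": ["T2"],   # T1 waits for something T2 has locked
          "T2": ["T1"],   # ...and T2 waits for something T1 has locked: deadlock!
          "T3": [],
      }

      def has_cycle(graph):
          visited, on_stack = set(), set()

          def dfs(node):
              visited.add(node)
              on_stack.add(node)
              for nxt in graph.get(node, []):
                  if nxt in on_stack:          # back edge: found a cycle
                      return True
                  if nxt not in visited and dfs(nxt):
                      return True
              on_stack.discard(node)
              return False

          return any(dfs(n) for n in graph if n not in visited)

      print(has_cycle(wait_for))   # True -- the DBMS would pick a victim and roll it back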
* timestamp - an alternative to locking for concurrency control
timestamp: a monotonic counter (just gets bigger and bigger)
* each transaction gets a unique timestamp when it
begins
* all actions of a transaction are considered to have
the timestamp of that transaction
* the DBMS then has the responsibility to make sure
that conflicting operations are done in
timestamp order;
...thus ensuring serializability;
* if Ti issues a read(Q):
* if TS(Ti) < W-ts(Q),
then Ti is seeking to read Q but it has been
overwritten by a "future" transaction (according
to timestamp ordering),
read will be rejected, Ti will be rolled back
(and restarted with a new timestamp)
* if TS(Ti) >= W-ts(Q),
it is "safe" to execute this read,
and R-ts(Q) will be set to the max(TS(Ti), R-ts(Q))
* if Ti issues a write(Q),
* if TS(Ti) < R-ts(Q),
then a "later" transaction already read this Q --
write is rejected, Ti is rolled back and restarted
* else if TS(Ti) < W-ts(Q),
then a "later" transaction wrote this Q --
write is rejected, Ti is rolled back and restarted
* else, the write is accepted,
and W-ts(Q) is updated to TS(Ti)
* note the higher overhead here: 2 timestamps for each
data chunk!
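* a small Python sketch of these timestamp-ordering checks (hypothetical
  names; a real DBMS would also have to actually perform or undo the
  reads and writes, which is omitted here):

      class TimestampScheduler:
          def __init__(self):
              self.next_ts = 0
              self.r_ts = {}      # object -> largest timestamp that has read it
              self.w_ts = {}      # object -> largest timestamp that has written it

          def begin(self):
              self.next_ts += 1   # monotonic counter: each transaction gets a unique ts
              return self.next_ts

          def read(self, ts, q):
              if ts < self.w_ts.get(q, 0):
                  return "reject: rollback and restart"   # Q overwritten by a "future" txn
              self.r_ts[q] = max(ts, self.r_ts.get(q, 0))
              return "ok"

          def write(self, ts, q):
              if ts < self.r_ts.get(q, 0):
                  return "reject: rollback and restart"   # a "later" txn already read Q
              if ts < self.w_ts.get(q, 0):
                  return "reject: rollback and restart"   # a "later" txn already wrote Q
              self.w_ts[q] = ts
              return "ok"

      sched = TimestampScheduler()
      t1, t2 = sched.begin(), sched.begin()
      print(sched.write(t2, "Q"))   # ok -- W-ts(Q) becomes 2
      print(sched.read(t1, "Q"))    # reject -- T1 would be reading from its "future"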
* optimistic methods - based on the assumption
that most database operations do not conflict;
...see the PDF version of the notes for
the short description of this;