Please send questions to st10@humboldt.edu.
* what can go wrong when two concurrent transactions
try to access the same database data at the same time?
* (not a comprehensive list...)
* lost updates (sketched below)
* uncommitted data (dirty read, uncommitted dependency)
* inconsistent retrieval
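* to make the first of these concrete, here is a small Python sketch
  (not DBMS code -- the names and values are made up) of a lost update:
  two interleaved "transactions" both read the same balance before
  either writes, so the second write silently wipes out the first;

      db = {"balance": 100}

      def compute_new_balance(amount):
          local = db["balance"]      # read the current value
          local = local + amount     # work on a local copy
          return local

      # interleaved schedule: both read BEFORE either writes
      t1_value = compute_new_balance(50)   # T1 reads 100, computes 150
      t2_value = compute_new_balance(20)   # T2 reads 100, computes 120

      db["balance"] = t1_value       # T1 writes 150
      db["balance"] = t2_value       # T2 writes 120 -- T1's update is lost!

      print(db["balance"])           # 120, not the expected 170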
* typically, the DBMS is in charge of controlling concurrency
when multiple transactions can be executed concurrently;
* three classic classes of algorithms for concurrency
control are:
* locks
* time stamping/timestamps
* optimistic methods
* Locks
* basic idea: prevent those inconsistencies just mentioned
by preventing multiple transactions from having copies
of the same data when that data is about to be
changed;
a transaction must request and obtain a lock on
the data, before being able to read or write it;
once obtained, and the action done, the transaction
releases the lock...
* implicit locks: placed by the DBMS
explicit locks: issued by the application program
or transaction;
...in this class, we're assuming implicit locks;
* granularity: how much is locked by a lock?
* can you see that the larger the granularity,
the less the overhead for locking (easier for
the DBMS), but also the less potential concurrency;
* and the smaller the granularity, the more the
overhead for locking, but also the more potential
concurrency;
* first classic kind of lock: binary locks
* an object is locked or unlocked (1 or 0, true or false)
* if an object is locked, only the transaction with the
lock can do anything with that object; (reading, writing,
etc.)
* if the object is unlocked, a transaction can obtain
a lock on that object;
* if the object is locked, a transaction wanting to
obtain a lock on it must wait;
* reasonably simple to implement -- but also
pretty restrictive (concurrent reads aren't allowed)
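* here is a rough Python sketch of a binary-lock table following the
  rules above (a hypothetical illustration, not how a real DBMS does it --
  in particular, "must wait" is just reported as False rather than
  actually blocking the transaction):

      class BinaryLockManager:
          def __init__(self):
              self.locked = {}                 # object name -> transaction holding its lock

          def lock(self, obj, txn):
              """True means txn obtains the lock; False means it must wait."""
              if obj not in self.locked:       # object is unlocked
                  self.locked[obj] = txn
                  return True
              return self.locked[obj] == txn   # already held by this txn? ok; else wait

          def unlock(self, obj, txn):
              if self.locked.get(obj) == txn:
                  del self.locked[obj]

      # usage: T1 locks X, so T2's request must wait
      mgr = BinaryLockManager()
      print(mgr.lock("X", "T1"))   # True  -- T1 gets the lock
      print(mgr.lock("X", "T2"))   # False -- T2 has to wait
      mgr.unlock("X", "T1")
      print(mgr.lock("X", "T2"))   # True  -- now T2 can get it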
* a second classic locking approach:
read/write, or exclusive/shared, locks;
* a lock has 3 states: unlocked, read (share) locked,
write (exclusive) locked;
* if an object is read-locked by a transaction,
other transactions can also obtain read-locks on
that object;
* if an object is write-locked by a transaction,
other transactions may NOT obtain ANY lock
(read or write) on that object;
transaction T wants a read-lock on object O
* IF object O is not locked: T gets the read-lock
* else if object O has a read-lock on it currently:
T can also get a read-lock
* else if object O has a write-lock on it currently:
T has to wait;
transaction T wants a write-lock on object O
* IF object O is not locked: T gets the write-lock
* else if object O has a read-lock:
T has to wait
* else if object O has a write-lock:
T has to wait
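* and a similar rough Python sketch of the shared/exclusive compatibility
  rules above (again hypothetical -- a real lock manager would queue the
  waiting transactions instead of just answering "wait"):

      class RWLockManager:
          def __init__(self):
              self.readers = {}    # object -> set of txns holding read-locks
              self.writer = {}     # object -> txn holding the write-lock, if any

          def read_lock(self, obj, txn):
              if obj in self.writer and self.writer[obj] != txn:
                  return False                          # write-locked by someone else: wait
              self.readers.setdefault(obj, set()).add(txn)
              return True                               # unlocked or read-locked: share it

          def write_lock(self, obj, txn):
              if obj in self.writer and self.writer[obj] != txn:
                  return False                          # someone else holds the write-lock: wait
              other_readers = self.readers.get(obj, set()) - {txn}
              if other_readers:
                  return False                          # someone else holds a read-lock: wait
              self.writer[obj] = txn
              return True

      mgr = RWLockManager()
      print(mgr.read_lock("Q", "T1"))    # True  -- T1 read-locks Q
      print(mgr.read_lock("Q", "T2"))    # True  -- read-locks are shared
      print(mgr.write_lock("Q", "T3"))   # False -- T3 must wait for the readers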
* one issue: locking, as described thus far,
helps to achieve isolation,
but does not guarantee serializability;
* there are several ways to achieve this -- one
way is to add a protocol such as
two-phased locking
atop the locking approach;
* basic idea: a transaction can request and obtain
whatever locks it needs, but once it releases
a lock, it can obtain NO MORE locks;
...why two-phased? because it results
in a transaction having a so-called "growing
phase", in which locks can be obtained,
and a so-called shrinking phase, in which
locks are released;
* it can be shown that this can help ensure
serializability with locking;
(IBM's DB2 had (maybe still has?) a more-restrictive
(easy to implement!) version of this:
...it just doesn't release any of a transaction's
locks until commit or rollback is done...!)
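* a tiny Python sketch of just the two-phase rule itself (not a full
  lock manager): the transaction remembers whether it has released
  anything yet, and refuses any further lock request after that point;
  the class and names are made up for illustration:

      class TwoPhaseTransaction:
          def __init__(self, name):
              self.name = name
              self.held = set()
              self.shrinking = False    # False = growing phase, True = shrinking phase

          def acquire(self, obj):
              if self.shrinking:
                  raise RuntimeError(self.name + ": 2PL violation -- no locks after a release")
              self.held.add(obj)        # (a real DBMS would go through the lock manager here)

          def release(self, obj):
              self.held.discard(obj)
              self.shrinking = True     # growing phase is over; only releases from now on

      t = TwoPhaseTransaction("T1")
      t.acquire("A")
      t.acquire("B")      # fine -- still in the growing phase
      t.release("A")      # shrinking phase begins
      try:
          t.acquire("C")  # would violate two-phase locking
      except RuntimeError as e:
          print(e)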
* with locks, you can have DEADLOCK --
for example, a pair of transactions each waiting for
something
locked by the other;
* MANY approaches to deal with this:
timeouts,
detection (let it occur, detect it, and break it),
prevention (prevent it from occurring at all)
ignore it...!
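* as a sketch of the detection approach: build a "wait-for" graph
  (an edge Ti -> Tj means Ti is waiting for something Tj has locked)
  and look for a cycle; the little graph below is a made-up Python example:

      wait_for = {
          "T1": ["T2"],   # T1 waits for something T2 has locked
          "T2": ["T1"],   # ...and T2 waits for something T1 has locked: deadlock!
          "T3": [],
      }

      def has_cycle(graph):
          visited, on_stack = set(), set()

          def dfs(node):
              visited.add(node)
              on_stack.add(node)
              for nxt in graph.get(node, []):
                  if nxt in on_stack:          # back edge: found a cycle
                      return True
                  if nxt not in visited and dfs(nxt):
                      return True
              on_stack.discard(node)
              return False

          return any(dfs(n) for n in graph if n not in visited)

      print(has_cycle(wait_for))   # True -- the DBMS would pick a victim and roll it back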
* timestamp - an alternative to locking for concurrency control
timestamp: a monotonic counter (just gets bigger and bigger)
* each transaction gets a unique timestamp when it
begins
* all actions of a transaction are considered to have
the timestamp of that transaction
* the DBMS then has the responsibility to make sure
that conflicting operations are done in
timestamp order;
...thus ensuring serializability;
* if Ti issues a read(Q):
* if TS(Ti) < W-ts(Q),
then Ti is seeking to read Q but it has been
overwritten by a "future" transaction (according
to timestamp ordering),
read will be rejected, Ti will be rolled back
(and restarted with a new timestamp)
* if TS(Ti) >= W-ts(Q),
it is "safe" to execute this read,
and R-ts(Q) will be set to the max(TS(Ti), R-ts(Q))
* if Ti issues a write(Q),
* if TS(Ti) < R-ts(Q),
then a "later" transaction already read this Q --
write is rejected, Ti is rolled back and restarted
* else if TS(Ti) < W-ts(Q),
then a "later" transaction wrote this Q --
write is rejected, Ti is rolled back and restarted
* else, the write is accepted,
and W-ts(Q) is updated to TS(Ti)
* note the higher overhead here: 2 timestamps for each
data chunk!
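* a small Python sketch of these timestamp-ordering checks (hypothetical
  names; a real DBMS would also have to actually perform or undo the
  reads and writes, which is omitted here):

      class TimestampScheduler:
          def __init__(self):
              self.next_ts = 0
              self.r_ts = {}      # object -> largest timestamp that has read it
              self.w_ts = {}      # object -> largest timestamp that has written it

          def begin(self):
              self.next_ts += 1   # monotonic counter: each transaction gets a unique ts
              return self.next_ts

          def read(self, ts, q):
              if ts < self.w_ts.get(q, 0):
                  return "reject: rollback and restart"   # Q overwritten by a "future" txn
              self.r_ts[q] = max(ts, self.r_ts.get(q, 0))
              return "ok"

          def write(self, ts, q):
              if ts < self.r_ts.get(q, 0):
                  return "reject: rollback and restart"   # a "later" txn already read Q
              if ts < self.w_ts.get(q, 0):
                  return "reject: rollback and restart"   # a "later" txn already wrote Q
              self.w_ts[q] = ts
              return "ok"

      sched = TimestampScheduler()
      t1, t2 = sched.begin(), sched.begin()
      print(sched.write(t2, "Q"))   # ok -- W-ts(Q) becomes 2
      print(sched.read(t1, "Q"))    # reject -- T1 would be reading from its "future"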
* optimistic methods - based on the assumption
that most database operations do not conflict;
...see the PDF version of the notes for
the short description of this;