Recall that one of the key advantages of using a database is that data can easily be accessed by multiple users. However, this means that when users run transactions at the same time, a lot of things can go wrong.
Assume we have two transactions, $T_1$ and $T_2$, which both perform read and write operations, and that the data item to be operated on is $X$.
- Lost updates: Also known as a write-write conflict, occurs when a write from one transaction gets overwritten by another transaction that never read it, so the first update is lost.
- Uncommitted data: Also known as a dirty read or a write-read conflict, occurs when a transaction reads data written by another transaction that is later aborted/rolled back.
- Inconsistent retrievals: Also known as an incorrect summary, occurs when a transaction applies an aggregate function while another transaction is modifying some of the values being aggregated.
Conflicts
Lost Update
$T_1$'s write gets lost, and neither transaction has the ‘correct’ value of $X$.
In a schedule, let's assume $X = 100$. The goal of transaction $T_1$ is to subtract 20 from $X$ and the goal of $T_2$ is to subtract 10 from $X$:

| Time | $T_1$          | $T_2$          | $X$ |
| ---- | -------------- | -------------- | --- |
| 1    | read $X$ (100) |                | 100 |
| 2    |                | read $X$ (100) | 100 |
| 3    | write $X = 80$ |                | 80  |
| 4    |                | write $X = 90$ | 90  |

To transaction $T_1$, $X$ is 20 less than its original amount. To transaction $T_2$, $X$ is 10 less than its original amount. Because $T_2$ does not read in the new updated value of $X$ before writing, $T_1$'s write gets lost and neither transaction has the true correct value of $X$ (which should be 70).
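To make the interleaving concrete, here is a minimal sketch that simulates the schedule above step by step in plain Python (no real DBMS; local variables stand in for each transaction's workspace):

```python
x = 100  # the shared data item X

# Time 1: T1 reads X
t1_local = x          # T1 sees 100
# Time 2: T2 reads X
t2_local = x          # T2 also sees 100, which becomes stale once T1 writes
# Time 3: T1 writes X = X - 20
x = t1_local - 20     # X is now 80
# Time 4: T2 writes X = X - 10, based on its stale read
x = t2_local - 10     # X is now 90: T1's update is lost

print(x)  # 90, but a serial execution would give 70
```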
Uncommitted Data
$T_2$ reads the value of $X$ pre-rollback, which means the changes that should be undone by $T_1$'s rollback are not treated as such in $T_2$.

In a schedule, assuming $X = 100$ (and that, as before, $T_1$ subtracts 20 and $T_2$ subtracts 10):

| Time | $T_1$          | $T_2$          | $X$ |
| ---- | -------------- | -------------- | --- |
| 1    | read $X$ (100) |                | 100 |
| 2    | write $X = 80$ |                | 80  |
| 3    |                | read $X$ (80)  | 80  |
| 4    | rollback       |                | 100 |
| 5    |                | write $X = 70$ | 70  |

If $T_1$ gets aborted after $T_2$ reads in $X$, then, to $T_2$, the abort hasn't happened. To transaction $T_1$, it has reset the state of the database, but transaction $T_2$ hasn't registered that.

The ‘true’ value of $X$ is ambiguous, but if the goal is that $T_2$ should have been performed before $T_1$, then $X$ should be 90.
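The same style of sketch for the dirty read above (again, plain Python standing in for the DBMS):

```python
x = 100  # committed value of X

# T1 writes X = X - 20, but has not committed yet
x = x - 20            # the dirty value 80 is visible in shared state

# T2 reads the dirty value
t2_local = x          # 80 -- a dirty read

# T1 rolls back, restoring the pre-transaction value
x = 100

# T2 proceeds using the dirty value it read: X = 80 - 10
x = t2_local - 10

print(x)  # 70, but since T1 rolled back it should be 90
```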
Inconsistent Retrieval
Let $T_1$ be a transaction that returns the sum of data items $X$ and $Y$. If those values get changed during the summation, we get inconsistencies:

Let $T_2$ be a transaction that modifies both $X$ and $Y$, after $T_1$ has read $X$ but before $T_1$ reads $Y$:

$T_1$ returns an inconsistent sum: it combines the old value of $X$ with the new value of $Y$. If we wanted to sum $X$ and $Y$ before any changes were made, both old values should have been used; if we want the sum after the changes, both new values should have been used. The interleaved schedule produces a mix that matches neither.

Note that this is just an example; in most cases, aggregate functions are used over large amounts of data.
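A sketch of the inconsistent retrieval, with made-up starting values ($X = 100$ and $Y = 50$ are assumptions for illustration, not from the notes above):

```python
x, y = 100, 50  # assumed starting values of X and Y

# T1 starts summing: it reads X first
t1_sum = x            # reads the old X = 100

# T2 now transfers 30 from X to Y and commits
x -= 30               # X = 70
y += 30               # Y = 80

# T1 finishes summing: it reads Y after T2's update
t1_sum += y           # adds the new Y = 80

print(t1_sum)  # 180 -- the consistent sum is 150, both before and after T2
```

Since the transfer keeps $X + Y$ constant, both the ‘before’ and ‘after’ sums are 150; the interleaved read matches neither.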
Serializability
In order to avoid such conflicts, database transactions must be serializable: the outcome must be as if the transactions had been done one after the other, with no ‘interleaved’ transactions.

Equivalently, the schedule used to run the transactions must have the same final state as that of some serial schedule.
In most cases, we don’t want to actually use a serial schedule, as that ends up being very slow, since every transaction must wait on other transactions to finish.
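A common practical test is conflict serializability, a stricter condition that is easy to check mechanically: build a precedence graph between transactions and look for cycles. A minimal sketch (the schedule format and function names here are my own):

```python
def is_conflict_serializable(schedule):
    """schedule: a list of (txn, op, item) tuples, e.g. ("T1", "R", "X")."""
    # Add an edge Ti -> Tj whenever an operation of Ti conflicts with a
    # later operation of Tj: same item, different transactions, and at
    # least one of the two operations is a write.
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))

    # The schedule is conflict serializable iff the graph is acyclic.
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a != node:
                continue
            if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    txns = {t for t, _, _ in schedule}
    return not any(has_cycle(t, set(), set()) for t in txns)

# The lost-update schedule from earlier is not conflict serializable:
lost_update = [("T1", "R", "X"), ("T2", "R", "X"),
               ("T1", "W", "X"), ("T2", "W", "X")]
print(is_conflict_serializable(lost_update))  # False
```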
Concurrency Control
There are three main ways to achieve serializable transactions:
- Locking: Most common. ‘Locks’ data, preventing other transactions from accessing it.
- Timestamping
- Optimistic Concurrency Control
Locking
A lock manager assigns locks to sections of data.
When a transaction needs to access data for read/write, a lock is assigned to the data. Other transactions cannot access locked data, and need to wait until it becomes unlocked. When the transaction is complete, the data is unlocked.
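As a rough illustration, here is a toy sketch of that idea; the class and method names are my own, and a real lock manager also handles queueing waiters, lock modes, and granularity:

```python
class LockManager:
    def __init__(self):
        self.locks = {}  # data item -> transaction currently holding it

    def acquire(self, txn, item):
        """Grant the lock if the item is free (or already ours)."""
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True          # lock granted
        return False             # locked by another txn: must wait

    def release_all(self, txn):
        """On commit/abort, release every lock the transaction holds."""
        self.locks = {i: t for i, t in self.locks.items() if t != txn}

lm = LockManager()
print(lm.acquire("T1", "X"))  # True: T1 now holds X
print(lm.acquire("T2", "X"))  # False: T2 must wait until X is unlocked
lm.release_all("T1")
print(lm.acquire("T2", "X"))  # True: X was unlocked when T1 finished
```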
Lock Levels
There are many lock granularities (how much data is locked at a time):
- Database-level lock: Entire database is locked
- Good for batch processing but unsuitable for multi-user DBMSs
- Different transactions $T_1$ and $T_2$ cannot access the same database concurrently, even if they use different tables
- Used in SQLite, Access
- Table-level lock: Entire table is locked
- $T_1$ and $T_2$ can access the same database concurrently as long as they use different tables
- Can cause bottlenecks, even if transactions want to access different parts of the table and would not interfere with each other
- Not suitable for highly multi-user DBMSs
- Page-level lock: An entire disk page is locked
- Not commonly used now
- Row-level lock: Allows concurrent transactions to access different rows of the same table, even if the rows are located on the same page
- Improves data availability but with high overhead (each row has a lock that must be read and written to)
- Currently the most popular approach (MySQL, Oracle)
- Field-level lock: Allows concurrent transactions to access the same row, as long as they access different attributes within that row
- Most flexible lock but requires an extremely high level of overhead
- Not commonly used
Lock Types
- Binary Locks: Either locked or not locked.
- Eliminates the lost update conflict.
- However, too restrictive for optimal concurrency: when two transactions only want to read the same data (and not write it), binary locks still prevent them from doing so concurrently.
- Shared & Exclusive Locks: Two separate types of locks: exclusive and shared (see the sketch after this list).
- Exclusive Lock: If a lock is exclusive, it means only one transaction is allowed access to the data.
- Used when a transaction is writing data
- The lock manager will only grant this lock if no other locks (exclusive or shared) are on the data already. That is, if another transaction is already reading the data (shared lock) or writing it (exclusive lock), the lock manager will not grant another exclusive lock.
- Shared Lock: Means the data can be read by other transactions. Multiple transactions can apply a shared lock on the same data, since they are only reading data
- Used when a transaction is reading data
- Granted if the data only contains shared locks
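A minimal sketch of these compatibility rules, extending the toy lock manager from earlier (again, the names are my own):

```python
class SXLockManager:
    def __init__(self):
        self.shared = {}     # item -> set of txns holding a shared lock
        self.exclusive = {}  # item -> txn holding the exclusive lock

    def acquire_shared(self, txn, item):
        # Granted only if no *other* txn holds an exclusive lock.
        if self.exclusive.get(item) not in (None, txn):
            return False
        self.shared.setdefault(item, set()).add(txn)
        return True

    def acquire_exclusive(self, txn, item):
        # Granted only if no other locks (shared or exclusive) exist.
        other_readers = self.shared.get(item, set()) - {txn}
        if other_readers or self.exclusive.get(item) not in (None, txn):
            return False
        self.exclusive[item] = txn
        return True

lm = SXLockManager()
print(lm.acquire_shared("T1", "X"))     # True: readers can share
print(lm.acquire_shared("T2", "X"))     # True: multiple shared locks are fine
print(lm.acquire_exclusive("T3", "X"))  # False: T1 and T2 hold shared locks
```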
Deadlock
Unfortunately, locking comes with its own drawbacks. One such issue is deadlock, which occurs when two transactions each depend on the other to unlock data, resulting in an infinite wait:
Let $T_1$, $T_2$ be two transactions. A schedule like the one below would result in a deadlock:

| Time | $T_1$                      | $T_2$                      |
| ---- | -------------------------- | -------------------------- |
| 1    | lock $X$                   |                            |
| 2    |                            | lock $Y$                   |
| 3    | request lock on $Y$ (wait) | request lock on $X$ (wait) |

In the third timeframe, $T_1$ is waiting on the lock on $Y$ (locked by $T_2$) to release, while at the same time, $T_2$ is waiting on the lock on $X$ to be released. Since neither lock will release until its transaction completes, a deadlock occurs.
Deadlocks only occur for exclusive locks, not shared locks
More specifically, it occurs when two transactions each hold an exclusive lock and request the other's. Since shared locks can be applied on data containing an existing shared lock, transactions never block each other with shared locks alone, so a deadlock isn't possible for shared locks.
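To see this concretely, here is a small Python sketch of the schedule above using two threads. The `acquire(timeout=...)` calls stand in for deadlock detection so the program terminates instead of hanging forever:

```python
import threading

lock_x = threading.Lock()
lock_y = threading.Lock()
barrier = threading.Barrier(2)  # ensures both first locks are held

def t1():
    with lock_x:                           # time 1: T1 locks X
        barrier.wait()                     # wait until T2 has locked Y
        if not lock_y.acquire(timeout=1):  # time 3: wait on Y, held by T2
            print("T1: could not get Y -- deadlock, giving up")
        else:
            lock_y.release()

def t2():
    with lock_y:                           # time 2: T2 locks Y
        barrier.wait()                     # wait until T1 has locked X
        if not lock_x.acquire(timeout=1):  # time 3: wait on X, held by T1
            print("T2: could not get X -- deadlock, giving up")
        else:
            lock_x.release()

a, b = threading.Thread(target=t1), threading.Thread(target=t2)
a.start(); b.start(); a.join(); b.join()
# Both threads time out: each holds the lock the other one needs.
```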
Timestamp Ordering
Another method is to assign a unique, ordered timestamp to each transaction; every read or write the transaction performs then carries that timestamp.

This allows the DBMS to know the timestamp at which any section of data was last read and last written. Access to data is then allowed or denied based on these timestamps: an operation that would violate the timestamp order causes its transaction to be rolled back.
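A minimal sketch of the basic rules (simplified; the names are my own, see the article linked below for the full protocol):

```python
class TimestampOrdering:
    def __init__(self):
        self.read_ts = {}   # item -> largest txn timestamp that read it
        self.write_ts = {}  # item -> largest txn timestamp that wrote it

    def read(self, ts, item):
        # A txn may not read data already overwritten by a younger txn.
        if ts < self.write_ts.get(item, 0):
            return "rollback"
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return "ok"

    def write(self, ts, item):
        # A txn may not overwrite data that a younger txn has already
        # read or written.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return "rollback"
        self.write_ts[item] = ts
        return "ok"

to = TimestampOrdering()
print(to.read(2, "X"))   # ok: X's read timestamp becomes 2
print(to.write(1, "X"))  # rollback: txn 1 is too old to write X now
```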
#todo https://en.wikipedia.org/wiki/Timestamp-based_concurrency_control
Optimistic
- Based on the assumption that the majority of database operations do not conflict.
- Transaction is executed without restrictions or checking
- Then, when it is ready to commit, the DBMS checks whether any of the data it read has been altered by another transaction – if so, roll back (see the sketch below)
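A minimal sketch of this validate-at-commit idea using per-item version numbers (the structure and names are assumptions, not a real DBMS API):

```python
versions = {"X": 1}  # item -> current committed version number
data = {"X": 100}    # item -> current committed value

class OptimisticTxn:
    def __init__(self):
        self.read_set = {}   # item -> version observed when read
        self.write_set = {}  # item -> new value, applied only on commit

    def read(self, item):
        self.read_set[item] = versions[item]
        return data[item]

    def write(self, item, value):
        self.write_set[item] = value  # buffered, not yet visible

    def commit(self):
        # Validation: if anything we read has since changed, roll back.
        if any(versions[i] != v for i, v in self.read_set.items()):
            return False
        for item, value in self.write_set.items():
            data[item] = value
            versions[item] += 1
        return True

t1 = OptimisticTxn()
t1.write("X", t1.read("X") - 20)
versions["X"] += 1    # meanwhile, another transaction commits a change to X
print(t1.commit())    # False: T1's read is stale, so it must roll back
```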