Recall that one of the key advantages of using a database is that data can easily be accessed by multiple users. However, this means that when users run transactions at the same time, a lot of things can go wrong.
Assume we have two transactions, $T_1$ and $T_2$, which both perform read and write operations, and that the data item to be operated on is $X$.
- Lost updates: Also known as a write-write conflict, occurs when a write from one transaction gets overwritten by another transaction that never read it, so the first update is lost.
- Uncommitted data: Also known as a dirty read or a write-read conflict, occurs when a transaction reads data written by another transaction that is later aborted/rolled back.
- Inconsistent retrievals: Also known as an incorrect summary, occurs when a transaction applies an aggregate function while another transaction is modifying some of the values being aggregated.
Conflicts
Lost Update
$T_1$'s write gets lost, and neither transaction has the ‘correct’ value of $X$.
In a schedule, let's assume $X = 100$. The goal of transaction $T_1$ is to subtract 20 from $X$ and the goal of $T_2$ is to subtract 10 from $X$:

| Time | $T_1$          | $T_2$          | $X$ |
| ---- | -------------- | -------------- | --- |
| 1    | read $X$ (100) |                | 100 |
| 2    |                | read $X$ (100) | 100 |
| 3    | write $X = 80$ |                | 80  |
| 4    |                | write $X = 90$ | 90  |

To transaction $T_1$, $X$ is 20 less than its original amount. To transaction $T_2$, $X$ is 10 less than its original amount. Because $T_2$ does not read in the new updated value of $X$ before writing, $T_1$'s write gets lost and neither transaction has the true correct value of $X$ (which should be 70).
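To make the interleaving concrete, here is a minimal sketch that simulates the schedule above step by step in plain Python (no real DBMS; local variables stand in for each transaction's workspace):

```python
x = 100  # the shared data item X

# Time 1: T1 reads X
t1_local = x          # T1 sees 100
# Time 2: T2 reads X
t2_local = x          # T2 also sees 100, which becomes stale once T1 writes
# Time 3: T1 writes X = X - 20
x = t1_local - 20     # X is now 80
# Time 4: T2 writes X = X - 10, based on its stale read
x = t2_local - 10     # X is now 90: T1's update is lost

print(x)  # 90, but a serial execution would give 70
```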
Uncommitted Data
$T_2$ reads the value of $X$ pre-rollback, which means the changes that should be undone by $T_1$'s rollback are not treated as such in $T_2$.

In a schedule, assuming $X = 100$ (and that, as before, $T_1$ subtracts 20 and $T_2$ subtracts 10):

| Time | $T_1$          | $T_2$          | $X$ |
| ---- | -------------- | -------------- | --- |
| 1    | read $X$ (100) |                | 100 |
| 2    | write $X = 80$ |                | 80  |
| 3    |                | read $X$ (80)  | 80  |
| 4    | rollback       |                | 100 |
| 5    |                | write $X = 70$ | 70  |

If $T_1$ gets aborted after $T_2$ reads in $X$, then, to $T_2$, the abort hasn't happened. To transaction $T_1$, it has reset the state of the database, but transaction $T_2$ hasn't registered that.

The ‘true’ value of $X$ is ambiguous, but if the goal is that $T_2$ should have been performed before $T_1$, then $X$ should be 90.
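The same style of sketch for the dirty read above (again, plain Python standing in for the DBMS):

```python
x = 100  # committed value of X

# T1 writes X = X - 20, but has not committed yet
x = x - 20            # the dirty value 80 is visible in shared state

# T2 reads the dirty value
t2_local = x          # 80 -- a dirty read

# T1 rolls back, restoring the pre-transaction value
x = 100

# T2 proceeds using the dirty value it read: X = 80 - 10
x = t2_local - 10

print(x)  # 70, but since T1 rolled back it should be 90
```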
Inconsistent Retrieval
Let $T_1$ be a transaction that returns the sum of data items $X$ and $Y$. If those values get changed during the summation, we get inconsistencies:

Let $T_2$ be a transaction that modifies both $X$ and $Y$, after $T_1$ has read $X$ but before $T_1$ reads $Y$:

$T_1$ returns an inconsistent sum: it combines the old value of $X$ with the new value of $Y$. If we wanted to sum $X$ and $Y$ before any changes were made, both old values should have been used; if we want the sum after the changes, both new values should have been used. The interleaved schedule produces a mix that matches neither.

Note that this is just an example; in most cases, aggregate functions are used over large amounts of data.
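A sketch of the inconsistent retrieval, with made-up starting values ($X = 100$ and $Y = 50$ are assumptions for illustration, not from the notes above):

```python
x, y = 100, 50  # assumed starting values of X and Y

# T1 starts summing: it reads X first
t1_sum = x            # reads the old X = 100

# T2 now transfers 30 from X to Y and commits
x -= 30               # X = 70
y += 30               # Y = 80

# T1 finishes summing: it reads Y after T2's update
t1_sum += y           # adds the new Y = 80

print(t1_sum)  # 180 -- the consistent sum is 150, both before and after T2
```

Since the transfer keeps $X + Y$ constant, both the ‘before’ and ‘after’ sums are 150; the interleaved read matches neither.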
Serializability
In order to avoid such conflicts, database transactions must be serializable: the outcome must be as if the transactions had been done one after the other, with no ‘interleaved’ transactions.

Equivalently, the schedule used to run the transactions must have the same final state as that of some serial schedule.
In most cases, we don’t want to actually use a serial schedule, as that ends up being very slow, since every transaction must wait on other transactions to finish.
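A common practical test is conflict serializability, a stricter condition that is easy to check mechanically: build a precedence graph between transactions and look for cycles. A minimal sketch (the schedule format and function names here are my own):

```python
def is_conflict_serializable(schedule):
    """schedule: a list of (txn, op, item) tuples, e.g. ("T1", "R", "X")."""
    # Add an edge Ti -> Tj whenever an operation of Ti conflicts with a
    # later operation of Tj: same item, different transactions, and at
    # least one of the two operations is a write.
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))

    # The schedule is conflict serializable iff the graph is acyclic.
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a != node:
                continue
            if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    txns = {t for t, _, _ in schedule}
    return not any(has_cycle(t, set(), set()) for t in txns)

# The lost-update schedule from earlier is not conflict serializable:
lost_update = [("T1", "R", "X"), ("T2", "R", "X"),
               ("T1", "W", "X"), ("T2", "W", "X")]
print(is_conflict_serializable(lost_update))  # False
```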
Concurrency Control
There are three main ways to achieve serializable transactions:
- Locking: Most common. ‘Locks’ data, preventing other transactions from accessing it.
- Timestamping
- Optimistic Concurrency Control
Locking
A lock manager assigns locks to sections of data.
When a transaction needs to access data for read/write, a lock is assigned to the data. Other transactions cannot access locked data, and need to wait until it becomes unlocked. When the transaction is complete, the data is unlocked.
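As a rough illustration, here is a toy sketch of that idea; the class and method names are my own, and a real lock manager also handles queueing waiters, lock modes, and granularity:

```python
class LockManager:
    def __init__(self):
        self.locks = {}  # data item -> transaction currently holding it

    def acquire(self, txn, item):
        """Grant the lock if the item is free (or already ours)."""
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True          # lock granted
        return False             # locked by another txn: must wait

    def release_all(self, txn):
        """On commit/abort, release every lock the transaction holds."""
        self.locks = {i: t for i, t in self.locks.items() if t != txn}

lm = LockManager()
print(lm.acquire("T1", "X"))  # True: T1 now holds X
print(lm.acquire("T2", "X"))  # False: T2 must wait until X is unlocked
lm.release_all("T1")
print(lm.acquire("T2", "X"))  # True: X was unlocked when T1 finished
```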
Lock Levels
There are many lock granularities (how much data is locked at a time):
- Database-level lock: Entire database is locked
- Good for batch processing but unsuitable for multi-user DBMSs
- Different transactions $T_1$ and $T_2$ cannot access the same database concurrently, even if they use different tables
- Used in SQLite, Access
- Table-level lock: Entire table is locked
- $T_1$ and $T_2$ can access the same database concurrently as long as they use different tables
- Can cause bottlenecks, even if transactions want to access different parts of the table and would not interfere with each other
- Not suitable for highly multi-user DBMSs
- Page-level lock: An entire disk page is locked
- Not commonly used now
- Row-level lock: Allows concurrent transactions to access different rows of the same table, even if the rows are located on the same page
- Improves data availability but with high overhead (each row has a lock that must be read and written to)
- Currently the most popular approach (MySQL, Oracle)
- Field-level lock: Allows concurrent transactions to access the same row, as long as they access different attributes within that row
- Most flexible lock but requires an extremely high level of overhead
- Not commonly used
Lock Types
- Binary Locks: Either locked or not locked.
- Eliminates the lost update conflict.
- However, too restrictive for optimal concurrency: when two transactions only want to read the same data (and not write it), binary locks still prevent them from doing so concurrently.
- Shared & Exclusive Locks: Two separate types of locks: exclusive and shared (see the sketch after this list).
- Exclusive Lock: If a lock is exclusive, it means only one transaction is allowed access to the data.
- Used when a transaction is writing data
- The lock manager will only grant this lock if no other locks (exclusive or shared) are on the data already. That is, if another transaction is already reading the data (shared lock) or writing it (exclusive lock), the lock manager will not grant another exclusive lock.
- Shared Lock: Means the data can be read by other transactions. Multiple transactions can apply a shared lock on the same data, since they are only reading data
- Used when a transaction is reading data
- Granted if the data only contains shared locks
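A minimal sketch of these compatibility rules, extending the toy lock manager from earlier (again, the names are my own):

```python
class SXLockManager:
    def __init__(self):
        self.shared = {}     # item -> set of txns holding a shared lock
        self.exclusive = {}  # item -> txn holding the exclusive lock

    def acquire_shared(self, txn, item):
        # Granted only if no *other* txn holds an exclusive lock.
        if self.exclusive.get(item) not in (None, txn):
            return False
        self.shared.setdefault(item, set()).add(txn)
        return True

    def acquire_exclusive(self, txn, item):
        # Granted only if no other locks (shared or exclusive) exist.
        other_readers = self.shared.get(item, set()) - {txn}
        if other_readers or self.exclusive.get(item) not in (None, txn):
            return False
        self.exclusive[item] = txn
        return True

lm = SXLockManager()
print(lm.acquire_shared("T1", "X"))     # True: readers can share
print(lm.acquire_shared("T2", "X"))     # True: multiple shared locks are fine
print(lm.acquire_exclusive("T3", "X"))  # False: T1 and T2 hold shared locks
```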
Deadlock
Unfortunately, locking comes with its own drawbacks. One such issue is deadlock, which occurs when two transactions each depend on the other to unlock data, resulting in an infinite wait:
Let $T_1$, $T_2$ be two transactions. A schedule like the one below would result in a deadlock:

| Time | $T_1$                      | $T_2$                      |
| ---- | -------------------------- | -------------------------- |
| 1    | lock $X$                   |                            |
| 2    |                            | lock $Y$                   |
| 3    | request lock on $Y$ (wait) | request lock on $X$ (wait) |

In the third timeframe, $T_1$ is waiting on the lock on $Y$ (locked by $T_2$) to release, while at the same time, $T_2$ is waiting on the lock on $X$ to be released. Since neither lock will release until its transaction completes, a deadlock occurs.
Deadlocks only occur for exclusive locks, not shared locks
More specifically, it occurs when two transactions each hold an exclusive lock and request the other's. Since shared locks can be applied on data containing an existing shared lock, transactions never block each other with shared locks alone, so a deadlock isn't possible for shared locks.
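To see this concretely, here is a small Python sketch of the schedule above using two threads. The `acquire(timeout=...)` calls stand in for deadlock detection so the program terminates instead of hanging forever:

```python
import threading

lock_x = threading.Lock()
lock_y = threading.Lock()
barrier = threading.Barrier(2)  # ensures both first locks are held

def t1():
    with lock_x:                           # time 1: T1 locks X
        barrier.wait()                     # wait until T2 has locked Y
        if not lock_y.acquire(timeout=1):  # time 3: wait on Y, held by T2
            print("T1: could not get Y -- deadlock, giving up")
        else:
            lock_y.release()

def t2():
    with lock_y:                           # time 2: T2 locks Y
        barrier.wait()                     # wait until T1 has locked X
        if not lock_x.acquire(timeout=1):  # time 3: wait on X, held by T1
            print("T2: could not get X -- deadlock, giving up")
        else:
            lock_x.release()

a, b = threading.Thread(target=t1), threading.Thread(target=t2)
a.start(); b.start(); a.join(); b.join()
# Both threads time out: each holds the lock the other one needs.
```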
Timestamp Ordering
Another method is to assign a unique, ordered timestamp to each transaction; every read or write the transaction performs then carries that timestamp.

This allows the DBMS to know the timestamp at which any section of data was last read and last written. Access to data is then allowed or denied based on these timestamps: an operation that would violate the timestamp order causes its transaction to be rolled back.
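A minimal sketch of the basic rules (simplified; the names are my own, see the article linked below for the full protocol):

```python
class TimestampOrdering:
    def __init__(self):
        self.read_ts = {}   # item -> largest txn timestamp that read it
        self.write_ts = {}  # item -> largest txn timestamp that wrote it

    def read(self, ts, item):
        # A txn may not read data already overwritten by a younger txn.
        if ts < self.write_ts.get(item, 0):
            return "rollback"
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return "ok"

    def write(self, ts, item):
        # A txn may not overwrite data that a younger txn has already
        # read or written.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return "rollback"
        self.write_ts[item] = ts
        return "ok"

to = TimestampOrdering()
print(to.read(2, "X"))   # ok: X's read timestamp becomes 2
print(to.write(1, "X"))  # rollback: txn 1 is too old to write X now
```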
#todo https://en.wikipedia.org/wiki/Timestamp-based_concurrency_control
Optimistic
- Based on the assumption that the majority of database operations do not conflict.
- Transaction is executed without restrictions or checking
- Then, when it is ready to commit, the DBMS checks whether any of the data it read has been altered by another transaction – if so, roll back (see the sketch below)
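A minimal sketch of this validate-at-commit idea using per-item version numbers (the structure and names are assumptions, not a real DBMS API):

```python
versions = {"X": 1}  # item -> current committed version number
data = {"X": 100}    # item -> current committed value

class OptimisticTxn:
    def __init__(self):
        self.read_set = {}   # item -> version observed when read
        self.write_set = {}  # item -> new value, applied only on commit

    def read(self, item):
        self.read_set[item] = versions[item]
        return data[item]

    def write(self, item, value):
        self.write_set[item] = value  # buffered, not yet visible

    def commit(self):
        # Validation: if anything we read has since changed, roll back.
        if any(versions[i] != v for i, v in self.read_set.items()):
            return False
        for item, value in self.write_set.items():
            data[item] = value
            versions[item] += 1
        return True

t1 = OptimisticTxn()
t1.write("X", t1.read("X") - 20)
versions["X"] += 1    # meanwhile, another transaction commits a change to X
print(t1.commit())    # False: T1's read is stale, so it must roll back
```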