Database interview questions

Lock-Based Concurrency Control

Lock-based protocols in DBMS ensure that a transaction cannot read or write a data item until it obtains the appropriate lock. By controlling access in this way, these protocols maintain serializability and prevent concurrency problems such as lost updates or dirty reads.

A lock is a mechanism used to manage concurrent access to database items. Each transaction must acquire the necessary lock before performing read or write operations. There are two main types of locks:

Shared Lock (S): Also known as a read-only lock. Multiple transactions can hold a shared lock on the same data item simultaneously because reading does not modify the data. It is requested using lock-S.
Exclusive Lock (X): Allows both reading and writing of a data item. Only one transaction can hold an exclusive lock at a time. It is requested using lock-X.

A transaction can acquire a lock only if it is compatible with existing locks held by other transactions:

• Multiple transactions can hold shared (S) locks on the same data item.
• If a transaction holds an exclusive (X) lock, no other transaction can hold any type of lock on that item.
• If a requested lock is not compatible, the transaction must wait until all conflicting locks are released.
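
To make this concrete, PostgreSQL exposes row-level shared and exclusive locks through SELECT ... FOR SHARE and SELECT ... FOR UPDATE. The sketch below, which assumes a hypothetical accounts table, shows the compatibility rules in practice; it is one possible mapping of S and X locks onto SQL, not the only one.

-- Session 1: acquires a shared (S) lock on the row
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;

-- Session 2: a second S lock on the same row is granted immediately
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;

-- Session 3: an exclusive (X) lock conflicts with the S locks, so this
-- statement blocks until both sessions above commit or roll back
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;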

Types of Lock-Based Protocols

1. Simplistic Lock Protocol: Every transaction must obtain a lock before performing insert, delete, or update operations. The lock is released only when the transaction completes.
2. Pre-Claiming Lock Protocol: To prevent deadlocks, a transaction requests all required locks before execution begins. It executes only if all locks are granted; otherwise, it waits or rolls back. Although simple, this method may reduce efficiency when many transactions are active.
3. Two-Phase Locking (2PL) Protocol: This is the most widely used lock-based protocol. A transaction follows the Two-Phase Locking protocol if all its locking and unlocking operations are divided into two distinct phases:
Growing Phase: The transaction acquires all the locks it needs. It cannot release any locks during this phase.
Shrinking Phase: Once the transaction starts releasing locks, it cannot acquire new ones.

The moment a transaction has acquired all the locks it requires is called the lock point. After this point, it transitions from the growing to the shrinking phase.
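
Most SQL databases actually implement strict two-phase locking, in which every lock is held until commit, so the shrinking phase collapses into the COMMIT itself. A minimal sketch, again assuming a hypothetical accounts table:

BEGIN;
-- Growing phase: locks are acquired as rows are touched
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- lock-X on row 1
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;  -- lock-X on row 2: the lock point
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
UPDATE accounts SET balance = balance + 50 WHERE id = 2;
-- Shrinking phase: under strict 2PL, every lock is released at once here
COMMIT;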

Lock Upgrading and Downgrading

Upgrading a Lock: Changing a shared lock (S) to an exclusive lock (X). For instance, if a transaction initially reads a data item but later decides to update it, it must upgrade the lock to X. This can occur only during the growing phase.
Downgrading a Lock: Changing an exclusive lock (X) to a shared lock (S). For example, if a transaction decides it only needs to read data after previously locking it for writing, it can downgrade during the shrinking phase.
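
In PostgreSQL, for instance, re-reading a row with a stronger lock mode inside the same transaction upgrades the row lock, as the sketch below shows. (Two transactions attempting the same upgrade concurrently is a classic deadlock scenario, so applications that know they will write often take FOR UPDATE immediately.)

BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;   -- S lock: read intent
-- the transaction later decides it must write after all
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- upgraded to an X lock
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
COMMIT;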

By enforcing these rules, the Two-Phase Locking protocol ensures consistency and prevents conflicts between concurrent transactions. Restricting when locks can be acquired, upgraded, or released helps maintain a serializable schedule, preserving the integrity of the database.

Optimistic vs Pessimistic Locking

Pessimistic locking is a concurrency control mechanism that assumes data conflicts are likely when multiple users or processes access the same resource simultaneously. To prevent these conflicts, it acquires locks on resources before performing operations, ensuring exclusive access for the duration of a transaction. Once a lock is obtained, no other transaction can modify or read the locked resource until the lock is released. This approach guarantees consistency but can lead to waiting times, reduced concurrency, and potential performance bottlenecks under heavy load.

Pessimistic locking can be applied at different levels of granularity — from database-level and table-level locks to row-level or even column-level locks. It is commonly used in environments where data integrity is critical and the cost of conflicts is high, such as in banking systems and financial transactions.
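
In SQL, a hedged sketch of two of these granularities might look like the following: SELECT ... FOR UPDATE takes a row-level lock, while LOCK TABLE (shown in PostgreSQL syntax) takes a much coarser table-level lock.

-- Row-level pessimistic lock: only the selected row is blocked for other writers
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Table-level pessimistic lock: blocks all concurrent writes to the table
BEGIN;
LOCK TABLE accounts IN EXCLUSIVE MODE;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;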

Optimistic locking takes the opposite approach to traditional locking mechanisms. It assumes that write conflicts are rare and that most transactions can proceed concurrently without interfering with one another. Instead of preventing access to shared data, the database allows multiple transactions to read and modify the same row at the same time.

Rather than locking rows during reads, optimistic locking detects conflicts only at the time an update is attempted. Each row is associated with a version identifier, such as a numeric counter, a timestamp, or a hash. This version is read together with the row and later used to verify that the data has not been modified by another transaction.

When an update occurs, the database checks whether the version stored in the table is still the same as the version that was originally read. If the versions match, the update succeeds and the version is updated atomically along with the data. If the versions do not match, it means another transaction has modified the row in the meantime, and the update is rejected to prevent lost updates.

Version identifiers can be implemented in several ways. A common approach is an integer version field that increments on every successful update. Some systems use a timestamp field such as updated_at, which is overwritten on modification, while others use a checksum or hash derived from the row contents that is recomputed whenever the data changes.
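
One possible schema for the workflow that follows; the table and column names match the examples below, but the exact types are illustrative:

CREATE TABLE accounts (
    id      bigint  PRIMARY KEY,
    balance numeric NOT NULL,
    version integer NOT NULL DEFAULT 0  -- incremented on every successful update
);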

The optimistic locking workflow begins by reading the row together with its current version.

SELECT id, balance, version
FROM accounts
WHERE id = 1;

The application receives the row data along with the version value.

id = 1, balance = 100, version = 5

When updating the row, the application includes the previously read version in the update condition. This ensures that the update will only succeed if no other transaction has modified the row since it was read.

UPDATE accounts
SET balance = 120, version = version + 1
WHERE id = 1 AND version = 5;

If exactly one row is affected, the update succeeds and the version is incremented. If zero rows are affected, the version check has failed, indicating a conflict. The application can then decide whether to retry the operation using the latest data or abort the transaction altogether.
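
A sketch of the retry path, assuming a concurrent transaction has meanwhile raised the balance to 110 and bumped the version to 6:

-- Re-read the row to pick up the latest state and version
SELECT id, balance, version FROM accounts WHERE id = 1;
-- returns: id = 1, balance = 110, version = 6

-- Retry the update against the fresh version
UPDATE accounts
SET balance = 130, version = version + 1
WHERE id = 1 AND version = 6;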

Optimistic locking minimizes locking overhead and maximizes concurrency, making it well suited for high-read systems and workloads with low write contention, such as REST APIs, microservices, and distributed applications.

In summary, pessimistic locking prioritizes safety and consistency by restricting access through locks, while optimistic locking prioritizes performance and scalability by assuming that conflicts are infrequent and resolving them only when necessary.

Write-Ahead Logging (WAL)

Write-Ahead Logging (WAL) is a core durability mechanism in databases. Before any change is made to the actual data files on disk, the change must first be written to a log that is persisted safely. This design allows a database such as PostgreSQL to guarantee durability without paying the cost of flushing every modified data page at commit time.

When you commit a transaction, PostgreSQL does not immediately flush the modified data pages to disk. Instead, it follows a carefully optimized sequence. The database first records all changes made by the transaction in the WAL. These WAL records are then flushed to disk using fsync, ensuring the log is safely persisted. Once this flush completes, PostgreSQL reports a successful commit back to the application. The actual data pages are written to disk later by background processes.
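
This sequence can be observed from a PostgreSQL session: pg_current_wal_lsn() reports the current write position in the WAL, and that position advances as the commit record is flushed (the accounts table is again a stand-in):

SHOW wal_level;                -- 'replica' by default; controls how much detail is logged
SELECT pg_current_wal_lsn();   -- WAL write position before the transaction
BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
COMMIT;                        -- acknowledged only after the WAL records are flushed
SELECT pg_current_wal_lsn();   -- position has advanced past the commit record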

The key insight is that the WAL contains enough information to fully reconstruct every committed transaction. As long as the WAL is durable, the database can always recover to a consistent state, even if the data files themselves are temporarily out of date.

If a crash occurs after the commit has been acknowledged but before the modified data pages are written to disk, PostgreSQL will replay the WAL during startup. This recovery process re-applies all committed changes, restoring the database to the exact state it was in before the crash. Although recovery may take longer in this scenario, crashes are relatively rare, making this trade-off highly effective in practice.

From a performance perspective, WAL provides several important advantages. Writes to the WAL are sequential, which is significantly faster than random writes to data pages scattered across disk. Multiple transactions can also be grouped together during WAL flushes, reducing the number of expensive disk sync operations. As a result, commit latency becomes more predictable and primarily depends on WAL write throughput rather than complex data page updates.
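
PostgreSQL exposes a few settings that govern this batching; inspecting them is a reasonable starting point when reasoning about commit latency:

SHOW synchronous_commit;  -- 'on' waits for the WAL flush before acknowledging a commit
SHOW commit_delay;        -- microseconds a flush may wait so other commits can piggyback (0 by default)
SHOW commit_siblings;     -- minimum concurrently active transactions before commit_delay applies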

In short, WAL allows PostgreSQL to decouple transaction durability from data page persistence. By making the log durable first and deferring data writes, PostgreSQL achieves both strong crash safety and high transaction throughput.