Transaction Isolation Levels
This article will deep dive into various transaction isolation levels, why do we need them and some examples
When we talk about ACID properties in the databases, “C” stands for Consistency, which means that transactions should always produce a consistent result i.e. the data residing inside tables should always be consistent with the table schema or any other constraints.
For consistency of the database, we need Isolation of transactions. The Isolation level determines the degree to which a transaction must be isolated from the data modifications made by any other transaction in the database system.
Why do we need Isolation Levels?
However, some issues can arise in database systems due to concurrency, where multiple transactions access and modify data simultaneously. These terms describe different ways in which a transaction might read inconsistent data due to another transaction's modifications.
For example:
Dirty Read: This occurs when a transaction reads data that has been modified by another transaction, but that modification hasn’t been committed yet. Imagine Transaction 1 updates a bank account balance but hasn't finalized the update (committed). Transaction 2, under a low isolation level, might be able to read this uncommitted balance. If Transaction 1 rolls back later (for example, due to insufficient funds), Transaction 2 would have read "dirty" data that never truly reflected the account's state.
Non-Repeatable Read: This happens when a transaction reads the same piece of data twice within its execution, but the value differs between the reads because another transaction has committed a modification in the meantime. Transaction 1 might read a product's price in the morning. Later within a few milliseconds, within the same transaction, it reads the price again to prepare an invoice. However, another transaction might have updated the price in between. This can lead to confusion and inconsistencies.
Phantom Read: This arises when a transaction reads a set of data based on some criteria twice, but the results differ because another transaction has inserted or deleted rows that match the criteria in between the reads. Imagine Transaction 1 searches for all active users. Later, another transaction adds a new user. If Transaction 1 repeats the search within its execution, it might see a "phantom" user that wasn't there before due to the concurrent insert.
So, how do we solve it?
Enters the solution: Transaction Isolation Levels
Transaction Isolation Levels
Transaction Isolation Levels define how much a transaction is shielded from changes made by other transactions happening concurrently in the database system. It determines what kind of data a transaction can see and when impacting the balance between data consistency and performance.
There are four standard isolation levels defined by the SQL standard(yes, because NoSQL does not guarantee ACID properties):
“READ UNCOMMITTED”: This is the least isolated level. Here, a transaction can see uncommitted changes from other transactions, leading to potential inconsistencies. Imagine Transaction 1 updates a record but hasn't committed it yet. Transaction 2, under this level, might be able to read that uncommitted data. If Transaction 1 rolls back later, Transaction 2 would have read data that never truly existed.
“READ COMMITTED”: This level offers some improvement. A transaction can only see changes from other transactions that have already been committed. This eliminates the possibility of dirty reads (reading uncommitted data) but doesn't guarantee consistent results across multiple reads within the same transaction. This is because another transaction would have committed some changes in between the first and second read. The “Read Committed“ is the default isolation level bundled in PostgreSQL.
“REPEATABLE READ”: This level provides stronger isolation. Here, a transaction will see the same data throughout its execution, even if other transactions commit changes to the same data. This eliminates non-repeatable reads (where the same data is read differently within a transaction). The “Repeatable Read” is the default isolation level set in MySQL.
“SERIALIZABLE”: This is the most isolated level. Transactions are essentially serialized, meaning they are executed one after the other, as if no other transactions are happening concurrently. This ensures the highest level of data consistency but can significantly impact performance.
In a nutshell, the isolation levels can be related to the problems in the following table. Please note that this is pretty generic in nature but might not hold true for every database. For example: PostgreSQL does not give dirty reads in Read UnCommitted isolation level and does not give Phantom Reads in Repeatable Read Isolation level.
Examples
Now that we have understood what isolation levels are, let’s try to understand some practical use cases of these isolation levels
Booking tickets online: In case of any ticket booking system for flights/movies/events, we want that no seat should be assigned to two or more people. Thus, it is recommended to use the SERIALIZABLE isolation level which prevents dirty reads, non-repeatable reads, and phantom reads.
Inventory Management Systems: In systems where inventory levels are critical, ensuring that the count of items remains consistent throughout a transaction is important. For example, when an order is placed, the system needs to ensure that the inventory count does not change during the process of checking the inventory and recording the sale. That’s why it’s recommended to use a “REPEATABLE READ” isolation level here.
Logging Systems: In high-performance logging systems, speed is often prioritized over consistency. If a log entry is partially written and read by another transaction, the system can handle such inconsistencies without major issues. Thus, “READ UNCOMMITTED“ or even “READ COMMITTED“ would do fine here.
Conclusion
Choosing the right isolation level depends on the specific needs of the application. Here are some general guidelines:
Prioritize data consistency: Use higher isolation levels (Serializable, Repeatable Read) for financial transactions or situations where data consistency is crucial.
Prioritize performance: Consider Read Committed or even Read Uncommitted (with caution) for scenarios where seeing the absolute latest data isn't essential and high responsiveness is desired (e.g., social media feeds).
Remember, the best isolation level requires careful consideration of your application's needs and the trade-off between data consistency and performance.
That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!
Book exclusive 1:1 with me here.
Hey
In the example of "Booking tickets online", could you please explain what kind of issue was caused by "Phantom Read"??