Database Internals - Write Ahead Logging (WAL)
This article deep dives into what is write ahead logging and why is it a useful concept
Data(tables or indexes) is stored on the permanent disk in the form of data files. Each data file consists of multiple pages. Whenever we read/write from and to the disk, we either read or write page(s). So you can say a page is the minimum storage unit on disk that we use for reads and writes.
Every real-world system wants its client’s or customer’s information to persist in permanent storage to retrieve it later.
But there are two main problems while writing directly to a database on a disk:
Problem 1: Reliable data persistence
For data to be persisted, we would want every write incoming to the database to be written directly on the disk. But while a database transaction is updating a page or multiple pages, what if the Operating system fails or the system reboots/crashes? Your database would be in an inconsistent state. So, this is not a reliable way to update the Database.
Problem 2: Slow Writes
Writing every update directly on the database(or data files stored on disk) makes performance slower. For every write, the disk would have to perform some mechanical movement and spin to the right position so that the writing arm could write the data on the right pages. Thus, there is a lot of Disk I/O involved in directly writing to the disk.
So, how do we make the database write faster and durable?
Enters the solution: Write-Ahead Logging(WAL)
What is Write-Ahead Logging(WAL)?
Write Ahead logging is a technique that says: changes to the data files(tables and indexes) must be written after those changes have been logged i.e. records describing those changes have been flushed to permanent storage. Write-ahead logging is implemented by writing the transactions into a sequential append-only log known as a transaction log or redo log. But how does the whole picture look in the case of read-and-write requests?
Before we deep dive further, it’s important to know that the database has a shared memory pool. The shared memory pool is used to store pages. When any writes/modifications are done, it is done to these pages in the memory pool and then these pages from the memory pool are periodically flushed to the disk, thus making the changes in the pages permanent to the database.
Note: The pages from the memory pool are replaced as it is with the pages present in the disk. Always remember that page is the smallest unit of storage for read and write operations in the database!
okay, Let’s understand a little deeper now for both reads and writes!!
Write Request:
When a write request comes, the database does two things synchronously:
Writes the transaction as it is to the write-ahead log file.
Finds the corresponding pages (which contain the required rows) from the disk, places the pages into the shared memory pool, and updates the pages.
After the WAL is written successfully and the pages are updated (also referred to as the dirty pages) in the memory pool, the transaction is marked successful and returns an OK response back to the client.
Read request:
When a read request comes, the database tries to read the rows from the memory pool. If those pages are not found, it then tries to find the pages from the database on the disk, load them in the memory pool, and then return the required information.
Concept of “fsync” and “checkpointing”
The process of periodically flushing the dirty pages from the shared memory pool to the disk is done using the “fsync” command.
Since we cannot keep writing infinitely to the log files to maintain the records, whenever the dirty pages from the memory pool are persisted into the disk, the logs in the WAL are purged since we no longer need them. This process of purging the WAL records is called checkpointing.
Why WAL is a reliable technique?
Suppose, a database crashed in the middle of updating pages in the memory pool but we know that those changes were committed to the WAL files. When the database comes up, it will try to keep itself up to date by comparing the WAL logs. Since the Write Ahead log will be ahead of time compared to the database, the database will sync all the commands from the WAL and after processing all the transactions successfully, then only the database allows further reads and writes. Since the database is redoing these changes using the WAL logs, that’s why WAL logs are also known as Redo logs.
Benefits of WAL
Database ACID-Durability: This way of logging provides us the biggest benefit in that we don’t have to flush all the changes to disk on every transaction commit because, in case of database crashes or unexpected failures, we will be able to restore the database using the write-ahead log also known as redo log.
This essentially means that all those changes can also be recovered which have been logged into the redo log but not applied to the data pages.
Less disk writes: Using the WAL technique results in a significantly reduced number of disk writes because only the WAL records need to be flushed to the disk to guarantee that a transaction is committed, rather than updating the data files stored on the disk. Moreover, there will be fewer memory flushes to the disk as the same page might receive more writes and we can flush all the dirty pages to the disk in one go.
Faster Disk Writes: Sequential I/O for maintaining WAL logs is much better than doing random I/O on the disk to find and update the pages for every write.
Point in time Recovery: If there is a bug in the production that has resulted in your database being in an inconsistent or buggy state, you can replay the WAL logs to do a recovery of the database to any point in the past (provided those WAL logs are not purged due to checkpointing) because WAL logs contain all the transaction statements from the latest checkpoint.
That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free.
Resources
Write Ahead Logging by Hussein Nasser | Video
Write Ahead Logging - PostgreSQL Documentation
Best system design article on WAL I have ever read. Great job. I’ll feature this article in my next newsletter episode
what exactly is memory pool ? could you please elaborate it