Database Internals - Write Ahead Log File Structure
This article deep dives into the file structure of a write-ahead log file
Write Ahead logging is a technique that says: changes to the data files(tables and indexes) must be written after those changes have been logged i.e. records describing those changes have been flushed to permanent storage.
Refer here for my detailed explanation and benefits of WAL: Database Internals: Write Ahead Logging(WAL)
In this article, I deep-dived into the PostgreSQL WAL(write-ahead log) files and tried to understand the file structure. Here are the steps if you want to follow along, otherwise skip the installation steps and jump to Analyzing the file structure.
Note: I am working on macOS 14.3 and PostgreSQL 14 versions. If you’re using another Operating system, please search for corresponding commands.
Step 1: Installing PostgreSQL
brew install postgresql
Step 2: Starting the server
brew services start postgresql
Step 3: Installing libpq
This step involved installing the libpq library from the homebrew which installs some other commands required alongside the PostgreSQL for analyzing the WAL file.
brew install libpq
After this step, you should have your “pg_waldump“ command installed and if you run “which pg_waldump“, it should print the location of your executable file for this command. This is the output in my case: “/opt/homebrew/bin/pg_waldump “
Step 4: Locating WAL files
At this point, I located the pg_wal folder which contains the WAL files of the postgres. They are under the following directory:
“/opt/homebrew/var/postgresql@14/pg_wal/“
Step 5: Connect to Postgres
psql -U postgres
Step 6: Create a table “employee“
CREATE TABLE employee (
first_name VARCHAR(50),
last_name VARCHAR(50),
date_of_birth DATE,
hire_date DATE,
salary DECIMAL(10, 2)
);
Please note that this is just an example so that I can easily insert the data and analyze the WAL file. The real table `employee` might have more columns like id (primary key), department, etc.
Step 7: Adding some records to the employee table
I used the following command to repeatedly add some records to the employee table.
INSERT INTO employee (first_name, last_name, date_of_birth, hire_date, salary)
VALUES ('John', 'Doe', '1990-05-15', '2020-01-01', 50000.00);
Step 8: Analyzing the WAL file
After I hit around 20 INSERT operations, I used the following command to analyze the file:
“pg_waldump /opt/homebrew/var/postgresql@14/pg_wal/000000010000000000000001”
This command generated a big output of the WAL file but here’s a short exact context of that file.
rmgr: Transaction len (rec/tot): 34/ 34, tx: 771, lsn: 0/01787188, prev 0/01787138, desc: COMMIT 2024-02-11 13:28:49.944018 IST
rmgr: Heap len (rec/tot): 80/ 80, tx: 772, lsn: 0/017871B0, prev 0/01787188, desc: INSERT off 29 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 772, lsn: 0/01787200, prev 0/017871B0, desc: COMMIT 2024-02-11 13:28:50.233002 IST
rmgr: Heap len (rec/tot): 80/ 80, tx: 773, lsn: 0/01787228, prev 0/01787200, desc: INSERT off 30 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 773, lsn: 0/01787278, prev 0/01787228, desc: COMMIT 2024-02-11 13:28:50.562946 IST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/017872A0, prev 0/01787278, desc: RUNNING_XACTS nextXid 774 latestCompletedXid 773 oldestRunningXid 774
rmgr: Heap len (rec/tot): 80/ 80, tx: 774, lsn: 0/017872D8, prev 0/017872A0, desc: INSERT off 31 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 774, lsn: 0/01787328, prev 0/017872D8, desc: COMMIT 2024-02-11 13:28:50.861983 IST
rmgr: Heap len (rec/tot): 80/ 80, tx: 775, lsn: 0/01787350, prev 0/01787328, desc: INSERT off 32 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 775, lsn: 0/017873A0, prev 0/01787350, desc: COMMIT 2024-02-11 13:28:51.197795 IST
rmgr: Heap len (rec/tot): 80/ 80, tx: 776, lsn: 0/017873C8, prev 0/017873A0, desc: INSERT off 33 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 776, lsn: 0/01787418, prev 0/017873C8, desc: COMMIT 2024-02-11 13:28:51.537681 IST
rmgr: Heap len (rec/tot): 80/ 80, tx: 777, lsn: 0/01787440, prev 0/01787418, desc: INSERT off 34 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0
The above PostgreSQL WAL file log shows entries for several insert operations that happened between 13:28:49 IST and 13:28:51 IST on February 11, 2024. Here's a breakdown of the two most used commands (rmgr: transaction) and (rmgr: Heap)
rmgr: Transaction: This indicates that a database transaction is being performed. The description contains what kind of operation it is on the transaction level like COMMIT, ROLLBACK, etc.
len (rec/tot): 34/34: The length of the record is 34 bytes.
tx: 771: This is the transaction ID, which helps track changes and ensure consistency.
lsn: 0/01787188: This is the Log Sequence Number, which is a unique identifier for each record in the WAL (Write-Ahead Logging) file.
prev 0/01787138: This is the previous log sequence number, indicating the order of records. The first part (
0
) indicates the segment number within the WAL file. The second part (0/01787138
) is the relative offset within that segment.desc: COMMIT 2024-02-11 13:28:49.944018 IST: This indicates that a commit action occurred at the specified timestamp and timezone.
rmgr: Heap: This indicates an operation on a table. The description contains what kind of operation it is on the table level like INSERT, UPDATE, DELETE, etc.
len (rec/tot): 80/80: The length of the record is 80 bytes.
tx: 772: Transaction ID.
lsn: 0/017871B0: Log Sequence Number.
prev 0/01787188: This is the previous log sequence number
desc: INSERT off 29 flags 0x00, blkref #0: rel 1663/16385/16401 blk 0: This indicates an INSERT operation at offset 29, with some additional details about the table and block reference.
Similar actions follow, including more transactions (rmgr: Transaction) and heap operations (rmgr: Heap).
The last line indicates the insertion of data into the table (rmgr: Heap) with transaction ID 777.
In summary, the log shows that several rows were inserted into a table (relation ID 1663) between 13:28:49 and 13:28:51 IST. Each insert was part of a separate transaction (tx ID 771 to 777). The rows are not yet visible ("off") as the transactions haven't been fully committed. Once the transaction commits successfully, the changes become visible, and the "off" flag becomes irrelevant
Also, after analyzing the WAL file, I asked myself “Where is the actual data which has been inserted into the database?“ I think the answer to this question is the fact that WAL files are actually binary files and not human-readable. So, the command “pg_waldump” might be omitting the true data and printing some information only.
Thus, when I tried to use the following command "cat /opt/homebrew/var/postgresql@14/pg_wal/000000010000000000000001", I was able to get the following output which contained “John“ as the keyword and it ensured that the WAL file had the true data.
Key Takeaways
WAL files are append-only files which means you can only append at the end of the file and cannot modify anything existing already.
WAL file does not mean it’s a very big single file but it can consist of multiple files and are stored under the directory
pg_wal
. Each file is called a segment which is normally 16 MB in size. Each segment of 16 MB consists of multiple pages (or blocks whatever you like) where each page consists of 8 KB. (you can modify the segment size or the page size if you know what you’re doing)Segment files are given ever-increasing numbers as names, starting at
000000010000000000000001.
Whenever any entry goes into the WAL file, a unique entry is assigned to each record which is called the Log Sequence Number (LSN). This log sequence number is actually the byte offset of the entry in the WAL file.Since LSNs are sequential, searching for specific WAL records becomes efficient. You can directly jump to the desired record based on its LSN without needing to scan the entire file from the beginning.
That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your LinkedIn network and schedule a free 30-minute 1:1 with me:) Ping me on Linkedin!
Resources
WAL Internals: Postgres official documentation
Transaction Log by Wikipedia
Write Ahead Logging by Arpit
Thank you for this article, was really insightful!
Nice article, sneak peak into the actual WAL file was interesting.
Any idea how often these WAL files are rebased or purged ? Since over years these could build up like anything especially on a write heavy postgres instance.