Relational vs Document based databases
This post deep dives into the key differences between relational and document based databases and helps to find on what factors to choose which one.
Often when we think about choosing a new database for our use case, we choose a brute force solution i.e. choose the database first, and build our application. Then, when the application scales later on, we adjust the database thresholds/indexes/shard/partition to get the maximum performance or switch to a different database that can provide the same or more performance at less cost.
In this article, I am going to help explain 4 key differences between relational and document-based databases so that you can choose the right database ahead of time and use that choice for the longest time possible.
1. Data Schema
Relational databases are defined using tables with rows and columns whereas document databases store JSON-like(or BSON, XML) structures that support a variety of data types. The data is stored in pairs, similar to key/value pairs.
Example of data stored in a Relational database. Consider the following tables:
Books (book_id, title, author, ISBN, price)
Customers (customer_id, name, email, address)
Orders (order_id, customer_id, book_id, quantity, order_date)
This structure enforces relationships between tables. A book record in the `Books` table can be linked to multiple orders through the book_id in the `Orders` table. Similarly, a customer record from the `Customers` table can have many orders linked through the customer_id.
Example of data stored in a Document database:
Each document represents a book and includes all its details.
{
"_id": "123ABC", // Unique identifier for the document (like book_id)
"title": "The Lord of the Rings",
"author": "J.R.R. Tolkien",
"ISBN": "9780261102694",
"price": 25.99,
"reviews": [ // Array for storing reviews (optional)
{
"reviewer": "John Doe",
"rating": 5,
"comment": "A timeless classic!"
}
]
}
Similarly, customer documents would store customer information along with order history (potentially as an array within the document).
The point to note here is that relational databases store structured data that is stored according to the well-defined schema and thus provide rigidity in data format. However, document databases store unstructured data. Their schema is dynamic and you can add more key-value pairs in the document object without any changes required.
2. Data Integrity
Data Integrity means storing data in the database in such a way that it maintains its intended meaning and value over time. It should be accurate, consistent, and reliable.
Things can go wrong while storing data in your database like storing the wrong data type in the wrong columns. Relational databases enforce data integrity using “Referential integrity” as a first-order concept that ensures relationships are never abandoned and that data is correct.
Imagine a relational database for a library:
Books Table: (book_id, title, author)
Borrowers Table: (borrower_id, name)
Loans Table: (loan_id, book_id, borrower_id, loan_date)
The Loans table has a foreign key book_id
that references the primary key of the Books table. Similarly, it has a foreign key borrower_id
that references the primary key of the Borrowers table. Referential integrity ensures that a loan record can't exist for a non-existent book or a borrower who isn't registered.
Thus, in relational databases with the right set of constraints, even if you want to insert bad and inconsistent data in the database, you won’t be able to, because of referential integrity.
Talking about document databases, since they offer schema flexibility, that’s why data validation and integrity constraints may need to be enforced at the application level. Developers can implement data validation logic within their application code to ensure that only valid data is inserted or updated in the database.
3. Consistency
Relational or SQL databases are designed to support ACID properties in the transactions. Soon, when you have too much data in your SQL database, you will want to scale your database either using vertical scaling or horizontal scaling.
Now, as you know, vertical scaling means adding more resources to a single machine which eventually becomes costly. We are left with one option which is horizontal scaling but that is hard to implement in SQL databases. Here’s why — Supporting ACID properties across multiple nodes in a distributed system without affecting high latency is a difficult challenge.
To explain this better, quoting a text from this article by DesignGuru,
Example of a transaction spanning multiple nodes:
“Imagine a banking system with accounts distributed across different shards. If a customer wants to transfer money from one account to another, and each account is on a different shard, the transaction must be atomic across all involved shards. If the debit operation succeeds on one shard but the credit operation fails on another due to a node failure, the system must roll back the successful operation to maintain atomicity. Achieving this without incurring high latency and ensuring durability if a node goes down mid-transaction is challenging.”
Thus, in a nutshell, Relational databases always provide strong data consistency.
On the other hand, Document databases often favor the BASE model (Basically Available, Soft state, Eventual consistency). This prioritizes availability over strict consistency. Data modifications are made available quickly, but it might take some time for all replicas to reflect the latest changes.
Instead of pessimistic locking, document databases often use optimistic concurrency control. This allows concurrent modifications to happen, but during write operations, conflicts are detected and resolved based on versioning or timestamps.
Ultimately, document databases aim for eventual consistency. This means that while there might be a temporary lag between writes and reads being reflected across all replicas, eventually, all copies will converge to the same state.
4. Relationships
“Case Study from DDIA (Designing data-intensive applications) book”:
Martin Kleppmann, the author outlines a great comparison b/w Relational and Document-based databases.
Martin picks an example of storing a resume in a database (relational and document databases) and then makes a comparison between them.
A resume contains the following information:
- User personal info: firstName, lastName, summary
- education (college, start, end)
- positions (company, position, start, end)
- contact (type, URL)
Relational database (like MySQL, PostgreSQL):
- SCHEMA: it's easier & intuitive to define relationships using rows, columns, and foreign keys in SQL
- JOINS: multiple joins are required to fetch all relevant information of a user and can be fetched in a single query
Document-based database (like MongoDB, CouchDB, etc) :
- SCHEMA: no restriction in defining schema, everything is just a JSON object
- JOINS: since only one-to-many relationships are stored in one JSON object, all information might or might not be available in one single document query
The point to note here is One-to-many, many-to-one, and many-to-many relationships are easy to accommodate in relational databases, but many-to-one or many-to-many relationships are hard to support in document-based. We need to write code in the application to support database join operations and query document-based databases multiple times because document-based are essentially hierarchical (or tree-based in nature) where every node has only one parent.
Conclusion
How we structure our data is super critical because it would then decide: how many queries are needed to get the desired data, read and write latency, what indexes we can build, future scope to store 10x data, etc. The conclusion here is:
Go relational if:
Your data structure is well-defined and unlikely to change significantly.
Complex relationships between data are crucial for your application.
Data integrity and strict enforcement of data types are essential.
For example: preparing a library book management system as discussed above finds a good fit in the relational database due to a need to store always valid & consistent data
Choose document-based if:
You anticipate frequent changes to your data structure.
Flexibility and ease of scaling are top priorities.
Storing large amounts of unstructured or semi-structured data is necessary.
For example: storing a movie’s metadata which can be accessed directly and has no other complex relationships with other entities like movie name, description, cast, rating, duration, etc. — all these fields can be stored in a single JSON document object.
Remember, it's not always an either/or situation. Some applications might benefit from a hybrid approach using both relational and document databases for different purposes.
That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!
Book exclusive 1:1 with me here.
Resources
Relational vs Document database by Atlan
Document store vs relational database by CockroachLabs
Relational vs non-relational databases by MongoDB
Horizontally Scale SQL databases by Design Guru