However, B+trees have a linked list of leaf nodes. Thus, you can just use one path to traverse down the whole tree and then access all the keys using links at the leaf nodes. Thus, making sequential access more efficient in case of B+Trees.
My question is:
How is the B+ tree read sequential on disc (assuming HDD)?
From your example, I believe the leaf nodes 1,3,4,5,6,10,12... are not stored in sequential pages on disc. The reason is because otherwise, as soon as you add a new node , say 7, you will have to move all pages after that (10,12...) which will become an extremely expensive operation.
When I read "sequential", I believe the context should be sequential disc read (in context of a hard disc, for SSDs, it doesn't matter).
So, even though each leaf node is pointing to the next leaf node, it will be a random disc access.
But, nevertheless, thanks for highlighting this and it makes a lot of sense. You're right. Though the access within the data structure B+Tree looks like sequential but it would actually be random access inside disk.
I would do some further investigation and edit the article.
Thanks for the great article ๐
For The section:
However, B+trees have a linked list of leaf nodes. Thus, you can just use one path to traverse down the whole tree and then access all the keys using links at the leaf nodes. Thus, making sequential access more efficient in case of B+Trees.
My question is:
How is the B+ tree read sequential on disc (assuming HDD)?
From your example, I believe the leaf nodes 1,3,4,5,6,10,12... are not stored in sequential pages on disc. The reason is because otherwise, as soon as you add a new node , say 7, you will have to move all pages after that (10,12...) which will become an extremely expensive operation.
When I read "sequential", I believe the context should be sequential disc read (in context of a hard disc, for SSDs, it doesn't matter).
So, even though each leaf node is pointing to the next leaf node, it will be a random disc access.
Hey @noobengineer,
I think you were referring to this article: https://vivekbansal.substack.com/p/btree-vs-bplus-tree-key-differences
But, nevertheless, thanks for highlighting this and it makes a lot of sense. You're right. Though the access within the data structure B+Tree looks like sequential but it would actually be random access inside disk.
I would do some further investigation and edit the article.
Once again, thankyou.
Really good read!!
Thankyou Navin ๐
Nice blog
awesome read on LSM tree.
I would like to know how can we do a binary search in one SSTable.
Since data size is different for different rows, how should we perform binary search.
My string records in SSTable will look something like this (ofcourse this will be in binary encoding, I am showing as text for simplicity):
content of SSTable
--------------------
{recordOffset: 0, keySize: 4, key: "name", valueSize: 7, value: "ashwani"}
{recordOffset: xxx, keySize: 6, key: "friend", valueSize: 5, value: "vivek"}
--------------------