Java HashMap Internals
This article deep dives into the internals of a Java HashMap and how does it work internally
Java HashMap is a data structure that stores <key, value> form of data. A <key, value> pair can be considered where you want to store some Value data against some Key data.
For example: let’s say you’re building your social media platform where everyone has a unique username and you want some data structure to store <username, email> mapping and fetch you the same information as well. Then, HashMap is a perfect use case here.
You can also consider storing objects in place of <Key, Value>. For instance, to support an employee-manager relationship where each employee has a single manager, you can use a HashMap where the Key is of the Employee class type and Value is of the Manager class type.
To put it simply, there are mainly two operations to expect:
Put(K, V): This operation sets the key K to the value V in the HashMap
Get(V): This operation returns the value V from the Hashmap.
Let’s look deeper to understand what the HashMap internally looks like.
Data Storage
Internally, HashMap is an array where each array index contains a LinkedList. A LinkedList is a linear collection of data nodes. In HashMap, each node of the LinkedList contains four values:
Key: represents the key from <key, value> pair
Value: represents the value from <key, value> pair
Hash: represents the hashcode of the input key
Next: represents the next pointer pointing to the next node of the LinkedList
The following image represents the HashMap data structure containing four <key, value> pairs as a representation of <username, email>
If you notice, most of the buckets out of 16 size(0-indexed) are pointing to null meaning that there is no data stored at those indexes. Also note that at index 3, there are two nodes stored at the index. This is primarily because of the Hash collisions meaning that Hash(“iamrohan“) mod 16 = Hash(“itspriyanka“) mod 16 = 3
Let’s take a closer look at the official Java implementation now!
As discussed earlier, HashMap is an array of Node<Key, Value> pairs. This HashMap array is referenced as a `table` inside the official Java documentation.
Each index of the `table` array stores a LinkedList. Let’s understand the data storage better by knowing what happens for Put(key, value) and Get(key) operations.
For Put(key, value) operation,
Firstly, the hashCode of the key object is calculated using the hashing technique.
Then, the “HashCode & N” using Bitwise AND operator(&) is calculated to get the index for insertion of the new <key, value> node. Here, N is the size of the HashMap at the time of insertion. Let’s call “HashCode & N” = ith index
If there is no Node already present at the ith index, then New Node<Key, Value> is inserted at this index representing the head of the LinkedList.
If there is already an existing node, then we traverse till the end of LinkedList using the first node as the head of LinkedList and then insert the new <Key, Value> node as a new node at the last of the LinkedList.
For Get(key) operation,
Firstly, the hashCode of the key object is calculated.
HashCode & N is calculated to get the ith index where the key might have been stored at the time of insertion.
The LinkedList stored at the ith index of the table is traversed linearly. In the first index where the Node’s key is equal to the incoming Key in the Get request, the value of that node is returned as the answer.
If no node is found where the key matches, then NULL is returned.
Well, that’s how the Put(K, V) and Get(K) functions are performed in the Java Hashmap. Let’s take a look at some other important concepts of the HashMap.
Please note the change which happened in Java 8 release where they changed the implementation from LinkedList to Balanced trees when the hash collisions cross a certain threshold.
As per the official release: “The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).”
Please refer to this stackoverflow answer for more clear explanation.
Resizing
When we initialize a HashMap in Java, then the default initial capacity of the HashMap is 16 as per the official documentation.
This means that there’s an empty array initialized of length 16. As more <key, value> are added to the HashMap, the HashMap starts facing more frequent collisions which means storing more <key, value> pairs at the same index. The con of more collisions at the same index is that we will have to linearly traverse the LinkedList in O(n) time complexity to find the desired value.
To summarize, more elements mean more collisions which degrades the time complexity of the Get(key) function. Thus, there is a need for resizing which is done after reaching a certain THRESHOLD (which is by default 75% of the hashmap capacity).
The threshold after which resizing of all the elements is required is called the Load factor of the HashMap.
Concurrency
Java Hashmap is not thread-safe and it must be synchronized externally using the synchronized keyword for a use case where multiple threads trying to modify the same map. ConcurrentHashMap is an alternative to using a hashmap in Java if you don’t wish to do thread-safe handling by yourself.
Equals and HashCode
The equals and hashcode contract in Java Hashmaps boils down to this:
Equal objects have the same hashcode: If two objects are considered equal by "equals()", they must always return the same integer value from "hashCode()".
Unequal objects can share a hashcode: Different objects can return the same hashcode, but equal objects must always have the same one.
Why is this important? Hashmaps use hashcodes to group similar objects, so equal objects should land in the same bucket for efficient retrieval. Violating this contract can lead to problems like:
Incorrect object retrieval: You might get the wrong object when searching for it in the HashMap.
Data loss: Equal objects might overwrite each other if they hash to the same bucket but the HashMap doesn't properly compare them using "equals()".
To avoid these issues, remember to always override both "equals()" and "hashCode()" together when using custom classes as HashMap keys.
Conclusion
Java HashMap is probably one of the most important data structures considered for Java development. Deep diving into HashMap allows us to think how much complexity has gone into building such a data structure but also allows us to prepare our own Custom HashMap if we want some more functionality on top of trivial Java HashMap implementation.
That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your LinkedIn network and schedule a free 30-minute 1:1 with me:) Ping me on Linkedin!
Hi,
Thanks for the detailed explanation. I had a small doubt - if the hashmap’s capacity does not exceed threshold, then it uses linkedlist for storing and fetching the values. Then how come it guarantees constant time put and get operations
I think after java 8 they have changes the internal implementation from linked list to binary tree for better complexity in get calls