Saturday, September 28, 2019

Performance Improvement for HashMap in Java 8

HashMap in Java
The HashMap class extends AbstractMap and implements Map interface. HashMap is a key and value collection in java. The HashMap stores the data in key and value format. It provides the basic implementation of the Map interface of Java. We can put a value with using a key and also we can access this value using that key. It uses a hash table to store the map. This allows the execution time of the get() and the put() methods to remain the same.

The traversal of HashMap, get(), and other methods lookup time of HashMap have a negative impact due to hash collisions. This situation you can face when multiple keys end up in the same bucket, then values along with their keys are placed in a linked list. So, the retrieval time of elements from HashMap increases from O(1) to O(n). Because the linked list has to be traversed to get the entry in the worst case scenario.

But Java 8 has come with the following new strategy for HashMap objects in case of high collisions.

To address this issue, Java 8 hash elements use balanced trees instead of linked lists after a certain threshold is reached. Which means HashMap starts with storing Entry objects in a linked list but after the number of items in a hash becomes larger than a certain threshold. The hash will change from using a linked list to a balanced tree.
Above changes ensure the performance of O(log(n)) in worst case scenarios and O(1) with proper hashCode().
The alternative String hash function added in Java 7 has been removed.


Another Blog also stated :

Basically when a bucket becomes too big (currently: TREEIFY_THRESHOLD = 8), HashMap dynamically replaces it with an ad-hoc implementation of tree map.



The fix has been implemented in the classes java.util.HashMap, java.util.LinkedHashMap and java.util.concurrent.ConcurrentHashMap. No interfaces or methods specifications have been changed, only the behavior in the implementation of the concurrent hash map is different. So there is no need to change the applications using these classes. However, the iteration order when accessing hash maps entries may be different, this is explained in this article and should be reviewed in your programs.


 O(log n) performance
In this case, it means that the algorithm will perform better when the amount of data is larger. Performance is not directly proportional to the large of the processed data but in a log n relation. O(log n) performs better than O(n).

 Balanced trees
A tree is balanced if the left and the right sub trees are balanced (recursion!) and their height differ by at most one. The main goal is to keep the depths of all nodes to be O(log n). The maintenance of the balanced tree has a penalty in the insertion of new elements but improve the indexing and accessing performance.

No comments:

Post a Comment