
Hash Code Bucket Distribution in Java

Suppose I need to store 1000 objects in a HashSet. Is it better to have 1000 buckets, each containing one object (by generating a unique hash code for every object), or to have 10 buckets, each containing roughly 100 objects?

Isn't one advantage of having one object per bucket that I save execution cycles on calls to the equals() method?

Why is it important to have a set number of buckets and to distribute the objects among them as evenly as possible?

What should the ideal object-to-bucket ratio be?

Answer

Why is it important to have a set number of buckets and distribute the objects among them as evenly as possible?

A HashSet should be able to determine membership in O(1) time on average. From the documentation:

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

The algorithm a HashSet uses to achieve this is to retrieve the object's hash code and use it to find the correct bucket. It then iterates over the items in that bucket until it finds one that is equal. If a bucket holds more than a constant number of items, lookups will take more than O(1) time.

In the worst case, where all items hash to the same bucket, it will take O(n) time to determine whether an object is in the set.
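
A rough sketch of that lookup, assuming a simplified bucket array (this is not the real java.util.HashSet internals, which delegate to a HashMap and also spread the hash bits, but the shape of the algorithm is the same):

    import java.util.List;

    class BucketLookupSketch {
        // Simplified membership test over an array of buckets.
        static <T> boolean contains(List<T>[] buckets, T key) {
            // Map the hash code onto a bucket index (mask off the sign bit
            // so the index is never negative).
            int index = (key.hashCode() & 0x7fffffff) % buckets.length;
            List<T> bucket = buckets[index];
            if (bucket == null) {
                return false;
            }
            // equals() runs once per entry in this bucket, so the cost of a
            // lookup grows with the bucket's size: one entry per bucket is
            // O(1), while everything in one bucket degrades to O(n).
            for (T candidate : bucket) {
                if (candidate.equals(key)) {
                    return true;
                }
            }
            return false;
        }
    }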

What should be the ideal object to bucket ratio?

There is a space-time tradeoff here. Increasing the number of buckets decreases the chance of collisions, but it also increases memory requirements. The hash set has two parameters, initialCapacity and loadFactor, that let you adjust how many buckets the HashSet should create. The default load factor of 0.75 is fine for most purposes, but you can choose a different value if you have special requirements.
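
Both parameters can be passed directly to the HashSet constructors. A minimal sketch (the capacities below are arbitrary example values):

    import java.util.HashSet;
    import java.util.Set;

    class HashSetTuning {
        public static void main(String[] args) {
            // Defaults: 16 buckets, load factor 0.75.
            Set<String> defaults = new HashSet<>();
            // Custom initial capacity, default load factor.
            Set<String> sized = new HashSet<>(2048);
            // Custom capacity and load factor: a lower load factor keeps
            // more buckets per entry, trading memory for fewer collisions.
            Set<String> tuned = new HashSet<>(2048, 0.60f);
        }
    }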

For more information about these parameters, see the documentation for HashMap:

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the “capacity” of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
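
Applied to the 1000 objects from the question: with the default load factor of 0.75, any initial capacity of at least 1000 / 0.75 ≈ 1334 means the set never rehashes while those entries are added. A minimal sketch:

    import java.util.HashSet;
    import java.util.Set;

    class PresizedSet {
        public static void main(String[] args) {
            int expectedEntries = 1000;
            float loadFactor = 0.75f;
            // Capacity above expectedEntries / loadFactor => no rehash.
            int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor); // 1334
            Set<String> set = new HashSet<>(initialCapacity, loadFactor);
            for (int i = 0; i < expectedEntries; i++) {
                set.add("object-" + i); // never triggers a rehash
            }
        }
    }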
