数据结构 – 哈希表 – 使用二进制搜索树实现

发布时间：2020-12-15 04:24:37 所属栏目：Java 来源：网络整理

导读：从破解编码面试,第71页： Alternatively,we can implement hash table with a BST. We can then guarantee an O(log n) lookup time,since we can keep the tree balanced. Additionally we may use less space,since a large array no longer needs to be a

从破解编码面试,第71页：

Alternatively,we can implement hash table with a BST. We can then
guarantee an O(log n) lookup time,since we can keep the tree
balanced. Additionally we may use less space,since a large array no
longer needs to be allocated in the very beginning.

我知道链表,哈希表和BST的基础知识,但我无法理解这些行.它究竟意味着什么？这个最终的数据结构会是Trie吗？

解决方法

该部分的全文说明,最后一段是您询问的一段：

A hash table is a data structure that maps keys to values for highly efficient lookup. In a
very simple implementation of a hash table,the hash table has an underlying array and
a hash function. When you want to insert an object and its key,the hash function maps
the key to an integer,which indicates the index in the array. The object is then stored at
that index.

Typically,though,this won’t quite work right. In the above implementation,the hash
value of all possible keys must be unique,or we might accidentally overwrite data. The
array would have to be extremely large—the size of all possible keys—to prevent such
“collisions.”

Instead of making an extremely large array and storing objects at index hash (key),we
can make the array much smaller and store objects in a linked list at index hash (key) %
array_length.To get the object with a particular key,we must search the linked list for
this key.

Alternatively,we can implement the hash table with a binary search tree. We can then
guarantee an 0(log n) lookup time,since we can keep the tree balanced. Additionally,
we may use less space,since a large array no longer needs to be allocated in the very
beginning.

所以他们谈论使用BST(二叉搜索树)来处理冲突.使用BST作为唯一的数据结构实际上没有意义,因为正确调整的散列的整个点是查找大约是O(1),比O(log n)更好.一个BST.最重要的是,使用BST完全实现哈希表意味着它实际上不是哈希表:-)

但是,请注意,当您在哈希表中发生冲突时,处理它们的常用方法是让每个存储桶包含其项目的链接列表.在退化的情况下(所有项目散列到同一个桶),最终只有一个链表,O(1)变成O(n).

因此,您有一个BST,而不是每个桶的链表.然后,在单个存储桶具有多个项目(前面提到的冲突)的情况下,您不再具有O(n)搜索复杂度.

如果存在冲突,则使用散列函数在O(1)中查找存储桶,然后在O(log n)中搜索BST.在最好的情况下(每桶一个项目),它仍然是O(1).最坏的情况然后变为O(log n)而不是O(n).

我最初关心这个理论的唯一问题是,他们还讨论了不再需要大量分配的事实.如果它是共享散列/ BST组合,您仍然需要分配整个散列表,以使其看起来不协调.

但是,从上下文(“……因为不再需要分配大型数组……”)看来,它们意味着它们可以使哈希表成为双数据结构的一部分,因为冲突效率更高处理.换句话说,如果使用BST,则可以使用100个元素的哈希表,而不是具有链接列表的1000个元素哈希表,因为冲突不会对搜索时间造成太大损害.

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!