Python字典实现
这篇文章描述了在Python中字典是如何实现的。 字典通过键( >>> d = {‘a‘: 1,‘b‘: 2} >>> d[‘c‘] = 3 >>> d {‘a‘: 1,‘b‘: 2,‘c‘: 3}
可以这样访问字典值: >>> d[‘a‘] 1 >>> d[‘b‘] 2 >>> d[‘c‘] 3 >>> d[‘d‘] Traceback (most recent call last): File "<stdin>",line 1,in <module> KeyError: ‘d‘
键‘d‘不存在,所以抛出了KeyError异常。 哈希表Python字典是用哈希表( >>> map(hash,(0,1,2,3)) [0,3] >>> map(hash,("namea","nameb","namec","named")) [-1658398457,-1658398460,-1658398459,-1658398462]
在这篇文章的余下部分里,我们假定用字符串作为字典的键。Python中字符串的哈希函数定义如下: arguments: string object return: hash function: string_hash: if hash cached: return it set len to string‘s length initialize var p ponting to 1st char of string object set x to value pointed by p left shifted by 7 bits while len >= 0: set var x to (1000003 * x) xor value pointed by p increment pointer p set x to x xor length of string object cached x as the hash so we don‘t need to calculate it again return x as the hash 如果你在Python中运行 hash(‘a‘),string_hash()函数将会被执行并返回12416037344。这里我们假定我们使用的时64位机器。(译者注:这篇文章讲述的是Python2的字典实现) 如果一个长度为x的数组被用来存储键/值对,那么我们用 我们可以看到当键是连续的时候,Python的哈希函数能很好地运作,因为这类数据极为常见。然而,当我们添加了键 我们可以使用一个 开放地址法开放地址法是一个用探测手段来解决冲突的方法。在上述 二次探查序列是用来查找空闲slot的,代码如下: j = (5 * j) + 1 + perturb; perturb >>= PERTURB_SHIFT; use j % 2 ** i as the next table index; 你可以在源代码dictobject.c中找到更多关于探测序列的内容。详细的探测机制的解释在源码的头部。 现在让我们顺着一个例子来看看Python实现的源码。 字典的C数据结构下面的C结构被用来存储一个字典项:键/值对。哈希值、键和值都存储在这个结构中。 typedef struct { Py_ssize_t me_hash; PyObject *me_key; PyObject *me_value; } PyDictEntry;
下面的结构代表了一个字典。 typedef struct _dictobject PyDictObject; struct _dictobject { PyObject_HEAD Py_ssize_t ma_fill; Py_ssize_t ma_used; Py_ssize_t ma_mask; PyDictEntry *ma_table; PyDictEntry *(*ma_lookup)(PyDictObject *mp,PyObject *key,long hash); PyDictEntry ma_smalltable[PyDict_MINSIZE]; };
字典的初始化当你第一次创建一个字典, returns new dictionary object function PyDict_New: allocate new dictionary object clear dictionary‘s table set dictionary‘s number of used slots + dummy slots (ma_fill) to 0 set dictionary‘s number of active slots (ma_used) to 0 set dictionary‘s mask (ma_value) to dictionary size - 1 = 7 set dictionary‘s lookup function to lookdict_string return allocated dictionary object 添加项当添加一个新键值对时 为什么事2/3?这是为了保证探测数列可以快速地找到可用的空slot(译者注:个人理解是如果空闲量较大,那么每次探测产生冲突的概率就会降低,从而减少探测次数)。稍候我们再看调整大小的函数。 arguments: dictionary,key,value return: 0 if OK or -1 function PyDict_SetItem: if key‘s hash cached: use hash else: calculate hash call insertdict with dictionary object,hash and value if key/value pair added successfully and capacity orver 2/3: call dictresize to resize dictionary‘s table
如果我们想添加这样的一些键/值对:{‘a‘: 1,‘b‘: 2,‘z‘: 26,‘y‘: 25,‘c‘: 5,‘x‘: 24},情形如下: 一个字典结构里的表大小为8。 PyDict_SetItem: key=‘a‘,value = 1 hash = hash(‘a‘) = 12416037344 insertdict lookdict_string slot index = hash & mask = 12416037344 & 7 = 0 slot 0 is not used so return it init entry at index 0 with key,value and hash ma_used = 1,ma_fill = 1 PyDict_SetItem: key=‘b‘,value = 2 hash = hash(‘b‘) = 12544037731 insertdict lookdict_string slot index = hash & mask = 12544037731 & 7 = 3 slot 3 is not used so return it init entry at index 3 with key,value and hash ma_used = 2,ma_fill = 2 PyDict_SetItem: key=‘z‘,value = 26 hash = hash(‘z‘) = 15616046971 insertdict lookdict_string slot index = hash & mask = 15616046971 & 7 = 3 slot 3 is used so probe for a different slot: 5 is free init entry at index 5 with key,value and hash ma_used = 3,ma_fill = 3 PyDict_SetItem: key=‘y‘,value = 25 hash = hash(‘y‘) = 15488046584 insertdict lookdict_string slot index = hash & mask = 15488046584 & 7 = 0 slot 0 is used so probe for a different slot: 1 is free init entry at index 1 with key,value and hash ma_used = 4,ma_fill = 4 PyDict_SetItem: key=‘c‘,value = 3 hash = hash(‘c‘) = 12672038114 insertdict lookdict_string slot index = hash & mask = 12672038114 & 7 = 2 slot 2 is not used so return it init entry at index 2 with key,value and hash ma_used = 5,ma_fill = 5 PyDict_SetItem: key=‘x‘,value = 24 hash = hash(‘x‘) = 15360046201 insertdict lookdict_string slot index = hash & mask = 15360046201 & 7 = 1 slot 1 is used so probe for a different slot: 7 is free init entry at index 7 with key,value and hash ma_used = 6,ma_fill = 6 目前为止,一切看起来像这样: 8个slots中的6个已经被使用了,这超过了数组容量的2/3, 在这里, 新表的大小需要大于24,这是通过将当前表大小向左移位来实现的,每次左移一位直到它大于24,结果表大小变为了32(即8 -> 16 -> 32)。 这就是重新调整表大小的结果:分配了一个大小为32的新表,旧表的项利用新掩码31(即32 - 1)计算哈希从而插到新表的相应slots中。结果看起来像这样: 移除项
假设我们想从字典中移除键‘c‘,最终我们得到下述数组: 注意删除字典项的操作不会触发数组大小的调整(即使使用slots的数量远小于总slots量)。 (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |