DJBX33A (Daniel J. Bernstein, Times 33 with Addition) APR哈

发布时间：2020-12-15 20:51:40 所属栏目：大数据来源：网络整理

导读：经典是经过了时间考验的 APR_DECLARE_NONSTD(unsigned? int )?apr_hashfunc_default( const ? char ?*char_key,?? ??????????????????????????????????????????????????????apr_ssize_t?*klen)?? {?? ????unsigned? int ?hash?=?0;?? ???? const ?unsigned?

经典是经过了时间考验的

APR_DECLARE_NONSTD(unsigned? int )?apr_hashfunc_default( const ? char ?*char_key,??
??????????????????????????????????????????????????????apr_ssize_t?*klen)??
{??
????unsigned?int ?hash?=?0;??
????const ?unsigned? char ?*key?=?( const ?unsigned? char ?*)char_key;??
????const ?unsigned? char ?*p;??
????apr_ssize_t?i;??
??????
????/* ?
?????*?This?is?the?popular?`times?33'?hash?algorithm?which?is?used?by ?
?????*?perl?and?also?appears?in?Berkeley?DB.?This?is?one?of?the?best ?
?????*?known?hash?functions?for?strings?because?it?is?both?computed ?
?????*?very?fast?and?distributes?very?well. ?
?????* ?
?????*?The?originator?may?be?Dan?Bernstein?but?the?code?in?Berkeley?DB ?
?????*?cites?Chris?Torek?as?the?source.?The?best?citation?I?have?found ?
?????*?is?"Chris?Torek,?Hash?function?for?text?in?C,?Usenet?message ?
?????*?<27038@mimsy.umd.edu>?in?comp.lang.c?,?October,?1990."?in?Rich ?
?????*?Salz's?USENIX?1992?paper?about?INN?which?can?be?found?at ?
?????*?<http://citeseer.nj.nec.com/salz92internetnews.html>. ?
?????* ?
?????*?The?magic?of?number?33,?i.e.?why?it?works?better?than?many?other ?
?????*?constants,?prime?or?not,?has?never?been?adequately?explained?by ?
?????*?anyone.?So?I?try?an?explanation:?if?one?experimentally?tests?all ?
?????*?multipliers?between?1?and?256?(as?I?did?while?writing?a?low-level ?
?????*?data?structure?library?some?time?ago)?one?detects?that?even ?
?????*?numbers?are?not?useable?at?all.?The?remaining?128?odd?numbers ?
?????*?(except?for?the?number?1)?work?more?or?less?all?equally?well. ?
?????*?They?all?distribute?in?an?acceptable?way?and?this?way?fill?a?hash ?
?????*?table?with?an?average?percent?of?approx.?86%. ?
?????* ?
?????*?If?one?compares?the?chi^2?values?of?the?variants?(see ?
?????*?Bob?Jenkins?``Hashing?Frequently?Asked?Questions''?at ?
?????*?http://burtleburtle.net/bob/hash/hashfaq.html?for?a?description ?
?????*?of?chi^2),?the?number?33?not?even?has?the?best?value.?But?the ?
?????*?number?33?and?a?few?other?equally?good?numbers?like?17,?31,?63, ?
?????*?127?and?129?have?nevertheless?a?great?advantage?to?the?remaining ?
?????*?numbers?in?the?large?set?of?possible?multipliers:?their?multiply ?
?????*?operation?can?be?replaced?by?a?faster?operation?based?on?just?one ?
?????*?shift?plus?either?a?single?addition?or?subtraction?operation.?And ?
?????*?because?a?hash?function?has?to?both?distribute?good?_and_?has?to ?
?????*?be?very?fast?to?compute,?those?few?numbers?should?be?preferred. ?
?????* ?
?????*??????????????????--?Ralf?S.?Engelschall?<rse@engelschall.com> ?
?????*/ ??
???????
????if ?(*klen?==?APR_HASH_KEY_STRING)?{??
????????for ?(p?=?key;?*p;?p++)?{??
????????????hash?=?hash?*?33?+?*p;??
????????}??
????????*klen?=?p?-?key;??
????}??
????else ?{??
????????for ?(p?=?key,?i?=?*klen;?i;?i--,?p++)?{??
????????????hash?=?hash?*?33?+?*p;??
????????}??
????}??
????return ?hash;??
}??

APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,apr_ssize_t *klen) { unsigned int hash = 0; const unsigned char *key = (const unsigned char *)char_key; const unsigned char *p; apr_ssize_t i; /* * This is the popular `times 33' hash algorithm which is used by * perl and also appears in Berkeley DB. This is one of the best * known hash functions for strings because it is both computed * very fast and distributes very well. * * The originator may be Dan Bernstein but the code in Berkeley DB * cites Chris Torek as the source. The best citation I have found * is "Chris Torek,Hash function for text in C,Usenet message * <27038@mimsy.umd.edu> in comp.lang.c,October,1990." in Rich * Salz's USENIX 1992 paper about INN which can be found at * <http://citeseer.nj.nec.com/salz92internetnews.html>. * * The magic of number 33,i.e. why it works better than many other * constants,prime or not,has never been adequately explained by * anyone. So I try an explanation: if one experimentally tests all * multipliers between 1 and 256 (as I did while writing a low-level * data structure library some time ago) one detects that even * numbers are not useable at all. The remaining 128 odd numbers * (except for the number 1) work more or less all equally well. * They all distribute in an acceptable way and this way fill a hash * table with an average percent of approx. 86%. * * If one compares the chi^2 values of the variants (see * Bob Jenkins ``Hashing Frequently Asked Questions'' at * http://burtleburtle.net/bob/hash/hashfaq.html for a description * of chi^2),the number 33 not even has the best value. But the * number 33 and a few other equally good numbers like 17,31,63,* 127 and 129 have nevertheless a great advantage to the remaining * numbers in the large set of possible multipliers: their multiply * operation can be replaced by a faster operation based on just one * shift plus either a single addition or subtraction operation. And * because a hash function has to both distribute good _and_ has to * be very fast to compute,those few numbers should be preferred. * * -- Ralf S. Engelschall <rse@engelschall.com> */ if (*klen == APR_HASH_KEY_STRING) { for (p = key; *p; p++) { hash = hash * 33 + *p; } *klen = p - key; } else { for (p = key,i = *klen; i; i--,p++) { hash = hash * 33 + *p; } } return hash; }

对函数注释部分的翻译:
这是很出名的times33哈希算法,此算法被perl语言采用并在Berkeley DB中出现.它是已知的最好的哈希算法之一,在处理以字符串为键值的哈希时,有着极快的计算效率和很好哈希分布.最早提出这个算法的是Dan Bernstein,但是源代码确实由Clris Torek在Berkeley DB出实作的.我找到的最确切的引文中这样说"Chris Torek,C语言文本哈希函数,Usenet消息<<27038@mimsy.umd.edu> in comp.lang.c,1990年十月."在Rich Salz于1992年在USENIX报上发表的讨论INN的文章中提到.这篇文章可以在<http://citeseer.nj.nec.com /salz92internetnews.html>上找到.

33这个奇妙的数字,为什么它能够比其他数值效果更好呢?无论重要与否,却从来没有人能够充分说明其中的原因.因此在这里,我来试着解释一下.如果某人试着测试1到256之间的每个数字(就像我前段时间写的一个底层数据结构库那样),他会发现,没有哪一个数字的表现是特别突出的.其中的128个奇数(1除外)的表现都差不多,都能够达到一个能接受的哈希分布,平均分布率大概是86%.

如果比较这128个奇数中的方差值(gibbon:统计术语,表示随机变量与它的数学期望之间的平均偏离程度)的话(见Bob Jenkins的<哈希常见疑问>http://burtleburtle.net/bob/hash/hashfaq.html,中对平方差的描述),数字33并不是表现最好的一个.(gibbon:这里按照我的理解,照常理,应该是方差越小稳定,但是由于这里不清楚作者方差的计算公式,以及在哈希离散表,是不是离散度越大越好,所以不得而知这里的表现好是指方差值大还是指方差值小),但是数字33以及其他一些同样好的数字比如 17,127和129对于其他剩下的数字,在面对大量的哈希运算时,仍然有一个大大的优势,就是这些数字能够将乘法用位运算配合加减法来替换,这样的运算速度会提高.毕竟一个好的哈希算法要求既有好的分布,也要有高的计算速度,能同时达到这两点的数字很少.

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!