Notes on SQLite FTS data structures: segment leaf nodes

Published: 2020-12-12 20:15:07  Category: Encyclopedia  Source: compiled from the web


Here is the description from the source-code comment:

**** Segment leaf nodes ****
** Segment leaf nodes store terms and doclists, ordered by term. Leaf
** nodes are written using LeafWriter, and read using LeafReader (to
** iterate through a single leaf node's data) and LeavesReader (to
** iterate through a segment's entire leaf layer). Leaf nodes have
** the format:
**
** varint iHeight;             (height from leaf level, always 0)
** varint nTerm;               (length of first term)
** char pTerm[nTerm];          (content of first term)
** varint nDoclist;            (length of term's associated doclist)
** char pDoclist[nDoclist];    (content of doclist)
** array {
**                             (further terms are delta-encoded)
**   varint nPrefix;           (length of prefix shared with previous term)
**   varint nSuffix;           (length of unshared suffix)
**   char pTermSuffix[nSuffix];(unshared suffix of next term)
**   varint nDoclist;          (length of term's associated doclist)
**   char pDoclist[nDoclist];  (content of doclist)
** }
**

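A node holds a sequence of terms and their corresponding doclists (see the previous article for the details of that structure). Essentially it is laid out as term1 + doclist1 + term2 + doclist2 + term3 + doclist3 + ...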

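The first field, starting at the first byte, is a variable-length integer (varint) giving the node's height in the b-tree. Heights are counted from the bottom: the lowest level of the tree, i.e. the leaf nodes, is level 0. Since this is a leaf node, its height is always 0.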

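The next field is also a varint, giving the length of the first term, followed by a char array holding the term's characters. (Storing a term just means storing a string. A common way to store a string is to write its characters in order and then a terminating 0 byte. This format does not do that: it stores the string's length first and then the characters, so a reader knows how many bytes to take without scanning for a terminator.)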

Next comes another varint giving the length of the doclist in bytes, followed by that many bytes of doclist data (see the previous article for how a doclist is parsed).

//-----

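After that come the next term and its doclist. Because terms are sorted in string order, adjacent terms often share their leading characters. So each subsequent term is stored as: a number giving how many leading characters it shares with the previous term (the prefix), then a number giving how many characters remain once that prefix is removed (the suffix), then the suffix characters themselves. The first nPrefix characters of the previous term followed by this suffix give the current term. For example, after "apple", the term "applet" is stored as nPrefix = 5, nSuffix = 1, suffix "t".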

The rest is the same as for the first term: a varint giving the doclist length, followed by the doclist contents.

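Storing terms this way from the second term onward has two benefits. First, it saves a good deal of space, because in a sorted term list adjacent terms very often share a prefix. Second, it adds essentially no decoding cost: the logic is a straight sequence of reads with no back-and-forth checks, so terms can be read sequentially in a single pass.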

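Here is how the code reads a delta-encoded term. The variables involved are:

pReader->nTerm: the length of the previous term

pReader->zTerm[]: the contents of the previous term

pNext: already points at the first byte of the current term's entry, i.e. its nPrefix varint.

To read the current term: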

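pNext += sqlite3Fts3GetVarint32(pNext, &nPrefix);  // first read the prefix length
pNext += sqlite3Fts3GetVarint32(pNext, &nSuffix);  // then read the suffix length

// zTerm still holds the previous term, whose first nPrefix bytes are shared;
// copy the current term's suffix in starting at index nPrefix
// (zTerm must already have room for nPrefix+nSuffix bytes)
memcpy(&pReader->zTerm[nPrefix], pNext, nSuffix);
pReader->nTerm = nPrefix + nSuffix;                // length of the current term
pNext += nSuffix;                                  // step past the suffix; nDoclist follows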

Done!

(Editor: Li Datong, 李大同)
