Solr Full-Text Search Study Notes (2): schema.xml Configuration Explained

Published: 2020-12-16 08:36:24 | Category: Encyclopedia | Source: compiled from the web

The schema.xml file mainly defines the data types used in the index, the indexed fields, and related information.

2.1.fieldType

The fieldType element defines a data type.

<fieldType name="string" sortMissingLast="true" class="solr.StrField"/>
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" sortMissingLast="true" class="solr.BoolField"/>

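name is the name by which this type is referenced.

class names the implementing class. The solr. prefix is shorthand that Solr resolves against its own packages: field type classes such as StrField live in org.apache.solr.schema, and org.apache.solr.analysis is among the packages searched.

A fieldType can also define the analyzers used for this type at index time and at query time. tokenizer specifies the tokenizer, and filter specifies a filter.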

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example,we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

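positionIncrementGap: optional attribute. It sets the position gap inserted between successive values of a multi-valued field of this type within one document, so that phrase queries do not match across value boundaries.

positionIncrementGap=100 is only meaningful when the field is multiValued=true.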
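
To make the effect of positionIncrementGap concrete, here is a small sketch of a document in Solr's XML update format. The field name authors is hypothetical and is assumed to be declared with type text_general and multiValued="true"; other fields are omitted.

```xml
<!-- Hypothetical document; "authors" is an assumed multi-valued text_general field -->
<add>
  <doc>
    <field name="authors">John Smith</field>
    <field name="authors">Mary Jones</field>
  </doc>
</add>
```

With a gap of 100, the tokens of the second value are placed far away from the end of the first, so the phrase query "Smith Mary" should not match this document. With a gap of 0 the two tokens would sit in adjacent positions and the phrase could match across the value boundary.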

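StrField values are not analyzed; they are indexed and stored verbatim.

solr.TextField lets you customize indexing and querying through analyzers. An analyzer consists of one tokenizer and any number of filters.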

2.2.field

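The field element declares a field used for indexing and querying.

name is the field name; type is its data type, that is, one of the fieldType definitions described above.

indexed: whether the field is indexed, and therefore searchable.

stored: whether the original value is stored, and therefore retrievable in results.

multiValued: whether the field can hold more than one value. If the field may have several values, set it to true.

The _version_ and _root_ fields must be kept; do not delete them.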

<field name="_version_" stored="true" indexed="true" type="long"/>
<field name="_root_" stored="false" indexed="true" type="string"/>
<field name="ProductCode" stored="true" indexed="true" type="string" multiValued="false" required="true"/>
<field name="ProductName" stored="true" indexed="true" type="text_general"/>

2.3.copyField

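This element copies the value of one field into another field. Several fields can also be copied into the same destination field, so that a search can run against a single field.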

<copyField source="ProductName" dest="text"/>
<copyField source="ProductCode" dest="text"/>

A second example, which collects several fields into one multi-valued keyword field:

<field name="product_keywords" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="product_name" dest="product_keywords"/>
<copyField source="product_description" dest="product_keywords"/>
<copyField source="product_catalog_name" dest="product_keywords"/>

2.4.dynamicField

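A dynamicField declares a field by pattern: any field whose name matches the pattern is accepted without being declared individually.
<dynamicField name="*_i" stored="true" indexed="true" type="int"/>
With *_i, any field whose name ends in _i matches this definition.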
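
As an illustration of the pattern match, the following sketch shows a document in Solr's XML update format. The field names stock_i and rating_i and all values are hypothetical; neither field is declared in the schema, and both are assumed to be picked up by the *_i rule above. Other fields the schema may require are omitted.

```xml
<!-- Hypothetical document: stock_i and rating_i are not declared explicitly;
     both are assumed to match the *_i dynamicField and be indexed as int -->
<add>
  <doc>
    <field name="ProductCode">P-1001</field>
    <field name="stock_i">25</field>
    <field name="rating_i">4</field>
  </doc>
</add>
```

A field such as stock_count would not match, because its name does not end in _i.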

2.5.uniqueKey

<uniqueKey>id</uniqueKey>
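The uniqueKey element names the field that uniquely identifies a document, much like a primary key. Updates and deletes are keyed on this field, so it should always be declared.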
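
The uniqueKey element only names a field, so that field must itself be declared. A minimal sketch, assuming the key is a plain string field called id as in the line above (the attribute choices are an assumption, not taken from the original schema):

```xml
<!-- Assumed declaration of the key field: single-valued, indexed, stored, required -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>
```

With this in place, adding a document whose id already exists replaces the earlier document instead of creating a duplicate.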

2.6.defaultSearchField

<defaultSearchField>text</defaultSearchField>
defaultSearchField names the field that is searched by default when a query does not specify one.

2.7.solrQueryParser

<solrQueryParser defaultOperator="OR"/>
solrQueryParser sets how multiple terms in a query are combined by default. The operator can be OR or AND.
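
Later Solr releases deprecate both defaultSearchField and the defaultOperator attribute in favor of request parameters. Assuming such a release, the same defaults might be set in solrconfig.xml roughly as follows. The df and q.op parameters are standard Solr request parameters; the handler definition is only a sketch and not part of the original article.

```xml
<!-- solrconfig.xml sketch (not schema.xml): request-handler defaults -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- default search field, replacing defaultSearchField -->
    <str name="df">text</str>
    <!-- default operator between terms, replacing defaultOperator -->
    <str name="q.op">OR</str>
  </lst>
</requestHandler>
```

Either value can still be overridden per request by passing df or q.op in the query string.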

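2.8. Performance optimization

1. For every field that is only searched and never returned in results (especially large fields), set stored to false.

2. For every field that is only returned in results and never searched, set indexed to false.

3. Remove all unnecessary copyField declarations, to keep the number of indexed fields small and searches efficient.

4. Set indexed to false on all individual text fields, use copyField to copy them into one catch-all text field, and search on that field.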
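
The tips above can be sketched as a schema fragment. All field names here are hypothetical (some are borrowed from the copyField example in 2.3), and the fragment is one possible reading of the tips, not a recommended production schema.

```xml
<!-- Tip 1: searched but never returned, so not stored -->
<field name="product_description" type="text_general" indexed="true" stored="false"/>

<!-- Tip 2: returned but never searched, so not indexed -->
<field name="product_image_url" type="string" indexed="false" stored="true"/>

<!-- Tip 4: individual text fields are stored only; one catch-all field is indexed -->
<field name="product_name" type="text_general" indexed="false" stored="true"/>
<field name="product_catalog_name" type="text_general" indexed="false" stored="true"/>
<field name="product_keywords" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="product_name" dest="product_keywords"/>
<copyField source="product_catalog_name" dest="product_keywords"/>
```

The trade-off of tip 4 is that queries can no longer target or weight product_name and product_catalog_name individually, since only the combined field is searchable.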

(Editor: 李大同)
