Elasticsearch Text Analyzers

Testing an Analyzer

Given an analyzer and a sample text, the _analyze API returns the tokens that analyzer produces:

GET http://localhost:9200/_analyze

{
    "analyzer": "standard",
    "text": "Text to analyze"
}
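The same request can be sent from a script. Below is a minimal Python sketch using only the standard library. It assumes a node at http://localhost:9200 with security disabled, and the helper names build_analyze_request, extract_tokens and analyze are hypothetical. It uses POST, which the _analyze endpoint accepts as well as GET.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_analyze_request(analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        f"{ES_URL}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response: dict) -> list:
    """Pull the token strings out of an _analyze response body."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(analyzer: str, text: str) -> list:
    """Send the request and return the tokens (requires a running node)."""
    with urllib.request.urlopen(build_analyze_request(analyzer, text)) as resp:
        return extract_tokens(json.load(resp))
```

Against a running node, analyze("standard", "Text to analyze") returns ["text", "to", "analyze"]: the standard analyzer lowercases tokens and does not remove stop words by default.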

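The standard analyzer (standard) is suited to English text but not to Chinese: it splits Chinese text into single characters.

The IK analyzer ik_max_word splits Chinese text at the finest granularity: it keeps trying to split each result further until no more splits are possible. For example, "中华人民共和国国歌" (National Anthem of the People's Republic of China) is split into "中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民", "人", "民", "共和国", "共和", "和", "国国", "国歌".

The IK analyzer ik_smart splits Chinese text at a coarse granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国" and "国歌".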
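The difference in granularity can be illustrated with a toy dictionary matcher. This Python sketch is not IK's algorithm: TOY_DICT is a hypothetical nine-word dictionary hand-picked for this one phrase, fine_grained simply emits every dictionary word found at every start position, and coarse_grained does greedy forward longest matching. Because the toy dictionary has no single characters and no 国国, the fine-grained list is shorter than ik_max_word's real output (no 人, 民, 和 or 国国).

```python
# Hypothetical toy dictionary, hand-picked for this one example.
TOY_DICT = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}


def fine_grained(text: str, dictionary: set) -> list:
    """Exhaustive matching: every dictionary word at every start position, longest first."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    for start in range(len(text)):
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def coarse_grained(text: str, dictionary: set) -> list:
    """Greedy forward longest match: non-overlapping, unknown characters kept as single tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, start = [], 0
    while start < len(text):
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary:
                break
        else:
            end = start + 1  # no dictionary word starts here
        tokens.append(text[start:end])
        start = end
    return tokens
```

On 中华人民共和国国歌, coarse_grained returns ["中华人民共和国", "国歌"], the same as the ik_smart example, while fine_grained returns nine overlapping words. Overlapping tokens give more chances for a query to match; non-overlapping ones keep the index smaller.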

Installing the IK Analyzer

Download the IK release that matches your Elasticsearch version from https://release.infinilabs.com/ or https://github.com/infinilabs/analysis-ik/releases

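Create an ik folder inside the Elasticsearch plugins directory, put the downloaded zip in it and unzip it there. After Elasticsearch is restarted, the analyzer is ready to use.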
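These manual steps can also be scripted. A minimal Python sketch, assuming es_home is the Elasticsearch installation directory and that the release zip holds the plugin files at its top level (which is why it is unzipped inside a new ik folder). install_ik is a hypothetical helper; it extracts the zip straight into plugins/ik instead of copying it there first. Elasticsearch still has to be restarted afterwards.

```python
import zipfile
from pathlib import Path


def install_ik(zip_path, es_home) -> Path:
    """Unzip an IK release into <es_home>/plugins/ik (hypothetical helper)."""
    target = Path(es_home) / "plugins" / "ik"
    target.mkdir(parents=True, exist_ok=True)  # the new "ik" folder
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Elasticsearch expects plugin-descriptor.properties at the top of each
    # plugin folder, so fail early if the zip had a different layout.
    if not (target / "plugin-descriptor.properties").is_file():
        raise FileNotFoundError(f"no plugin-descriptor.properties in {target}")
    return target
```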

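If no IK release exists for your exact Elasticsearch version, you can change the version in the plugin's plugin-descriptor.properties file. This is a workaround: it only succeeds when the plugin's code is actually compatible with that Elasticsearch version, otherwise the plugin can fail to load.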

elasticsearch.version=7.17.9
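Editing the line by hand is enough; as a sketch, the same change in Python, assuming the file uses the elasticsearch.version=... key shown above. set_es_version and patch_descriptor are hypothetical helpers. Only the elasticsearch.version line is rewritten; the plugin's own version line is left alone.

```python
import re
from pathlib import Path

# Matches only the elasticsearch.version line, not the plugin's "version=" line.
_ES_VERSION_LINE = re.compile(r"^elasticsearch\.version=[^\r\n]*", re.MULTILINE)


def set_es_version(descriptor_text: str, es_version: str) -> str:
    """Return the descriptor text with elasticsearch.version replaced."""
    new_text, count = _ES_VERSION_LINE.subn(
        lambda _m: f"elasticsearch.version={es_version}", descriptor_text
    )
    if count == 0:
        raise ValueError("elasticsearch.version not found in descriptor")
    return new_text


def patch_descriptor(plugin_dir, es_version: str) -> None:
    """Rewrite <plugin_dir>/plugin-descriptor.properties in place."""
    path = Path(plugin_dir) / "plugin-descriptor.properties"
    text = path.read_text(encoding="utf-8")
    path.write_text(set_es_version(text, es_version), encoding="utf-8")
```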

Extending the IK Dictionary

Edit plugins/ik/config/IKAnalyzer.cfg.xml and point the ext_dict entry at a custom dictionary file:

<properties>
    <entry key="ext_dict">custom.dic</entry>
</properties>
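When this file is changed from a script, editing the text directly keeps its XML declaration and DOCTYPE intact (parsing and rewriting it with xml.etree would drop the DOCTYPE). A minimal Python sketch; add_ext_dict is a hypothetical helper, and joining several dictionaries with ; follows the semicolon-separated form shown in the IK README.

```python
import re

_EXT_DICT = re.compile(r'(<entry\s+key="ext_dict"\s*>)(.*?)(</entry>)', re.DOTALL)


def add_ext_dict(cfg_text: str, dic_name: str) -> str:
    """Add dic_name to the ext_dict entry of IKAnalyzer.cfg.xml text (idempotent)."""
    match = _EXT_DICT.search(cfg_text)
    if match is None:
        # No ext_dict entry yet: insert one just before </properties>.
        entry = f'    <entry key="ext_dict">{dic_name}</entry>\n'
        return cfg_text.replace("</properties>", entry + "</properties>", 1)
    dicts = [d.strip() for d in match.group(2).split(";") if d.strip()]
    if dic_name not in dicts:
        dicts.append(dic_name)
    # Replace only the entry's content; everything else is kept byte for byte.
    return cfg_text[: match.start(2)] + ";".join(dicts) + cfg_text[match.end(2):]
```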

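Create plugins/ik/config/custom.dic and add the new words to it, one word per line, saved as UTF-8. Restart Elasticsearch so the new words are loaded.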
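A small Python sketch for maintaining custom.dic, assuming the one-word-per-line UTF-8 format described above (written without a BOM here). add_words is a hypothetical helper: it skips blank entries and words already in the file, so running it twice does not create duplicates. Elasticsearch still has to be restarted afterwards.

```python
from pathlib import Path


def add_words(dic_path, words) -> list:
    """Append new words to an IK dictionary file; return the words actually added."""
    path = Path(dic_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = {line.strip() for line in text.splitlines() if line.strip()}
    # dict.fromkeys removes duplicates within `words` while keeping their order.
    cleaned = dict.fromkeys(word.strip() for word in words)
    new_words = [w for w in cleaned if w and w not in existing]
    with path.open("a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # do not glue the first new word onto the last old one
        for w in new_words:
            f.write(w + "\n")
    return new_words
```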

Configuring and Using the IK Analyzer

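Global default: add the following line to config/elasticsearch.yml. This only applies to old Elasticsearch releases (before 5.0) combined with older IK releases that provided an ik analyzer type. Elasticsearch 5.0 and later refuse to start when elasticsearch.yml contains index-level settings, and current IK releases provide the analyzers ik_smart and ik_max_word instead, so on those versions set the default analyzer in the index settings.

index.analysis.analyzer.default.type: ik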
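On Elasticsearch 5.0 and later, a default analyzer is set per index (or through an index template) rather than in elasticsearch.yml. A minimal Python sketch that builds the create-index request, assuming a local node without authentication, and assuming the plugin's analyzer name ik_max_word is accepted as the type of an analyzer named default, the same way built-in analyzers such as standard are referenced. build_create_index_request is a hypothetical helper.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_create_index_request(index: str, default_analyzer: str = "ik_max_word") -> urllib.request.Request:
    """PUT /<index> with settings that make default_analyzer the index default."""
    body = {
        "settings": {
            "analysis": {
                "analyzer": {
                    # "default" is used for text fields that do not name an analyzer.
                    "default": {"type": default_analyzer}
                }
            }
        }
    }
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Send it with urllib.request.urlopen when the index is created; analysis settings are not dynamic, so they cannot simply be updated on an existing open index.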

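Per field, through the index mapping. The user index must already exist, and the analyzer of an existing field cannot be changed, so define it before any documents containing the field are indexed. In this example name is analyzed with ik_max_word both when indexing and when searching, sex is a keyword field matched as an exact value, and tel is a keyword field with "index": false, so it is not added to the inverted index.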

PUT http://localhost:9200/user/_mapping

{
    "properties": {
        "name": {
            "type": "text",
            "index": true,
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word"
        },
        "sex": {
            "type": "keyword",
            "index": true
        },
        "tel": {
            "type": "keyword",
            "index": false
        }
    }
}
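The same mapping can be built from Python. A minimal sketch, assuming a local node without authentication and an existing user index; build_mapping_request is a hypothetical helper and the field definitions are copied from the request above. The search analyzer is a parameter because the IK README's own mapping example pairs ik_max_word at index time with ik_smart at search time.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_mapping_request(index: str, analyzer: str = "ik_max_word",
                          search_analyzer: str = "ik_max_word") -> urllib.request.Request:
    """PUT /<index>/_mapping with the name/sex/tel fields from the example."""
    body = {
        "properties": {
            # Full-text field, tokenized by IK when indexing and when searching.
            "name": {"type": "text", "index": True,
                     "analyzer": analyzer, "search_analyzer": search_analyzer},
            # Exact-value field: indexed as a single term, not tokenized.
            "sex": {"type": "keyword", "index": True},
            # Kept in _source but not added to the inverted index.
            "tel": {"type": "keyword", "index": False},
        }
    }
    return urllib.request.Request(
        f"{ES_URL}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```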
