[Elasticsearch Basics] ES Advanced Queries: Full-Text Search with Query DSL

Query DSL: full-text search

What is full-text search
1. Data preparation
2. match query
3. multi_match query
4. match_phrase query
5. query_string query
6. simple_query_string

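What is full-text search

Unlike term-level queries, full-text queries are designed to search and match text data by relevance. They analyze the input text first: it is split into terms (individual words) through tokenization, and the terms may then be stemmed and normalized.

Key characteristics of full-text search:

The input text is analyzed, and searching and matching are performed on the resulting terms rather than on the raw string.
Matching is relevance-based. A relevance algorithm determines how well each document matches the query, and results are sorted by that score. Relevance can be computed from term frequency, term weight, and other factors.
Full-text queries are suited to fields that hold free text, such as document content, article bodies, or product descriptions.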
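
The analyze-then-match idea described above can be illustrated with a toy inverted index. This is only a minimal sketch and not how Elasticsearch is implemented: the class ToyFullTextIndex, its whitespace-style analyzer, and the "count of matched terms" score are all hypothetical simplifications (Elasticsearch scores with BM25 by default), and English sample text is used because the toy analyzer cannot segment Chinese.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy full-text search: analyze text into terms, index them, rank by matched terms. */
final class ToyFullTextIndex {
    // term -> ids of the documents that contain it (the inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Hypothetical analyzer: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes one document under every term its text produces. */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns matching doc ids, ordered by number of distinct query terms matched (descending). */
    List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : new LinkedHashSet<>(analyze(query))) {
            for (int docId : postings.getOrDefault(term, Collections.emptySet())) {
                scores.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : Integer.compare(a, b);       // tie: lower id first
        });
        return ids;
    }

    public static void main(String[] args) {
        ToyFullTextIndex index = new ToyFullTextIndex();
        index.add(1, "Beijing Palace Museum");
        index.add(2, "Nanjing Presidential Palace");
        index.add(3, "Shanghai Oriental Pearl Tower");
        // Doc 2 matches both query terms, doc 1 matches only "palace".
        System.out.println(index.search("nanjing palace")); // prints [2, 1]
    }
}
```

The point of the sketch is the order of operations: the query is analyzed with the same analyzer as the documents, and only then compared term by term.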

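1. Data preparation

The description field uses the ik_max_word analyzer, which requires the IK analysis plugin to be installed.

PUT full_index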
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "description" : {
          "type" : "text",
          "analyzer": "ik_max_word",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
    }
  }
}

Test data:
{name=张三, description=北京故宫圆明园, age=11}
{name=王五, description=南京总统府, age=15}
{name=李四, description=北京市天安门广场, age=18}
{name=富贵, description=南京市中山陵, age=22}
{name=来福, description=山东济南趵突泉, age=8}
{name=憨憨, description=安徽黄山九华山, age=27}
{name=小七, description=上海东方明珠, age=31}

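2. match query

Match query: match analyzes the search text into terms and then looks up documents by those terms.

match supports the following parameters:

query: the text to match
operator: how the analyzed terms are combined
  and: every term must match
  or: at least one term must match (default)
minimum_should_match: the minimum number (or percentage) of analyzed terms that a document must contain to count as a match

DSL: find documents whose description field matches "南京总统府"

GET full_index/_search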
{
  "query": {
    "match": {
      "description": "南京总统府"
    }
  }
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2667978,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.2667978,
        "_source" : {
          "name" : "王五",
          "age" : 15,
          "description" : "南京总统府"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0751815,
        "_source" : {
          "name" : "富贵",
          "age" : 22,
          "description" : "南京市中山陵"
        }
      }
    ]
  }
}

Spring Boot implementation:

    private static final Logger logger = LoggerFactory.getLogger(FullTextQuery.class);

    private static final String INDEX_NAME = "full_index";

    @Resource
    private RestHighLevelClient client;

    @RequestMapping(value = "/match_query", method = RequestMethod.GET)
    @ApiOperation(value = "dsl - match_query")
    public void match_query() throws Exception {
        // Build the search request for the index
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        // match query on the description field
        searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.matchQuery("description", "南京总统府")));
        // Print the returned documents
        printLog(client.search(searchRequest, RequestOptions.DEFAULT));
    }

    // Prints the number of hits ("返回hits数组长度") followed by the _source of each hit
    private void printLog(SearchResponse searchResponse) {
        SearchHits hits = searchResponse.getHits();
        System.out.println("返回hits数组长度:" + hits.getHits().length);
        for (SearchHit hit : hits.getHits()) {
            System.out.println(hit.getSourceAsMap().toString());
        }
    }

Output:
返回hits数组长度:2
{name=王五, description=南京总统府, age=15}
{name=富贵, description=南京市中山陵, age=22}

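The _analyze API shows how ik_max_word splits the query text. Under the default or operator a single shared term is enough, which is presumably why "南京市中山陵" also matches (via the term 南京):
POST _analyze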
{
  "analyzer": "ik_max_word",
  "text": ["南京总统府"]
}

{
  "tokens" : [
    {
      "token" : "南京",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "cn_word",
      "position" : 0
    },
    {
      "token" : "总统府",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "cn_word",
      "position" : 1
    },
    {
      "token" : "总统",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "cn_word",
      "position" : 2
    },
    {
      "token" : "府",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "cn_char",
      "position" : 3
    }
  ]
}

Now suppose we insert one more document:
POST /full_index/_bulk
{"index":{"_id":8}}
{"name":"张三","age":11,"description":"南京总统"}

Searching for "南京总统" with operator set to and returns two documents:
GET full_index/_search
{
  "query": {
    "match": {
      "description": {
        "query": "南京总统",
        "operator": "and"
      }
    }
  }
}
Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.898355,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 2.898355,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "南京总统"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.35562,
        "_source" : {
          "name" : "王五",
          "age" : 15,
          "description" : "南京总统府"
        }
      }
    ]
  }
}

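However, searching for "南京总统府" with operator set to and returns only one document: the analyzed query also contains terms such as "府", which do not exist in the document "南京总统".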
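
The difference between the operators can be sketched with plain sets of terms. The class MatchOperatorSketch and its methods are hypothetical names used only for illustration. The query terms are copied from the _analyze output above, while the index terms of the two documents are an assumption (ik_max_word applied at index time), not something read back from the index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of how match combines the analyzed query terms. */
final class MatchOperatorSketch {

    /** Counts how many distinct query terms appear among the document's indexed terms. */
    static int matchedTerms(Set<String> docTerms, List<String> queryTerms) {
        int n = 0;
        for (String t : new LinkedHashSet<>(queryTerms)) {
            if (docTerms.contains(t)) {
                n++;
            }
        }
        return n;
    }

    /** operator = or: at least one query term must match. */
    static boolean matchesOr(Set<String> docTerms, List<String> queryTerms) {
        return matchedTerms(docTerms, queryTerms) >= 1;
    }

    /** operator = and: every distinct query term must match. */
    static boolean matchesAnd(Set<String> docTerms, List<String> queryTerms) {
        return matchedTerms(docTerms, queryTerms) == new HashSet<>(queryTerms).size();
    }

    /** minimum_should_match expressed as an absolute number of terms. */
    static boolean matchesAtLeast(Set<String> docTerms, List<String> queryTerms, int minimum) {
        return matchedTerms(docTerms, queryTerms) >= minimum;
    }

    public static void main(String[] args) {
        // Query terms from the _analyze output for "南京总统府".
        List<String> query = Arrays.asList("南京", "总统府", "总统", "府");
        // Assumed index terms of the two documents.
        Set<String> doc2 = new HashSet<>(query);                        // "南京总统府"
        Set<String> doc8 = new HashSet<>(Arrays.asList("南京", "总统")); // "南京总统"

        System.out.println(matchesAnd(doc2, query));        // true
        System.out.println(matchesAnd(doc8, query));        // false: 总统府 and 府 are missing
        System.out.println(matchesOr(doc8, query));         // true
        System.out.println(matchesAtLeast(doc8, query, 2)); // true: 南京 and 总统 match
    }
}
```

Seen this way, minimum_should_match sits between the two operators: or behaves like a minimum of one term, and behaves like a minimum of all terms.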

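3. multi_match query

Multi-field query: whether the query text is analyzed depends on the type of each field being searched, and hits are returned with the highest score first.
Note: for an analyzed field, the query text is split into terms before searching; for a non-analyzed field, the query text is searched as a whole.

DSL: find documents where the "name" or "description" field matches "北京王五"

GET full_index/_search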
{
  "query": {
    "multi_match": {
      "query": "北京王五",
      "fields": ["name","description"]
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 3.583519,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.583519,
        "_source" : {
          "name" : "王五",
          "age" : 15,
          "description" : "南京总统府"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4959542,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "北京故宫圆明园"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.98645234,
        "_source" : {
          "name" : "李四",
          "age" : 18,
          "description" : "北京市天安门广场"
        }
      }
    ]
  }
}

Spring Boot implementation:

    @RequestMapping(value = "/multi_match", method = RequestMethod.GET)
    @ApiOperation(value = "dsl - multi_match")
    public void multi_match() throws Exception {
        // Build the search request for the index
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        // multi_match query over the name and description fields
        searchRequest.source(new SearchSourceBuilder().query(
                QueryBuilders.multiMatchQuery("北京王五", new String[]{"name", "description"})));
        // Print the returned documents
        printLog(client.search(searchRequest, RequestOptions.DEFAULT));
    }

Output:
返回hits数组长度:3
{name=王五, description=南京总统府, age=15}
{name=张三, description=北京故宫圆明园, age=11}
{name=李四, description=北京市天安门广场, age=18}

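If description is replaced with its non-analyzed description.keyword sub-field, "北京王五" is compared with the stored value as a whole, so only the match on name remains:
GET full_index/_search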
{
  "query": {
    "multi_match": {
      "query": "北京王五",
      "fields": ["name","description.keyword"]
    }
  }
}
Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.583519,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.583519,
        "_source" : {
          "name" : "王五",
          "age" : 15,
          "description" : "南京总统府"
        }
      }
    ]
  }
}
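
Why the two description hits disappear can be sketched by comparing how an analyzed field and a keyword field treat the same query text. FieldMatchSketch is a hypothetical helper. The index terms of document 1 are the ones ik_max_word produces for "北京故宫圆明园" (北京, 故宫, 圆明园), and the query term list is an assumption in which only the term 北京 matters.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy comparison of an analyzed (text) field and a non-analyzed (keyword) field. */
final class FieldMatchSketch {

    /** text field: the query is analyzed, and one shared term is enough (operator or). */
    static boolean matchesText(List<String> fieldTerms, List<String> queryTerms) {
        return !Collections.disjoint(fieldTerms, queryTerms);
    }

    /** keyword field: no analysis, the whole query string must equal the stored value. */
    static boolean matchesKeyword(String storedValue, String query) {
        return storedValue.equals(query);
    }

    public static void main(String[] args) {
        String query = "北京王五";
        // Assumed analysis of the query for the description field.
        List<String> queryTerms = Arrays.asList("北京", "王", "五");
        // Index terms of document 1 ("北京故宫圆明园") under ik_max_word.
        List<String> doc1Terms = Arrays.asList("北京", "故宫", "圆明园");

        System.out.println(matchesText(doc1Terms, queryTerms));      // true: shares 北京
        System.out.println(matchesKeyword("北京故宫圆明园", query));  // false: not the same string
    }
}
```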

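4. match_phrase query

A phrase search (match_phrase) analyzes the search text, then looks for documents that contain every resulting term, with the terms adjacent and in the same order. The slop parameter sets the maximum number of positions by which the terms may be apart.

DSL: find documents whose "description" field contains the phrase "北京故宫"

GET full_index/_search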
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "北京故宫"
      }
    }
  }
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.5884824,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 3.5884824,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "北京故宫圆明园"
        }
      }
    ]
  }
}

Spring Boot implementation:

    @RequestMapping(value = "/match_phrase", method = RequestMethod.GET)
    @ApiOperation(value = "dsl - match_phrase")
    public void match_phrase() throws Exception {
        // Build the search request for the index
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        // match_phrase query on the description field
        searchRequest.source(new SearchSourceBuilder().query(
                QueryBuilders.matchPhraseQuery("description", "北京故宫")));
        // Print the returned documents
        printLog(client.search(searchRequest, RequestOptions.DEFAULT));
    }

Output:
返回hits数组长度:1
{name=张三, description=北京故宫圆明园, age=11}

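Searching for the phrase "北京圆明园" returns nothing, because 北京 and 圆明园 are not adjacent in any document:
GET full_index/_search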
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "北京圆明园"
      }
    }
  }
}
Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Analyzing the document text shows that 故宫 sits between 北京 and 圆明园:
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": ["北京故宫圆明园"]
}

{
  "tokens" : [
    {
      "token" : "北京",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "cn_word",
      "position" : 0
    },
    {
      "token" : "故宫",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "cn_word",
      "position" : 1
    },
    {
      "token" : "圆明园",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "cn_word",
      "position" : 2
    }
  ]
}

With slop set to 1, the terms may be one position apart, so the document matches:
GET full_index/_search
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "北京圆明园",
        "slop": 1
      }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.4425511,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.4425511,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "北京故宫圆明园"
        }
      }
    ]
  }
}
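
The effect of slop can be sketched with the term positions from the _analyze output above. PhraseSlopSketch is a hypothetical, simplified model: for every query term it takes the difference between its position in the document and its position in the query, and accepts the document when the spread of those differences is at most slop. The query positions (北京 at 0, 圆明园 at 1) are an assumption, since the analysis of the query text itself is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified phrase matching with slop over term positions. */
final class PhraseSlopSketch {

    /**
     * docPositions maps each term to its positions in the document;
     * queryTerms lists the analyzed query terms in query order.
     */
    static boolean matchesPhrase(Map<String, List<Integer>> docPositions,
                                 List<String> queryTerms, int slop) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        return search(docPositions, queryTerms, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop);
    }

    // Tries every occurrence of each term, tracking min and max of (docPosition - queryPosition).
    private static boolean search(Map<String, List<Integer>> docPositions, List<String> queryTerms,
                                  int i, int min, int max, int slop) {
        if (i == queryTerms.size()) {
            return max - min <= slop;
        }
        for (int pos : docPositions.getOrDefault(queryTerms.get(i), Collections.emptyList())) {
            int shift = pos - i;
            if (search(docPositions, queryTerms, i + 1,
                    Math.min(min, shift), Math.max(max, shift), slop)) {
                return true;
            }
        }
        return false; // a term is missing, or no combination of occurrences fits
    }

    public static void main(String[] args) {
        // Positions from the _analyze output for "北京故宫圆明园".
        Map<String, List<Integer>> doc = new HashMap<>();
        doc.put("北京", Arrays.asList(0));
        doc.put("故宫", Arrays.asList(1));
        doc.put("圆明园", Arrays.asList(2));

        System.out.println(matchesPhrase(doc, Arrays.asList("北京", "故宫"), 0));   // true
        System.out.println(matchesPhrase(doc, Arrays.asList("北京", "圆明园"), 0)); // false
        System.out.println(matchesPhrase(doc, Arrays.asList("北京", "圆明园"), 1)); // true
    }
}
```

The three calls in main mirror the three queries in this section: the exact phrase matches, the phrase with a gap does not, and the same phrase with slop 1 does.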

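5. query_string query

query_string lets us combine AND | OR | NOT conditions in a single query string and, like multi_match query, supports searching several fields. It is similar to match, but match requires a field name, whereas query_string searches all fields when no field is specified, so its scope is wider.
Note: for an analyzed field, the query text is analyzed before searching; for a non-analyzed field, it is searched without analysis.

DSL: find documents in the current index where any field matches "安徽张三"

GET full_index/_search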
{
  "query": {
    "query_string": {
      "query": "安徽张三"
    }
  }
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.5618675,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.5618675,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "北京故宫圆明园"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 2.5618675,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "南京总统"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.7342355,
        "_source" : {
          "name" : "憨憨",
          "age" : 27,
          "description" : "安徽黄山九华山"
        }
      }
    ]
  }
}

Spring Boot implementation:

    @RequestMapping(value = "/query_string", method = RequestMethod.GET)
    @ApiOperation(value = "dsl - query_string")
    public void query_string() throws Exception {
        // Build the search request for the index
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        // query_string query with no fields specified
        searchRequest.source(new SearchSourceBuilder().query(
                QueryBuilders.queryStringQuery("安徽张三")));
        // Print the returned documents
        printLog(client.search(searchRequest, RequestOptions.DEFAULT));
    }

Output:
返回hits数组长度:3
{name=张三, description=北京故宫圆明园, age=11}
{name=张三, description=南京总统, age=11}
{name=憨憨, description=安徽黄山九华山, age=27}

Searching a specific field: documents whose "description" field matches "安徽张三"

GET full_index/_search
{
  "query": {
    "query_string": {
      "query": "安徽张三",
      "fields": ["description"]
    }
  }
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.7342355,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.7342355,
        "_source" : {
          "name" : "憨憨",
          "age" : 27,
          "description" : "安徽黄山九华山"
        }
      }
    ]
  }
}

Searching several fields: documents that match both "安徽" and "憨憨"

GET full_index/_search
{
  "query": {
    "query_string": {
      "query": "安徽 AND 憨憨",
      "fields": ["description","name"]
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.6615744,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 6.6615744,
        "_source" : {
          "name" : "憨憨",
          "age" : 27,
          "description" : "安徽黄山九华山"
        }
      }
    ]
  }
}

Conditions can also be grouped with parentheses and combined with OR:
GET full_index/_search
{
  "query": {
    "query_string": {
      "query": "(安徽 AND 憨憨) OR 张三",
      "fields": ["description","name"]
    }
  }
}
Response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 6.6615744,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 6.6615744,
        "_source" : {
          "name" : "憨憨",
          "age" : 27,
          "description" : "安徽黄山九华山"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.5618675,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "北京故宫圆明园"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 2.5618675,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "南京总统"
        }
      }
    ]
  }
}

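6. simple_query_string

Similar to query_string, but it ignores invalid syntax instead of failing and supports only part of the query syntax. AND, OR, and NOT are not supported as operators and are treated as plain text. The following symbols cover part of that logic:

"+" replaces "AND"
"|" replaces "OR"
"-" replaces "NOT"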
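
The mapping listed above can be sketched as a small text rewrite from the query_string keywords to the simple_query_string symbols. SimpleQuerySyntaxSketch is a hypothetical helper, not part of any Elasticsearch client. It is a rough textual mapping only: it assumes the keywords are separated by whitespace, it attaches "-" to the single token that follows NOT, and it does not claim that the two query types behave identically in every case.

```java
/** Rewrites whitespace-separated AND / OR / NOT keywords into + / | / - symbols. */
final class SimpleQuerySyntaxSketch {

    static String toSimpleSyntax(String queryString) {
        StringBuilder out = new StringBuilder();
        boolean negateNext = false;
        for (String token : queryString.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String piece;
            switch (token) {
                case "AND":
                    piece = "+";
                    break;
                case "OR":
                    piece = "|";
                    break;
                case "NOT":
                    negateNext = true; // remember to prefix the next token with "-"
                    continue;
                default:
                    piece = negateNext ? "-" + token : token;
                    negateNext = false;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(piece);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSimpleSyntax("(安徽 AND 憨憨) OR 张三")); // (安徽 + 憨憨) | 张三
        System.out.println(toSimpleSyntax("北京 NOT 故宫"));          // 北京 -故宫
    }
}
```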

GET full_index/_search
{
  "query": {
    "simple_query_string": {
      "query": "(安徽 + 憨憨) | 张三",
      "fields": ["description","name"]
    }
  }
}

Response:
{
  "took" : 41,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 6.6615744,
    "hits" : [
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 6.6615744,
        "_source" : {
          "name" : "憨憨",
          "age" : 27,
          "description" : "安徽黄山九华山"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.5618675,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "北京故宫圆明园"
        }
      },
      {
        "_index" : "full_index",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 2.5618675,
        "_source" : {
          "name" : "张三",
          "age" : 11,
          "description" : "南京总统"
        }
      }
    ]
  }
}

August 3, 2024 • ar
