Inverted index: multi-match

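multi-match is a query function that searches a single query string across multiple fields at once. This is useful in many practical scenarios. For example, when a shopper on an e-commerce site types "durable backpack" into the search box, the system may need to look in several fields at the same time, such as product_title, description, and category.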

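When a multi-match query runs, the system matches the query against each specified field, then merges the per-field results and ranks them, returning a single list ordered by relevance.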
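The per-field match, merge, and rank steps can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the engine's implementation: the regex tokenizer stands in for a real analyzer, the score is simply the number of (field, term) hits rather than a real relevance model such as BM25, and toy_multi_match, the product documents, and their contents are hypothetical, modeled on the e-commerce example above.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Stand-in analyzer (assumption): lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs, fields):
    # One inverted index per field: term -> set of document ids containing it.
    index = {field: defaultdict(set) for field in fields}
    for doc_id, doc in docs.items():
        for field in fields:
            for term in tokenize(doc.get(field, "")):
                index[field][term].add(doc_id)
    return index


def toy_multi_match(index, fields, query):
    terms = set(tokenize(query))
    scores = defaultdict(int)
    # Step 1: match the query against each field's index separately.
    for field in fields:
        for term in terms:
            for doc_id in index[field].get(term, ()):
                # Step 2: merge hits from all fields into one score per document.
                scores[doc_id] += 1
    # Step 3: rank into a single list, highest score first (ties broken by id).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical product documents for the "durable backpack" example.
FIELDS = ["product_title", "description", "category"]
DOCS = {
    "p1": {"product_title": "Durable Backpack", "description": "Water-resistant daypack", "category": "Bags"},
    "p2": {"product_title": "Laptop Sleeve", "description": "A durable neoprene sleeve", "category": "Accessories"},
    "p3": {"product_title": "Travel Mug", "description": "Keeps drinks hot", "category": "Kitchen"},
}

index = build_index(DOCS, FIELDS)
print(toy_multi_match(index, FIELDS, "durable backpack"))  # [('p1', 2), ('p2', 1)]
```

p1 matches both terms in its title and ranks first; p2 matches "durable" only in its description but is still returned, because every field is searched.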

Examples

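Data preparation:

Test table:

Table name: dbpedia_entities_1m
Created: 2025-07-03 12:07:32
Rows: 1,000,000
Storage size: 12.6 GB

Table schema:

  • id (string) - entity ID
  • title (string) - entity title
  • text (string) - entity description text
  • vec (vector(float,1536)) - 1536-dimensional vector

The following indexes have been built: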

Index name | Index type | Target column | Analyzer | Other settings
inverted_multi_match_idx_id | INVERTED | id | unicode | -
inverted_multi_match_idx_title | INVERTED | title | unicode | -
inverted_multi_match_idx_text | INVERTED | text | unicode | -
idx_dbpedia_vec_1536 | VECTOR | vec | - | ef.construction=128, m=64

Query examples:

Single-field match

Search the title field for 'Paris Wisconsin Foster', requiring at least 67% of the query terms to match:

SELECT id,
       title
FROM   dbpedia_entities_1m
WHERE  multi_match(
           title,                     -- field to search
           'Paris Wisconsin Foster',  -- query string
           -- unicode analyzer; at least 67% of the query terms must match
           str_to_map('analyzer:unicode,minimum_should_match:67%')
       );
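minimum_should_match:67% is a fraction of the query terms, so it has to be converted into a term count. How this engine does that conversion is not stated here; the sketch below assumes the Elasticsearch convention, where a percentage is multiplied by the number of query terms and rounded down, and a plain integer is used as-is. required_matches and title_matches are hypothetical helpers, the regex tokenizer stands in for the unicode analyzer, and the sample titles are made up for illustration.

```python
import math
import re


def required_matches(num_terms, spec):
    # Assumption: Elasticsearch-style semantics. "67%" -> floor(num_terms * 0.67);
    # a plain integer such as "3" is an absolute count.
    spec = spec.strip()
    if spec.endswith("%"):
        required = math.floor(num_terms * float(spec[:-1]) / 100)
    else:
        required = int(spec)
    # Clamp so that at least one term, and no more than all terms, must match.
    return max(1, min(required, num_terms))


def title_matches(title, query, spec):
    # Stand-in tokenizer (assumption): lowercase words split on non-word characters.
    query_terms = set(re.findall(r"\w+", query.lower()))
    title_terms = set(re.findall(r"\w+", title.lower()))
    hits = query_terms & title_terms
    return len(hits) >= required_matches(len(query_terms), spec)


print(required_matches(3, "67%"))                                          # 2
print(title_matches("Paris, Wisconsin", "Paris Wisconsin Foster", "67%"))  # True
print(title_matches("Paris", "Paris Wisconsin Foster", "67%"))             # False
```

Under this assumption, three query terms at 67% require two matching terms, so a title must contain at least two of Paris, Wisconsin, and Foster. With round-down semantics, 66% would require only one term (floor(1.98) = 1).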

Multi-field search

Three-field search (id + title + text)

Search the id, title, and text fields, requiring at least 3 of the query terms to match:

SELECT id,
       title,
       text
FROM   dbpedia_entities_1m
WHERE  multi_match(
           id,                                    -- fields to search
           title,
           text,
           'French deaf Avenue_Q Robert driver',  -- query string
           -- unicode analyzer; at least 3 of the query terms must match
           str_to_map('analyzer:unicode,minimum_should_match:3')
       );

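Result: two rows are returned. Both describe French racing drivers named Robert, so each matches three of the query terms (French, Robert, driver):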

id | title | text
<dbpedia:Robert_Manzon> | Robert Manzon | Robert Manzon (12 April 1917 – 19 January 2015) was a French racing driver. He participated in 29 Formula One World Championship Grands Prix, debuting on 21 May 1950. He achieved two podiums, and scored a total of 16 championship points. At the time of his death, Manzon was the last surviving driver to have taken part in the first Formula One World Championship in 1950.
<dbpedia:Robert_Benoist> | Robert Benoist | Robert Marcel Charles Benoist (20 March 1895 – 9 September 1944) was a French Grand Prix motor racing driver and war hero.
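It is not stated whether minimum_should_match:3 is applied to each field separately or to the distinct terms matched across all three fields combined. The sketch below counts matched query terms both ways for the two returned rows. The regex tokenizer is an assumption standing in for the unicode analyzer (with it, "_" counts as a word character, so Avenue_Q and Robert_Manzon stay single tokens), count_matches is a hypothetical helper, and only the first sentence of each text value is used.

```python
import re


def tokenize(text):
    # Stand-in analyzer (assumption): lowercase words; "_" is a word character.
    return set(re.findall(r"\w+", text.lower()))


QUERY = "French deaf Avenue_Q Robert driver"
FIELDS = ["id", "title", "text"]
# The two returned rows (first sentence of the text column only).
ROWS = [
    {
        "id": "<dbpedia:Robert_Manzon>",
        "title": "Robert Manzon",
        "text": "Robert Manzon (12 April 1917 – 19 January 2015) was a French racing driver.",
    },
    {
        "id": "<dbpedia:Robert_Benoist>",
        "title": "Robert Benoist",
        "text": "Robert Marcel Charles Benoist (20 March 1895 – 9 September 1944) was a "
                "French Grand Prix motor racing driver and war hero.",
    },
]


def count_matches(row, fields, query):
    query_terms = tokenize(query)
    # Query terms matched in each field on its own.
    per_field = {field: query_terms & tokenize(row[field]) for field in fields}
    # Distinct query terms matched across all fields together.
    combined = set().union(*per_field.values())
    best_single_field = max(len(terms) for terms in per_field.values())
    return best_single_field, combined


for row in ROWS:
    best, combined = count_matches(row, FIELDS, QUERY)
    print(row["title"], best, sorted(combined))
```

Both rows reach three matches (French, Robert, driver) within the text field alone, so they satisfy the threshold under either interpretation; deaf and Avenue_Q match nothing in these rows.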
