A Brief Look at the Many Ways to Match Strings in MySQL

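Preface

String matching is a frequent requirement in MySQL development. Whether the task is user search, data cleaning, or filtering in business logic, it pays to know the available matching methods and what they cost. This article works through practical scenarios and covers the main options in MySQL (basic like syntax, regular expressions, and full-text indexes) to help developers pick a suitable approach quickly.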

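I. Fuzzy matching: like and wildcards

Basic like syntax

like is the most commonly used string-matching operator in MySQL. It performs pattern matching with wildcards, using the following syntax: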

select column_name(s)
from table_name
where column_name like pattern;

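Wildcards in detail

% wildcard (matches any sequence of zero or more characters)

Find users with the surname 张 (Zhang), that is, usernames starting with "张"
select * from users where username like '张%';

Match strings that contain "test" at any position
select * from logs where message like '%test%';

_ wildcard (matches exactly one character)

Find usernames that are exactly 3 characters long and start with "a"
select * from users where username like 'a__';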
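The wildcard rules above can be checked without any table by comparing string literals directly. The following is a minimal sketch: the sample strings are made up for illustration, and the results assume the default backslash escape character (that is, the NO_BACKSLASH_ESCAPES SQL mode is off).

```sql
-- Each expression returns 1 (match) or 0 (no match).
select
    'abc'  like 'a__'  as three_chars,      -- 1: "a" plus exactly two characters
    'abcd' like 'a__'  as four_chars,       -- 0: like must match the whole string
    'a'    like 'a%'   as empty_tail,       -- 1: % also matches zero characters
    '50%'  like '50\%' as literal_percent,  -- 1: \% matches a literal percent sign
    '500'  like '50\%' as not_wildcard,     -- 0: the escaped % is no longer a wildcard
    '50%'  like '50|%' escape '|' as custom_escape;  -- 1: same idea with a custom escape character
```

The escape clause is useful when the data itself may contain backslashes and a different escape character is easier to read.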

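Notes

Performance: a pattern with a leading wildcard, such as like '%xxx%', cannot use a B-tree index on the column for a range scan; where the requirement allows, rewrite it as a prefix match such as like 'xxx%'.
Case sensitivity: whether like distinguishes case depends on the collation in effect. With the default case-insensitive (_ci) collations it ignores case; the binary keyword forces a case-sensitive, byte-by-byte comparison:

select * from users where username like binary 'ab%'; -- case-sensitive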
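To see the effect of the performance note, explain can be run on both pattern shapes. This is a sketch under assumptions: the table demo_users and the index idx_username are hypothetical names, and the plan actually chosen depends on the MySQL version, the table size, and its statistics. The last statement shows collate as another way to request a case-sensitive comparison; utf8mb4_bin is assumed to match the column's character set.

```sql
-- Hypothetical table used only for this illustration.
create table demo_users (
    id int auto_increment primary key,
    username varchar(50) not null,
    key idx_username (username)
) default charset = utf8mb4;

insert into demo_users (username) values ('abel'), ('Abby'), ('cab'), ('zed');

-- Prefix pattern: the optimizer can use idx_username for a range scan
-- (typically shown as type = range in the explain output).
explain select id from demo_users where username like 'ab%';

-- Leading wildcard: no range scan is possible, so a full scan of the table
-- or of the index is typical (type = ALL or type = index).
explain select id from demo_users where username like '%ab%';

-- Case-sensitive comparison through a binary collation instead of the binary keyword.
-- With the sample rows this returns only 'abel', because 'Abby' starts with an uppercase "A".
select username from demo_users where username collate utf8mb4_bin like 'ab%';
```

If case-sensitive matching is the norm for a column, declaring the column itself with a binary or _cs collation avoids repeating the collate clause in every query.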

II. Regular expression matching: regexp

1. Basic syntax

MySQL supports regular expression matching through the regexp operator, with the following syntax:

select column_name(s)
from table_name
where column_name regexp pattern;

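2. Common regular expression patterns

Anchors (position matching)

^: matches the start of the string

-- Match strings that start with a digit
select * from data where value regexp '^[0-9]';

$: matches the end of the string

-- Match domains ending in ".com" (the backslash is doubled because MySQL consumes one level of escaping in the string literal)
select * from urls where domain regexp '\\.com$';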
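A dot only matches literally when the backslash in front of it survives to the regular expression engine, and MySQL processes escape sequences in the string literal first. A quick check on literals illustrates the difference, assuming the default SQL mode (NO_BACKSLASH_ESCAPES off); the sample strings are invented for the example:

```sql
select
    'example.com' regexp '\\.com$' as escaped_dot_ok,     -- 1: literal ".com" at the end
    'examplecom'  regexp '\\.com$' as escaped_dot_miss,   -- 0: no dot before "com"
    'examplecom'  regexp '\.com$'  as single_backslash,   -- 1: pattern becomes ".com$", and "." matches any character
    '42 apples'   regexp '^[0-9]'  as starts_with_digit;  -- 1: the first character is a digit
```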

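Repetition

*: matches the preceding element 0 or more times

-- Match strings containing "ac", "abc", "abbc", and so on (a bare 'a*' would match every non-NULL string, because zero occurrences also count)
select * from texts where content regexp 'ab*c';

+: matches the preceding element 1 or more times

-- Match strings that contain at least one "a"
select * from texts where content regexp 'a+';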
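The difference between the two quantifiers is easiest to see on literals. The strings below are made up for illustration; the {2,} interval form is included as one way to require several consecutive occurrences.

```sql
select
    'xyz'  regexp 'a*'    as star_alone,   -- 1: zero occurrences of "a" still match
    'xyz'  regexp 'a+'    as plus_alone,   -- 0: at least one "a" is required
    'ac'   regexp 'ab*c'  as star_zero,    -- 1: "b" repeated zero times
    'abbc' regexp 'ab*c'  as star_many,    -- 1: "b" repeated twice
    'baad' regexp 'a{2,}' as two_or_more;  -- 1: two consecutive "a" characters
```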

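Grouping and alternation

|: alternation (logical OR)

-- Match exactly "male" or "female" (without the anchors, 'male|female' would match any string that merely contains "male")
select * from users where gender regexp '^(male|female)$';

(): grouping

-- Match mobile numbers (prefixes 13, 15, and 18)
select * from contacts where phone regexp '^1(3|5|8)[0-9]{9}$';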
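Because regexp succeeds as soon as the pattern matches anywhere in the string, anchors decide whether a pattern validates a whole value or merely finds a fragment. A short check on invented sample values:

```sql
select
    'female'      regexp 'male'               as unanchored,   -- 1: "male" occurs inside "female"
    'female'      regexp '^male$'             as anchored,     -- 0: the whole string must be "male"
    'unknown'     regexp '^(male|female)$'    as not_in_list,  -- 0: neither alternative matches
    '13812345678' regexp '^1(3|5|8)[0-9]{9}$' as valid_phone,  -- 1: 11 digits, starts with 13
    '12812345678' regexp '^1(3|5|8)[0-9]{9}$' as bad_prefix,   -- 0: the second digit is 2
    '1381234567'  regexp '^1(3|5|8)[0-9]{9}$' as too_short;    -- 0: only 10 digits
```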

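Performance advice

A regexp predicate cannot use a B-tree index, so on its own it is evaluated against every candidate row, which usually means a full table scan. Adding an index on the matched column does not change this; where possible, narrow the row set first with an indexable condition (an equality, a range, or a like prefix).
When a pattern is really just a fixed prefix, prefer like 'xxx%' over regexp '^xxx', since only the like form can use an index.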
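One common way to limit the cost of regexp is to pair it with an indexable condition so that the regular expression runs on fewer rows. The following is a minimal sketch, assuming a hypothetical table demo_contacts with an index on phone; whether the optimizer actually uses the index depends on the data and the MySQL version.

```sql
-- Hypothetical table used only for this illustration.
create table demo_contacts (
    id int auto_increment primary key,
    phone varchar(20) not null,
    key idx_phone (phone)
);

insert into demo_contacts (phone) values
    ('13812345678'), ('1381234567'), ('15900001111'), ('13x12345678');

-- The like prefix is indexable and narrows the candidates;
-- regexp then validates the full format on the remaining rows.
select phone
from demo_contacts
where phone like '13%'
  and phone regexp '^13[0-9]{9}$';
-- With the sample rows, only '13812345678' satisfies both conditions.
```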

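III. Full-text index matching: fulltext

When to use it

When searching for words in large volumes of text (article bodies, log messages), a fulltext index is usually the better choice. It matches whole words through an inverted index and is typically much faster than like '%...%' or regexp, both of which have to examine every row.

Creating a full-text index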

-- Add the full-text index when creating the table
create table articles (
    id int auto_increment primary key,
    title varchar(255),
    content text,
    fulltext(title, content)
);

-- Add a full-text index to an existing table
alter table articles add fulltext(title, content);

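Querying with match ... against

-- Simple match (articles containing "mysql"; natural language mode is the default)
select * from articles
where match(title, content) against('mysql');

-- Boolean mode (+ marks a required term, - marks an excluded term)
select * from articles
where match(title, content) against('+optimization -index' in boolean mode);

-- Natural language mode (returns a relevance score that can be used for sorting)
select *, match(title, content) against('performance tuning') as score
from articles
order by score desc;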
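The queries above can be tried end to end on a small data set. This is a sketch under assumptions: demo_articles and its rows are hypothetical, the storage engine is InnoDB with the default parser, stopword list, and minimum token size, and autocommit is on (InnoDB makes newly inserted rows visible to full-text search only after the transaction commits).

```sql
-- Hypothetical table used only for this illustration.
create table demo_articles (
    id int auto_increment primary key,
    title varchar(255),
    content text,
    fulltext ft_title_content (title, content)
) engine = InnoDB;

insert into demo_articles (title, content) values
    ('MySQL optimization basics', 'Tuning queries and buffer sizes'),
    ('Index design', 'Optimization through a covering index'),
    ('Backup guide', 'Using mysqldump for logical backups');

-- Natural language mode (the default): whole words are matched, so row 1 matches "mysql"
-- while "mysqldump" in row 3 counts as a different word.
select id, title from demo_articles
where match(title, content) against('mysql');

-- Boolean mode: "optimization" is required and "index" is excluded, which leaves row 1.
select id, title from demo_articles
where match(title, content) against('+optimization -index' in boolean mode);

-- Filter and rank in one query; rows with a score of 0 are dropped by the where clause.
select id, title, match(title, content) against('optimization') as score
from demo_articles
where match(title, content) against('optimization')
order by score desc;
```

The default parser splits text on whitespace and punctuation, which does not work for Chinese, Japanese, or Korean text. For such content MySQL provides an ngram parser, enabled by adding with parser ngram to the full-text index definition.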

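IV. Helper functions for matching

soundex function (phonetic matching)

Matches strings that sound alike. It suits fuzzy searches on names of people and places, and it is designed around English pronunciation:

-- Find users whose name sounds like "smith"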
select * from users 
where soundex(username) = soundex('smith');
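To make the comparison concrete, the codes that soundex produces can be inspected directly. The names below are sample values chosen for illustration; sounds like is MySQL's shorthand for comparing two soundex() results.

```sql
select
    soundex('smith') as code_smith,                          -- S530
    soundex('smyth') as code_smyth,                          -- S530: same code, so the names "sound alike"
    soundex('smith') = soundex('jones') as different_names,  -- 0: the codes differ
    'smith' sounds like 'smyth' as shorthand;                -- 1: same as comparing the two soundex() values
```

Since the function is applied to the column in the where clause, an ordinary index on username cannot serve this kind of lookup.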

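elt function (lookup by position)

Returns the N-th string from its argument list (positions start at 1, and the result is NULL when N is out of range). This is handy for mapping small numeric codes to a fixed list of labels:

-- Return the description for each status code
select id, elt(status, 'Not started', 'In progress', 'Completed') as status_desc from tasks;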
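The edge cases of elt are worth checking before relying on it for status codes. The sketch below uses literal arguments only; field is shown as the reverse lookup, and coalesce as one possible way to supply a fallback label (the 'Unknown' label is an assumption for the example).

```sql
select
    elt(2, 'Not started', 'In progress', 'Completed') as second_label,  -- 'In progress'
    elt(0, 'Not started', 'In progress', 'Completed') as below_range,   -- NULL
    elt(4, 'Not started', 'In progress', 'Completed') as above_range,   -- NULL
    field('Completed', 'Not started', 'In progress', 'Completed') as reverse_lookup,  -- 3
    coalesce(elt(9, 'Not started', 'In progress', 'Completed'), 'Unknown') as with_default;  -- 'Unknown'
```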

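V. Performance summary

Method	Use case	Index support	Performance
like with prefix match	Simple fuzzy queries (e.g. username prefix)	Supported	★★★★☆
regexp	Complex pattern matching (e.g. format validation)	Not supported	★★☆☆☆
fulltext	Full-text search (e.g. article content)	Requires a full-text index	★★★★★
Exact match	Primary key / unique key lookups	Fully supported	★★★★★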