前言
窗口函数(window function)是mysql 8.0引入的重要功能,允许在查询结果的特定"窗口"(数据子集)上执行计算,而不改变结果集的行数。与聚合函数不同,窗口函数不会将多行合并为一行。mysql的with窗口函数(也称为公共表表达式cte + 窗口函数)
1. 基本语法结构
with cte_name as (
select
column1,
column2,
window_function() over (partition by ... order by ...) as window_column
from table
)
select * from cte_name;
- cte_name:结果集的名称,类似表名,可以当做是一张表,不过这个结果集不存在索引
- over:定义窗口的范围
- window_function():窗口函数,row_number(),rank()等
- partition by:将数据分成多个独立的分区,类似group by子句,但是在窗口函数中,数据不会合并为一行
- order by:order by和普通查询语句中的order by没什么不同
2.常用窗口函数分类
2.1 排名函数
| 函数 | 说明 | 示例 |
|---|---|---|
row_number() | 连续编号(无重复) | 1,2,3,4,5 |
rank() | 排名(允许并列) | 1,2,2,4,5 |
dense_rank() | 密集排名 | 1,2,2,3,4 |
2.2 聚合函数
| 函数 | 说明 |
|---|---|
sum() over() | 窗口内求和 |
avg() over() | 窗口内平均值 |
count() over() | 窗口内计数 |
max() over() | 窗口内最大值 |
min() over() | 窗口内最小值 |
2.3 分布函数
| 函数 | 说明 |
|---|---|
ntile(n) | 分成n组 |
percent_rank() | 百分比排名 |
cume_dist() | 累积分布 |
3.实际应用示例
示例数据准备
-- 医疗影像检查记录表
create table medical_exams (
exam_id int primary key,
patient_id int comment '患者id',
exam_date date comment '检查日期',
exam_type varchar(50) comment '检查类型',
cost decimal(10,2) comment '费用',
hospital_id int comment '医院id'
);
insert into medical_exams values
(1, 101, '2024-01-10', 'ct', 500.00, 1),
(2, 101, '2024-01-15', 'mri', 800.00, 1),
(3, 102, '2024-01-12', 'x-ray', 200.00, 1),
(4, 103, '2024-01-18', 'ct', 500.00, 2),
(5, 101, '2024-01-20', 'ultrasound', 300.00, 1),
(6, 104, '2024-01-22', 'mri', 800.00, 2);
4.详细用法示例
4.1 基础排名查询
-- 每个患者的检查记录按时间排序,row_number()生成连续的行号
with patient_exams as (
select
patient_id,
exam_date,
exam_type,
cost,
row_number() over (partition by patient_id order by exam_date) as exam_sequence
from medical_exams
)
select * from patient_exams;
运行结果

4.2 累计统计
-- 计算每个患者的当前累计检查费用,平均费用
with patient_costs as (
select
patient_id,
exam_date,
exam_type,
cost,
sum(cost) over (partition by patient_id order by exam_date) as cumulative_cost,
avg(cost) over (partition by patient_id) as avg_cost_per_exam
from medical_exams
)
select * from patient_costs;
运行结果

说明:id为101的第一次500,第二次累计500+800=1300,第三次500+800+300=1600
-- 获取每个患者的最近一次检查
with patient_exams as (
select *,
row_number() over (partition by patient_id order by exam_date desc) as rn
from medical_exams
)
select * from patient_exams where rn = 1;
-- 等同于以前用的group by语句
select *
from medical_exams
where (patient_id, exam_date) in (
select patient_id, max(exam_date)
from medical_exams
group by patient_id
)
order by patient_id;
运行结果

4.3 获取前三的排名
-- 每个医院内检查费用排名
with hospital_ranking as (
select
exam_id,
patient_id,
hospital_id,
exam_type,
cost,
rank() over (partition by hospital_id order by cost desc) as cost_rank,-- 排名
row_number() over (partition by hospital_id order by cost desc) as row_num -- 连续编号
from medical_exams
)
select * from hospital_ranking
where cost_rank <= 3; -- 每个医院费用前三的检查
运行结果

使用窗口函数 rank() over (partition by hospital_id order by cost desc) 按医院分组,按检查费用降序排名
5.高级窗口函数用法
5.1 使用窗口框架
-- 计算移动平均(最近3次检查)
with moving_avg as (
select
patient_id,
exam_date,
cost,
avg(cost) over (
partition by patient_id -- 按患者分组
order by exam_date -- 按检查日期排序
rows between 2 preceding and current row -- 窗口范围:当前行+前2行
) as avg_last_3_exams
from medical_exams
)
select * from moving_avg;
- rows:按物理行数计算(不是按值)
- 2 preceding:当前行之前的2行
- current row:当前行
- 合计:当前行 + 前2行 = 3行
这里的窗口可以加额外的条件,比如只计算最近一年的数据
...from medical_exams
where exam_date >= date_sub(curdate(), interval 1 year)
计算过程
-- 第1行:只有当前行 窗口范围:第1行 计算:(500) / 1 = 500.00 -- 第2行:前1行 + 当前行 窗口范围:第1-2行 计算:(500 + 800) / 2 = 650.00 -- 第3行:前2行 + 当前行 窗口范围:第1-3行 计算:(500 + 800 + 300) / 3 = 533.33 -- 第4行:前2行 + 当前行(第2-4行) 窗口范围:第2-4行 计算:(800 + 300 + 600) / 3 = 566.67 -- 第5行:前2行 + 当前行(第3-5行) 窗口范围:第3-5行 计算:(300 + 600 + 400) / 3 = 433.33
运行结果
patient_id | exam_date | cost | avg_last_3_exams 101 | 2024-01-10 | 500.00 | 500.00 101 | 2024-01-15 | 800.00 | 650.00 101 | 2024-01-20 | 300.00 | 533.33 101 | 2024-01-25 | 600.00 | 566.67 101 | 2024-01-30 | 400.00 | 433.33
窗口框架的其他写法
-- 写法1:明确指定 rows between 2 preceding and current row -- 写法2:简写(mysql 8.0+) rows 2 preceding -- 写法3:向后扩展 rows between current row and 2 following -- 当前行+后2行 -- 写法4:前后扩展 rows between 1 preceding and 1 following -- 前1行+当前行+后1行 -- 写法5:无界窗口 rows between unbounded preceding and current row -- 从开始到当前行
rows vs range 的区别
| 特性 | rows(物理行) | range(逻辑值) |
|---|---|---|
| 计算方式 | 按行数计算 | 按值范围计算 |
| 适用场景 | 固定行数移动平均 | 按时间范围统计 |
| 示例 | 最近3行 | 最近30天 |
range示例:
-- 计算最近30天内的平均费用
avg(cost) over (
partition by patient_id
order by exam_date
range between interval 30 day preceding and current row
) as avg_last_30_days
5.2 前后值比较
-- 与上一次检查比较
with exam_comparison as (
select
patient_id,
exam_date,
exam_type,
cost,
lag(cost) over (partition by patient_id order by exam_date) as prev_exam_cost,
cost - lag(cost) over (partition by patient_id order by exam_date) as cost_change,
lead(exam_date) over (partition by patient_id order by exam_date) as next_exam_date
from medical_exams
)
select * from exam_comparison;
函数功能说明
| 函数 | 作用 | 示例 |
|---|---|---|
lag() | 获取前一行的值 | 上次检查的费用 |
lead() | 获取后一行的值 | 下次检查的日期 |
cost - lag(cost) | 计算变化量 | 费用增减金额 |
计算过程
-- 患者101的记录处理:
第1行:lag(cost) = null(没有前一行)
cost_change = 500 - null = null
lead(exam_date) = '2024-01-15'(下一行日期)
第2行:lag(cost) = 500.00(前一行费用)
cost_change = 800 - 500 = 300.00(增加300)
lead(exam_date) = '2024-01-20'
第3行:lag(cost) = 800.00
cost_change = 300 - 800 = -500.00(减少500)
lead(exam_date) = null(没有下一行)
-- 患者102的记录处理(重新开始):
第4行:lag(cost) = null(新患者,没有前一行)
cost_change = null
lead(exam_date) = null
运行结果

lag()和lead()的完整语法
lag(column, offset, default_value) over (...) lead(column, offset, default_value) over (...)
| 参数 | 说明 | 示例 |
|---|---|---|
column | 要获取的列 | cost, exam_date |
offset | 偏移量(默认1) | lag(cost, 2)获取前2行的值 |
default_value | 默认值(替代null) | lag(cost, 1, 0)无前一行时返回0 |
高级用法示例:
-- 获取前2次检查的费用 lag(cost, 2, 0) over (partition by patient_id order by exam_date) as cost_2_exams_ago, -- 获取后一次检查的类型 lead(exam_type, 1, '未知') over (partition by patient_id order by exam_date) as next_exam_type, -- 计算与上上次检查的变化 cost - lag(cost, 2, cost) over (...) as change_from_2_exams_ago
5.3 百分比计算
-- 计算每项检查费用在总费用中的占比
with cost_analysis as (
select
exam_id,
patient_id,
exam_type,
cost,
sum(cost) over (partition by patient_id) as total_patient_cost,
cost / sum(cost) over (partition by patient_id) * 100 as cost_percentage,
percent_rank() over (partition by patient_id order by cost) as cost_percent_rank
from medical_exams
)
select * from cost_analysis;
percent_rank()的计算公式:(当前行的排名 - 1) / (总行数 - 1)
6.多级窗口函数
6.1 复杂分析查询
-- 多层分析:患者+医院级别统计
with multi_level_analysis as (
select
exam_id,
patient_id,
hospital_id,
exam_type,
cost,
-- 患者级别统计
sum(cost) over (partition by patient_id) as patient_total,
rank() over (partition by patient_id order by exam_date) as patient_exam_seq,
-- 医院级别统计
avg(cost) over (partition by hospital_id) as hospital_avg_cost,
count(*) over (partition by hospital_id) as exams_per_hospital,
-- 全局统计
sum(cost) over () as grand_total,
rank() over (order by cost desc) as global_cost_rank
from medical_exams
)
select
exam_id,
patient_id,
hospital_id,
exam_type,
cost,
round(cost / patient_total * 100, 2) as patient_cost_percentage,
round(cost / hospital_avg_cost, 2) as cost_vs_hospital_avg
from multi_level_analysis;
也可以单独拆开多个cte
with cte1 as (select ...), cte2 as (select ...), cte3 as (select ...) select ... from cte1 join cte2 ...;
多cte链式查询
with
department_stats as (
select department_id, count(*) as emp_count, avg(salary) as avg_salary
from employees group by department_id
),
salary_analysis as (
select
department_id,
emp_count,
avg_salary,
rank() over (order by avg_salary desc) as salary_rank
from department_stats
)
select * from salary_analysis where salary_rank <= 3;
7.实际应用场景
7.1 患者检查频率分析
-- 分析患者检查频率模式
with exam_patterns as (
select
patient_id,
exam_date,
exam_type,
-- 计算与上一次检查的时间间隔
datediff(exam_date, lag(exam_date) over (
partition by patient_id order by exam_date
)) as days_since_last_exam,
-- 检查频率排名
ntile(4) over (partition by patient_id order by exam_date) as frequency_quartile
from medical_exams
)
select
patient_id,
avg(days_since_last_exam) as avg_days_between_exams,
count(*) as total_exams
from exam_patterns
group by patient_id
having count(*) > 1;
7.2 医院业务量分析
-- 医院月度业务分析
with monthly_stats as (
select
hospital_id,
date_format(exam_date, '%y-%m') as exam_month,
count(*) as exam_count,
sum(cost) as monthly_revenue,
-- 月度排名
rank() over (partition by hospital_id order by sum(cost) desc) as revenue_rank,
-- 月度增长
lag(sum(cost)) over (partition by hospital_id order by date_format(exam_date, '%y-%m')) as prev_month_revenue
from medical_exams
group by hospital_id, date_format(exam_date, '%y-%m')
)
select
hospital_id,
exam_month,
exam_count,
monthly_revenue,
round(monthly_revenue / nullif(prev_month_revenue, 0) * 100, 2) as growth_rate
from monthly_stats;
8.性能优化技巧
8.1 使用适当的索引
-- 为窗口函数创建索引 create index idx_patient_date on medical_exams(patient_id, exam_date); create index idx_hospital_cost on medical_exams(hospital_id, cost desc);
8.2 分区数据限制
-- 限制分区数据量
with recent_exams as (
select * from medical_exams
where exam_date >= date_sub(curdate(), interval 30 day)
),
ranked_data as (
select
patient_id,
exam_date,
exam_type,
row_number() over (partition by patient_id order by exam_date desc) as rn
from recent_exams
)
select * from ranked_data where rn = 1; -- 最近一次检查
9.常见错误与解决方案
9.1 避免的陷阱
-- ❌ 错误:在where中使用窗口函数结果
select exam_id, row_number() over () as rn
from medical_exams
where rn = 1; -- 错误!rn在where时不可用
-- ✅ 正确:使用子查询或cte
with numbered_exams as (
select exam_id, row_number() over () as rn
from medical_exams
)
select exam_id from numbered_exams where rn = 1;
10.mysql 8.0+ 新特性
10.1 命名窗口
-- 定义可重用的窗口
select
patient_id,
exam_date,
cost,
sum(cost) over w as running_total,
avg(cost) over w as moving_avg
from medical_exams
window w as (partition by patient_id order by exam_date rows unbounded preceding);
10.2 json数据处理
1.1 json_table函数
select *
from json_table(
'[{"name": "john", "age": 30}, {"name": "jane", "age": 25}]',
'$[*]' columns (
name varchar(50) path '$.name',
age int path '$.age'
)
) as jt;
功能:将json数组转换为关系型表格
- 输入:json数组字符串
- 路径:
$[*]表示遍历数组所有元素 - 列映射:
name varchar(50) path '$.name':提取name字段age int path '$.age':提取age字段
输出结果:
name | age john | 30 jane | 25
1.2 json_extract和json_contains_path
select
exam_id,
json_extract(patient_info, '$.name') as patient_name,
json_extract(patient_info, '$.age') as patient_age,
json_unquote(json_extract(patient_info, '$.insurance')) as insurance_type
from medical_records
where json_contains_path(patient_info, 'one', '$.chronic_diseases');
函数说明:
json_extract(json_doc, path):提取json字段值(返回json格式)json_unquote():去除json字符串的引号json_contains_path(json_doc, 'one', path):检查是否存在指定路径
2.1 创建医疗记录表
create table medical_records (
record_id int primary key auto_increment,
exam_id varchar(20) not null,
patient_info json not null,
exam_data json not null,
created_at timestamp default current_timestamp
);
2.2 插入测试数据
insert into medical_records (exam_id, patient_info, exam_data) values
('exam001', '{
"name": "张三",
"age": 45,
"gender": "男",
"insurance": "城镇职工医保",
"chronic_diseases": ["高血压", "糖尿病"],
"contact": {
"phone": "13800138000",
"emergency_contact": "李四"
},
"medical_history": {
"allergies": ["青霉素"],
"surgeries": ["阑尾切除术-2015"]
}
}', '{
"exam_type": "ct",
"body_part": "胸部",
"results": {
"diagnosis": "肺部结节",
"size": "5mm",
"location": "右上肺",
"urgency": "常规随访"
},
"radiologist": "王医生",
"cost": 680.00
}'),
('exam002', '{
"name": "李四",
"age": 32,
"gender": "女",
"insurance": "新农合",
"chronic_diseases": [],
"contact": {
"phone": "13900139000",
"emergency_contact": "王五"
},
"medical_history": {
"allergies": [],
"surgeries": []
}
}', '{
"exam_type": "mri",
"body_part": "头部",
"results": {
"diagnosis": "正常",
"findings": "未见明显异常",
"urgency": "常规"
},
"radiologist": "赵医生",
"cost": 1200.00
}'),
('exam003', '{
"name": "王五",
"age": 68,
"gender": "男",
"insurance": "离退休干部医保",
"chronic_diseases": ["冠心病", "高血压", "糖尿病"],
"contact": {
"phone": "13700137000",
"emergency_contact": "赵六"
},
"medical_history": {
"allergies": ["阿司匹林"],
"surgeries": ["冠状动脉搭桥术-2020", "胆囊切除术-2018"]
}
}', '{
"exam_type": "超声",
"body_part": "腹部",
"results": {
"diagnosis": "胆囊息肉",
"size": "8mm",
"recommendation": "定期复查"
},
"radiologist": "孙医生",
"cost": 350.00
}');
3.1 基础信息提取
-- 提取患者基本信息
select
exam_id,
json_unquote(json_extract(patient_info, '$.name')) as patient_name,
json_extract(patient_info, '$.age') as patient_age,
json_unquote(json_extract(patient_info, '$.gender')) as gender,
json_unquote(json_extract(patient_info, '$.insurance')) as insurance_type,
json_unquote(json_extract(exam_data, '$.exam_type')) as exam_type,
json_extract(exam_data, '$.cost') as cost
from medical_records;
结果:
exam_id | patient_name | patient_age | gender | insurance_type | exam_type | cost exam001 | 张三 | 45 | 男 | 城镇职工医保 | ct | 680.00 exam002 | 李四 | 32 | 女 | 新农合 | mri | 1200.00 exam003 | 王五 | 68 | 男 | 离退休干部医保 | 超声 | 350.00
3.2 慢性病患者筛选
-- 查找有慢性病的患者
select
exam_id,
json_unquote(json_extract(patient_info, '$.name')) as patient_name,
json_extract(patient_info, '$.age') as age,
json_extract(patient_info, '$.chronic_diseases') as chronic_diseases
from medical_records
where json_contains_path(patient_info, 'one', '$.chronic_diseases')
and json_length(json_extract(patient_info, '$.chronic_diseases')) > 0;
结果:
exam_id | patient_name | age | chronic_diseases exam001 | 张三 | 45 | ["高血压", "糖尿病"] exam003 | 王五 | 68 | ["冠心病", "高血压", "糖尿病"]
3.3 复杂条件查询
-- 查找有特定过敏史的高龄患者
select
exam_id,
json_unquote(json_extract(patient_info, '$.name')) as name,
json_extract(patient_info, '$.age') as age,
json_extract(patient_info, '$.medical_history.allergies') as allergies
from medical_records
where json_extract(patient_info, '$.age') >= 60
and json_contains(json_extract(patient_info, '$.medical_history.allergies'), '"阿司匹林"');
3.4 使用json_table展开数组数据
-- 展开慢性病数组为多行
select
mr.exam_id,
json_unquote(json_extract(mr.patient_info, '$.name')) as patient_name,
diseases.disease_name
from medical_records mr,
json_table(
json_extract(mr.patient_info, '$.chronic_diseases'),
'$[*]' columns (
disease_name varchar(50) path '$'
)
) as diseases
where json_length(json_extract(mr.patient_info, '$.chronic_diseases')) > 0;
结果:
exam_id | patient_name | disease_name exam001 | 张三 | 高血压 exam001 | 张三 | 糖尿病 exam003 | 王五 | 冠心病 exam003 | 王五 | 高血压 exam003 | 王五 | 糖尿病
4.1 医疗费用分析
-- 按保险类型统计费用
select
json_unquote(json_extract(patient_info, '$.insurance')) as insurance_type,
count(*) as exam_count,
round(avg(json_extract(exam_data, '$.cost')), 2) as avg_cost,
round(sum(json_extract(exam_data, '$.cost')), 2) as total_cost
from medical_records
group by json_unquote(json_extract(patient_info, '$.insurance'))
order by total_cost desc;
4.2 检查结果严重程度分析
-- 分析检查结果的紧急程度
select
exam_id,
json_unquote(json_extract(patient_info, '$.name')) as patient_name,
json_unquote(json_extract(exam_data, '$.exam_type')) as exam_type,
json_unquote(json_extract(exam_data, '$.results.diagnosis')) as diagnosis,
json_unquote(json_extract(exam_data, '$.results.urgency')) as urgency_level,
case
when json_unquote(json_extract(exam_data, '$.results.urgency')) = '紧急' then '高危'
when json_unquote(json_extract(exam_data, '$.results.urgency')) = '常规随访' then '中危'
else '低危'
end as risk_level
from medical_records
order by
case
when urgency_level = '紧急' then 1
when urgency_level = '常规随访' then 2
else 3
end;
5.1 患者完整档案查询
select
exam_id,
-- 基本信息
json_unquote(json_extract(patient_info, '$.name')) as name,
json_extract(patient_info, '$.age') as age,
json_unquote(json_extract(patient_info, '$.gender')) as gender,
-- 联系信息
json_unquote(json_extract(patient_info, '$.contact.phone')) as phone,
json_unquote(json_extract(patient_info, '$.contact.emergency_contact')) as emergency_contact,
-- 医疗信息
json_extract(patient_info, '$.chronic_diseases') as chronic_diseases,
json_extract(patient_info, '$.medical_history.allergies') as allergies,
json_extract(patient_info, '$.medical_history.surgeries') as surgeries,
-- 检查信息
json_unquote(json_extract(exam_data, '$.exam_type')) as exam_type,
json_unquote(json_extract(exam_data, '$.body_part')) as body_part,
json_unquote(json_extract(exam_data, '$.results.diagnosis')) as diagnosis,
json_extract(exam_data, '$.cost') as cost,
-- 计算字段
case
when json_extract(patient_info, '$.age') >= 65 then '老年患者'
when json_extract(patient_info, '$.age') >= 45 then '中年患者'
else '青年患者'
end as age_group,
case
when json_length(json_extract(patient_info, '$.chronic_diseases')) >= 2 then '多病共存'
when json_length(json_extract(patient_info, '$.chronic_diseases')) = 1 then '单一慢性病'
else '无慢性病'
end as chronic_status
from medical_records;
6.1 创建函数索引
-- 为常用查询字段创建索引
alter table medical_records
add index idx_patient_name ((json_unquote(json_extract(patient_info, '$.name'))));
alter table medical_records
add index idx_patient_age ((json_extract(patient_info, '$.age')));
alter table medical_records
add index idx_exam_type ((json_unquote(json_extract(exam_data, '$.exam_type'))));
6.2 物化视图模式
-- 创建简化视图提高查询性能
create view patient_summary as
select
record_id,
exam_id,
json_unquote(json_extract(patient_info, '$.name')) as patient_name,
json_extract(patient_info, '$.age') as age,
json_unquote(json_extract(patient_info, '$.insurance')) as insurance,
json_unquote(json_extract(exam_data, '$.exam_type')) as exam_type,
json_extract(exam_data, '$.cost') as cost,
created_at
from medical_records;
总结
到此这篇关于mysql中with窗口函数说明及使用案例的文章就介绍到这了,更多相关mysql with窗口函数使用内容请搜索代码网以前的文章或继续浏览下面的相关文章希望大家以后多多支持代码网!
发表评论