当前位置: 代码网 > it编程>数据库>Mysql > MySQL窗口函数 OVER()全解析

MySQL窗口函数 OVER()全解析

2025年12月31日 Mysql 我要评论
一、窗口函数概述1. 什么是窗口函数?窗口函数(window function)是对一组行(称为"窗口")执行计算,并为每一行返回一个值的函数。与聚合函数不同,窗口函数不减少行数。

一、窗口函数概述

1. 什么是窗口函数?

窗口函数window function)是对一组行(称为"窗口")执行计算,并为每一行返回一个值的函数。与聚合函数不同,窗口函数不减少行数。

2. 窗口函数 vs 聚合函数

特性窗口函数聚合函数
返回行数与输入行数相同通常减少行数(group by)
分组效果保留所有行,添加计算结果每组返回一行
语法位置select 子句中select 或 having 子句中
典型函数row_number(), rank(), sum() over()sum(), count(), avg()

3. 基本语法结构

窗口函数([参数]) over (
  [partition by <分组列>] 
  [order by <排序列 asc/desc>]
  [rows between 开始行 and 结束行]
)
  • over() 里面不能直接放 group by!可以放partition by
  • partition by 子句用于指定分组列,关键字:partition by
  • order by 子句用于指定排序列,关键字order by
  • rows between 子句用于指定窗口的范围,关键字rows between 即[开始行]、[结束行]

其中,rows between 子句在实际中可能用得相对少一些,因此有部分参考资料的语法描述省略了rows between 子句,主要侧重于partition by分组与order by排序:

二、窗口函数核心组成部分

1. partition by - 分区子句

将数据划分为多个分区,在每个分区内独立计算。

-- 创建测试数据
create table sales (
    id int primary key auto_increment,
    salesperson varchar(50),
    region varchar(50),
    sale_date date,
    amount decimal(10, 2)
);
insert into sales (salesperson, region, sale_date, amount) values
('张三', '北京', '2024-01-01', 1000.00),
('张三', '北京', '2024-01-02', 1500.00),
('李四', '上海', '2024-01-01', 2000.00),
('李四', '上海', '2024-01-02', 2500.00),
('王五', '北京', '2024-01-01', 1200.00),
('王五', '北京', '2024-01-03', 1800.00),
('赵六', '广州', '2024-01-02', 2200.00);
-- 按销售员分区计算
select 
    salesperson,
    sale_date,
    amount,
    -- 每个销售员的销售总额
    sum(amount) over (partition by salesperson) as total_by_person,
    -- 每个地区的销售总额
    sum(amount) over (partition by region) as total_by_region,
    -- 不分区(全局总额)
    sum(amount) over () as grand_total
from sales
order by salesperson, sale_date;

输出结果:

salesperson | sale_date  | amount | total_by_person | total_by_region | grand_total
------------|------------|--------|-----------------|-----------------|------------
张三        | 2024-01-01 | 1000.00| 2500.00         | 5500.00         | 12200.00
张三        | 2024-01-02 | 1500.00| 2500.00         | 5500.00         | 12200.00
李四        | 2024-01-01 | 2000.00| 4500.00         | 4500.00         | 12200.00
李四        | 2024-01-02 | 2500.00| 4500.00         | 4500.00         | 12200.00
王五        | 2024-01-01 | 1200.00| 3000.00         | 5500.00         | 12200.00
王五        | 2024-01-03 | 1800.00| 3000.00         | 5500.00         | 12200.00
赵六        | 2024-01-02 | 2200.00| 2200.00         | 2200.00         | 12200.00

2. order by - 排序子句

在分区内对行进行排序,影响排名函数和累计计算

select 
    salesperson,
    sale_date,
    amount,
    -- 按金额排序(分区内)
    row_number() over (partition by salesperson order by amount desc) as rn,
    -- 累计金额(分区内按日期排序)
    sum(amount) over (partition by salesperson order by sale_date) as running_total,
    -- 移动平均值(最近3行的平均值)
    avg(amount) over (partition by salesperson order by sale_date 
         rows between 2 preceding and current row) as moving_avg_3
from sales
order by salesperson, sale_date;

3.聚合窗口函数

许多窗口函数的教程,通常将常用的窗口函数分为两大类:聚合窗口函数专用窗口函数。聚合窗口函数的函数名与普通常用聚合函数一致,功能也一致。从使用的角度来讲,与普通聚合函数的区别在于提供了窗口函数的专属子句,来使得数据的分析与获取更简便。主要有如下几个:

函数名作用
sum对指定列的数值求和
avg计算指定列的平均值
count统计记录/非空值数量
max找出指定列的最大值
min找出指定列的最小值

4.专用窗口函数

常见的专用窗口函数

函数名分类说明
rank排序函数类似于排名,并列的结果序号可以重复,序号不连续(如:1,2,2,4)
dense_rank排序函数类似于排名,并列的结果序号可以重复,序号连续(如:1,2,2,3)
row_number排序函数对分组下的所有结果排序,基于分组分配唯一连续的行号(如:1,2,3,4)
percent_rank分布函数每行按公式 (rank-1) / (rows-1) 计算,结果为0~1的百分比值
cume_dist分布函数分组内小于等于当前rank值的行数 ÷ 分组内总行数,结果为0~1的百分比值

5. rows between - 窗口帧子句

定义窗口函数的计算范围。

-- 各种窗口帧的示例
select 
    salesperson,
    sale_date,
    amount,
    -- 默认:分区内所有行
    sum(amount) over (partition by salesperson) as total_all,
    -- rows模式:物理行
    sum(amount) over (
        partition by salesperson 
        order by sale_date
        rows between unbounded preceding and current row
    ) as cumulative_rows,
    -- range模式:逻辑值范围(相同值的行视为同一帧)
    sum(amount) over (
        partition by salesperson 
        order by sale_date
        range between unbounded preceding and current row
    ) as cumulative_range,
    -- 滑动窗口:当前行及前2行
    sum(amount) over (
        partition by salesperson 
        order by sale_date
        rows between 2 preceding and current row
    ) as sum_last_3,
    -- 前后各一行
    sum(amount) over (
        partition by salesperson 
        order by sale_date
        rows between 1 preceding and 1 following
    ) as sum_neighbors
from sales
order by salesperson, sale_date;

三、窗口函数分类详解

1. 序号函数(ranking functions)

-- 创建测试数据
create table employees (
    id int primary key auto_increment,
    name varchar(50),
    department varchar(50),
    salary decimal(10, 2)
);
insert into employees (name, department, salary) values
('张三', '技术部', 8000.00),
('李四', '技术部', 9000.00),
('王五', '技术部', 9500.00),
('赵六', '技术部', 9000.00),
('钱七', '销售部', 7000.00),
('孙八', '销售部', 8500.00),
('周九', '销售部', 8500.00),
('吴十', '销售部', 7500.00);
-- 1. row_number():连续不重复的序号
select 
    name,
    department,
    salary,
    row_number() over (partition by department order by salary desc) as row_num
from employees;
-- 2. rank():有间隔的排名(相同值排名相同,下一个排名跳跃)
select 
    name,
    department,
    salary,
    rank() over (partition by department order by salary desc) as rank_num
from employees;
-- 3. dense_rank():无间隔的排名(相同值排名相同,下一个排名连续)
select 
    name,
    department,
    salary,
    dense_rank() over (partition by department order by salary desc) as dense_rank_num
from employees;
-- 4. ntile(n):将数据分为n组
select 
    name,
    department,
    salary,
    ntile(4) over (partition by department order by salary desc) as quartile
from employees;

输出对比:

部门   | 姓名 | 薪资   | row_number | rank | dense_rank | ntile(4)
------|------|--------|------------|------|------------|---------
技术部 | 王五 | 9500   | 1          | 1    | 1          | 1
技术部 | 李四 | 9000   | 2          | 2    | 2          | 1
技术部 | 赵六 | 9000   | 3          | 2    | 2          | 2
技术部 | 张三 | 8000   | 4          | 4    | 3          | 2
销售部 | 孙八 | 8500   | 1          | 1    | 1          | 1
销售部 | 周九 | 8500   | 2          | 1    | 1          | 1
销售部 | 吴十 | 7500   | 3          | 3    | 2          | 2
销售部 | 钱七 | 7000   | 4          | 4    | 3          | 2

2. 分布函数(distribution functions)

-- 5. percent_rank():百分比排名 (rank - 1) / (total_rows - 1)
select 
    name,
    department,
    salary,
    rank() over (partition by department order by salary) as rank_num,
    percent_rank() over (partition by department order by salary) as percent_rank
from employees;
-- 6. cume_dist():累计分布(小于等于当前值的行数 / 总行数)
select 
    name,
    department,
    salary,
    cume_dist() over (partition by department order by salary) as cume_dist
from employees;
-- 7. percentile_cont():连续百分位数(需要mysql 8.0.2+)
select 
    department,
    percentile_cont(0.5) within group (order by salary) 
        over (partition by department) as median_salary
from employees
group by department, salary;
-- 8. percentile_disc():离散百分位数
select 
    department,
    percentile_disc(0.5) within group (order by salary) 
        over (partition by department) as median_salary
from employees
group by department, salary;

3. 前后值函数(value functions)

-- 9. lag(column, n, default):获取前n行的值
select 
    name,
    department,
    salary,
    lag(salary, 1, 0) over (partition by department order by salary) as prev_salary,
    salary - lag(salary, 1, 0) over (partition by department order by salary) as salary_diff
from employees;
-- 10. lead(column, n, default):获取后n行的值
select 
    name,
    department,
    sale_date,
    amount,
    lead(amount, 1, 0) over (partition by salesperson order by sale_date) as next_amount,
    lead(sale_date, 1, null) over (partition by salesperson order by sale_date) as next_date
from sales;
-- 11. first_value(column):窗口内第一个值
select 
    name,
    department,
    salary,
    first_value(salary) over (
        partition by department 
        order by salary 
        rows between unbounded preceding and unbounded following
    ) as lowest_salary,
    salary - first_value(salary) over (
        partition by department 
        order by salary 
        rows between unbounded preceding and unbounded following
    ) as diff_from_lowest
from employees;
-- 12. last_value(column):窗口内最后一个值(注意默认窗口帧!)
select 
    name,
    department,
    salary,
    -- 错误用法:默认窗口帧是 range between unbounded preceding and current row
    last_value(salary) over (partition by department order by salary) as wrong_last_value,
    -- 正确用法:指定完整的窗口帧
    last_value(salary) over (
        partition by department 
        order by salary 
        rows between unbounded preceding and unbounded following
    ) as correct_last_value,
    -- 或者使用nth_value
    nth_value(salary, 1) over (
        partition by department 
        order by salary 
        rows between unbounded preceding and unbounded following
    ) as first_salary,
    nth_value(salary, 2) over (
        partition by department 
        order by salary 
        rows between unbounded preceding and unbounded following
    ) as second_salary
from employees;

4. 聚合函数作为窗口函数

-- 所有聚合函数都可以作为窗口函数使用
select 
    salesperson,
    region,
    sale_date,
    amount,
    -- 聚合函数
    count(*) over (partition by salesperson) as total_transactions,
    sum(amount) over (partition by salesperson) as total_amount,
    avg(amount) over (partition by salesperson) as avg_amount,
    max(amount) over (partition by salesperson) as max_amount,
    min(amount) over (partition by salesperson) as min_amount,
    -- 标准差和方差(mysql 8.0+)
    stddev(amount) over (partition by salesperson) as std_amount,
    variance(amount) over (partition by salesperson) as var_amount,
    -- 累计聚合
    sum(amount) over (
        partition by salesperson 
        order by sale_date
        rows between unbounded preceding and current row
    ) as running_total,
    -- 移动平均
    avg(amount) over (
        partition by salesperson 
        order by sale_date
        rows between 2 preceding and current row
    ) as moving_avg_3,
    -- 百分比
    amount * 100.0 / sum(amount) over (partition by salesperson) as percentage
from sales
order by salesperson, sale_date;

到此这篇关于mysql窗口函数 over()讲解的文章就介绍到这了,更多相关mysql 窗口函数over()内容请搜索代码网以前的文章或继续浏览下面的相关文章希望大家以后多多支持代码网!

(0)

相关文章:

版权声明:本文内容由互联网用户贡献,该文观点仅代表作者本人。本站仅提供信息存储服务,不拥有所有权,不承担相关法律责任。 如发现本站有涉嫌抄袭侵权/违法违规的内容, 请发送邮件至 2386932994@qq.com 举报,一经查实将立刻删除。

发表评论

验证码:
Copyright © 2017-2025  代码网 保留所有权利. 粤ICP备2024248653号
站长QQ:2386932994 | 联系邮箱:2386932994@qq.com