MySQL对前N条数据求和的几种方案_Mysql

在数据分析场景中，我们经常需要计算分组数据中排名前n的记录的合计值。本文将详细介绍在mysql中实现这一需求的几种方法，并对比它们的性能差异。

一、基础需求场景

假设我们有一个销售数据表sales_data，结构如下：

create table sales_data (
    id int auto_increment primary key,
    product_name varchar(100),
    category varchar(50),
    sales_amount decimal(12,2),
    sale_date date
);

需求：计算每个产品类别中销售额前5名的合计销售额

二、传统解决方案（union all）

最常见的实现方式是使用union all组合两个查询：

-- 查询前5名明细
select 
    category,
    product_name,
    sales_amount
from sales_data
where (category, sales_amount) in (
    select category, sales_amount
    from sales_data
    where sale_date between '2023-01-01' and '2023-12-31'
    order by category, sales_amount desc
    limit 5
)

union all

-- 查询前5名合计
select 
    category,
    'top5_total' as product_name,
    sum(sales_amount) as sales_amount
from sales_data
where (category, sales_amount) in (
    select category, sales_amount
    from sales_data
    where sale_date between '2023-01-01' and '2023-12-31'
    order by category, sales_amount desc
    limit 5
)
group by category
order by category, sales_amount desc;

问题分析：

重复扫描表数据两次
子查询执行效率低
当数据量大时性能急剧下降

三、优化方案1：窗口函数+条件聚合（mysql 8.0+）

mysql 8.0及以上版本支持窗口函数，可以更高效地实现：

with ranked_sales as (
    select 
        category,
        product_name,
        sales_amount,
        row_number() over (partition by category order by sales_amount desc) as rn
    from sales_data
    where sale_date between '2023-01-01' and '2023-12-31'
)

select 
    category,
    product_name,
    sales_amount,
    case when product_name = 'top5_total' then null else rn end as rank_position
from (
    -- 前5名明细
    select 
        category,
        product_name,
        sales_amount,
        rn
    from ranked_sales
    where rn <= 5
    
    union all
    
    -- 前5名合计
    select 
        category,
        'top5_total' as product_name,
        sum(sales_amount) as sales_amount,
        null as rn
    from ranked_sales
    where rn <= 5
    group by category
) combined
order by category, ifnull(rn, 9999), sales_amount desc;

优势：

只需扫描表一次
利用窗口函数高效排序
结果集排序更灵活

四、优化方案2：用户变量模拟（mysql 5.7及以下）

对于不支持窗口函数的旧版本，可以使用用户变量模拟：

select 
    final_data.*
from (
    -- 前5名明细
    select 
        category,
        product_name,
        sales_amount,
        @rn := if(@current_category = category, @rn + 1, 1) as rn,
        @current_category := category as dummy
    from 
        sales_data,
        (select @rn := 0, @current_category := '') as vars
    where 
        sale_date between '2023-01-01' and '2023-12-31'
    order by 
        category, sales_amount desc
    
    union all
    
    -- 前5名合计
    select 
        t.category,
        'top5_total' as product_name,
        sum(t.sales_amount) as sales_amount,
        null as rn,
        null as dummy
    from (
        select 
            category,
            product_name,
            sales_amount,
            @rn2 := if(@current_category2 = category, @rn2 + 1, 1) as rn2,
            @current_category2 := category as dummy2
        from 
            sales_data,
            (select @rn2 := 0, @current_category2 := '') as vars2
        where 
            sale_date between '2023-01-01' and '2023-12-31'
        order by 
            category, sales_amount desc
    ) t
    where t.rn2 <= 5
    group by t.category
) final_data
where 
    (product_name != 'top5_total' and rn <= 5)
    or 
    (product_name = 'top5_total')
order by 
    category, ifnull(rn, 9999), sales_amount desc;

注意：

用户变量在复杂查询中可能不稳定
需要确保变量初始化正确
建议在测试环境验证结果

五、最佳实践方案（推荐）

结合性能与可维护性，推荐以下实现方式：

-- 创建临时表存储排名数据
create temporary table temp_ranked_sales as
select 
    category,
    product_name,
    sales_amount,
    row_number() over (partition by category order by sales_amount desc) as rn
from sales_data
where sale_date between '2023-01-01' and '2023-12-31';

-- 创建索引加速查询
create index idx_temp_rank on temp_ranked_sales(category, rn);

-- 最终查询
(
    -- 前5名明细
    select 
        category,
        product_name,
        sales_amount,
        rn as rank_position
    from temp_ranked_sales
    where rn <= 5
)
union all
(
    -- 前5名合计
    select 
        category,
        'top5_total' as product_name,
        sum(sales_amount) as sales_amount,
        null as rank_position
    from temp_ranked_sales
    where rn <= 5
    group by category
)
order by category, ifnull(rank_position, 9999), sales_amount desc;

-- 清理临时表
drop temporary table temp_ranked_sales;

性能优化点：