A Complete Example of Batch Table Partitioning in MySQL

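When a single table holds a very large amount of data and that data can be split along some key, table partitioning is a good option. So how does MySQL implement table partitioning?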

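I. Prerequisites for table partitioning

1. Storage engine support: InnoDB and MyISAM (MySQL 8.0 removed partitioning support for MyISAM; there, only InnoDB and NDB can be partitioned)

2. Version support: MySQL 5.1 and later (the exact features available vary by version)

3. The data must have one or more partition keys: every column used as a partition key must be part of each unique key on the table, including the primary key (use a composite primary key if necessary)

4. Partition definition: for range and list partitioning, each partition must explicitly define the range or set of values it holds

5. Partition maintenance: over time, new partitions may need to be added and old ones dropped to keep the database's performance and structure in good shape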
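As a minimal sketch of condition 3, the table below puts the partitioning column into the primary key. The table name `orders_demo` and its columns are hypothetical and not part of the original scripts. With `primary key (id)` alone, MySQL rejects the statement, because every unique key has to include all columns used in the partitioning expression.

```sql
-- hypothetical table, for illustration only
create table orders_demo (
    id int not null,
    order_date date not null,
    amount decimal(10,2),
    -- order_date is part of the primary key so that it can act as the partition key
    primary key (id, order_date)
)
partition by range (year(order_date)) (
    partition p2023 values less than (2024),
    partition p_max values less than maxvalue
);
```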

II. Differences between regular tables and partitioned tables

Comparison of regular and partitioned tables

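	Regular table	Partitioned table
Data layout	All data is stored in a single data file	Data is logically split into multiple parts, which may live in multiple files or even on multiple disks
Query optimization	Without a usable index, a query scans the whole table	When the query filters on the partition key, only the relevant partitions are accessed (partition pruning)
I/O operations	Inserts, deletes, and updates act on the whole table	Operations touch only the affected partition and leave the other partitions alone
Backup and restore	Usually the whole table is backed up	Specific partitions can be backed up or restored individually
Storage management	All data is stored together	Data is spread across multiple partitions
Scalability	Performance bottlenecks may appear as the data grows	Easier to scale horizontally: new partitions absorb larger data sets without changes to application logic
Limitations and complexity	Relatively simple, with no special creation or maintenance requirements	More complex to design and implement; the partitioning strategy must be chosen carefully to meet business needs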

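Judging from the comparison above, partitioned tables look very attractive. They do, however, come with some limitations:

Limitations of partitioned tables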

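	Regular table	Partitioned table
Foreign key constraints	√	×
Full-text indexes	√	× (partitioned tables do not support fulltext indexes or searches)
Temporary tables	√	×
Column changes	√	Restricted for columns used in the partition key
Certain alter table statements	√	× (for example, changing the primary key or a unique key so that it no longer includes the partition key)
Performance impact	Driven by data volume	Adding, dropping, or merging partitions may lock the table and hurt performance
Backup and restore tool support	Supported by most tools	Not every backup and restore tool fully supports all partitioned-table features
Source/replica replication	No special requirements	Partitioning rules must be kept consistent; any mismatch can cause replication failures or inconsistent data
Partitioning type restrictions	None	The storage engine may restrict which partitioning types are available
Query optimizer behavior	Plain index lookups	Partitioning plus indexes; for complex queries in special cases, partition pruning may fail to apply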
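Some of these limitations can be checked before converting anything. The queries below are a sketch that looks in the current schema for tables taking part in foreign keys and for tables carrying fulltext indexes, two features the table above marks as unsupported. They only read `information_schema` and assume the candidate tables live in the schema returned by `database()`.

```sql
-- tables in the current schema that reference another table through a foreign key
select distinct table_name, referenced_table_name, constraint_name
from information_schema.key_column_usage
where table_schema = database()
  and referenced_table_name is not null;

-- tables in the current schema that carry a fulltext index
select distinct table_name, index_name
from information_schema.statistics
where table_schema = database()
  and index_type = 'FULLTEXT';
```

Any table that shows up in either result needs its foreign keys or fulltext indexes dealt with before it can be partitioned.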

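III. Creating a partitioned table

Creating a partitioned table comes down to three things: creating the table, choosing the partition key, and defining the partitioning strategy

Example: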

create table sales (
    id int not null,
    sale_date date not null,
    amount decimal(10,2)
)
partition by range (year(sale_date)) (
    partition p0 values less than (2020),
    partition p1 values less than (2021),
    partition p2 values less than (2022),
    partition p3 values less than maxvalue
);

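Table name: sales

Partition key: the result of year() applied to the sale_date column, that is, the year of sale_date

Partitioning strategy: p0 stores data from before 2020, p1 stores 2020, p2 stores 2021, and p3 stores 2022 and later (note: these partition boundaries matter a great deal, so set them carefully)

Note: the strategy is written as "less than xxx", so a row goes to the first partition whose bound is greater than its key value. In the example above, each partition holds the rows whose year is below the stated bound and not already covered by an earlier partition, which is why the bound is described as a boundary here.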
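To see the boundaries in action, the sketch below inserts a few made-up rows into the `sales` table defined above and checks where they land. The sample values are illustrative only. The `partitions` column in plain `explain` output assumes MySQL 5.7 or later; older versions need `explain partitions`.

```sql
insert into sales (id, sale_date, amount) values
    (1, '2019-06-01', 10.00),   -- year 2019 is below 2020, so the row goes to p0
    (2, '2020-03-15', 20.00),   -- year 2020 is below 2021, so the row goes to p1
    (3, '2021-12-31', 30.00),   -- year 2021 is below 2022, so the row goes to p2
    (4, '2023-01-01', 40.00);   -- no finite bound is larger, so the row goes to p3

-- read a single partition directly to confirm where a row was stored
select * from sales partition (p1);

-- the "partitions" column of the plan lists the partitions the optimizer will read
explain select * from sales where sale_date = '2020-03-15';
```

For the equality filter in the last statement, the plan should name only p1, which is partition pruning at work.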

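IV. Script for converting an existing table into a partitioned table

An existing table can be partitioned in place with alter table ... partition by, but that statement rebuilds the entire table in one step. The script in this section instead builds a new partitioned table alongside the original and swaps it in, which takes the following steps:

1. Create a new partitioned table with the same column structure as the existing table and define its partitioning strategy

2. Migrate the data into the partitioned table

3. Drop the old table and rename the partitioned table to the original name

The script is as follows: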

create definer=`root`@`%` procedure `convert_table_to_partition`(in tbl_name varchar(200),out out_status int)
begin
    declare done int default false;
    -- Output status: 100 = started, 200 = succeeded, 50 = failed
    set out_status = 100;
    
    -- Create a new empty table with the same structure and no partitions yet
    set @create_empty_tbl_sql = concat(
        'create table ', tbl_name, '_partitioned like ', tbl_name, ';'
    );
    prepare stmt from @create_empty_tbl_sql;
    execute stmt;
    deallocate prepare stmt;

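    -- Build the partition definitions from every distinct (year_no, month_no) pair.
    -- Raise group_concat_max_len first: the default (1024) would truncate a long partition list.
    set session group_concat_max_len = 1000000;
    set @partition_def = '';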
    set @query = concat(
        'select group_concat(
            concat("partition p_", year_no, "_", lpad(month_no, 2, "0"), 
                   " values less than (", 
                   case when month_no = 12 then year_no + 1 else year_no end, ", ", 
                   case when month_no = 12 then 1 else month_no + 1 end, ")")
            order by year_no, month_no separator ",\n"
        ) into @partition_def
        from (
            select distinct year_no, month_no
            from ', tbl_name, '
            order by year_no, month_no
        ) as unique_years_months;'
    );
    prepare stmt from @query;
    execute stmt;
    deallocate prepare stmt;

    -- Debug: print the partition definition string
    -- select tbl_name,@partition_def;

    -- Check whether any partition definitions were generated
    if @partition_def is null or @partition_def = '' then
        select tbl_name,'no data found for partitioning. creating only the p_max partition.';
        -- Empty table: add just the p_max partition to catch future data
        set @partition_def = '\npartition p_max values less than (maxvalue, maxvalue)';
    else
        -- Append the p_max partition to catch future data
        set @partition_def = concat(@partition_def, ',\npartition p_max values less than (maxvalue, maxvalue)');
    end if;

    -- Apply the partition definitions with alter table
    set @add_partitions_sql = concat(
        'alter table ', tbl_name, '_partitioned 
        partition by range columns(year_no, month_no) (', @partition_def, ');'
    );

    -- Debug: print the alter table statement
    -- select tbl_name,@add_partitions_sql;

    prepare stmt from @add_partitions_sql;
    execute stmt;
    deallocate prepare stmt;

    -- Migrate the data into the new partitioned table
    set @insert_into_partitioned_sql = concat(
        'insert into ', tbl_name, '_partitioned select * from ', tbl_name, ';'
    );

    -- Debug: print the insert statement
    -- select tbl_name,@insert_into_partitioned_sql;

    prepare stmt from @insert_into_partitioned_sql;
    execute stmt;
    deallocate prepare stmt;

    -- Verify that the data migration succeeded
    set @count_original = concat('select count(*) into @count_original from ', tbl_name);
    prepare stmt from @count_original;
    execute stmt;
    deallocate prepare stmt;

    set @count_partitioned = concat('select count(*) into @count_partitioned from ', tbl_name, '_partitioned');
    prepare stmt from @count_partitioned;
    execute stmt;
    deallocate prepare stmt;

    -- Compare the row counts of the original table and the new partitioned table
    -- select tbl_name,@count_original, @count_partitioned;

    
    -- If the migration succeeded, drop the old table and rename the new one (done whether or not the table held data)
    if @count_original = @count_partitioned then
        -- Drop the old table
        set @drop_old_table_sql = concat('drop table if exists ', tbl_name);
        prepare stmt from @drop_old_table_sql;
        execute stmt;
        deallocate prepare stmt;

        -- Rename the new table to the old table's name
        set @rename_tables_sql = concat('rename table ', tbl_name, '_partitioned to ', tbl_name);
        prepare stmt from @rename_tables_sql;
        execute stmt;
        deallocate prepare stmt;

        -- select tbl_name,'table conversion and data migration completed successfully.';
        set out_status = 200;
    else
        -- select tbl_name,'data migration failed, check the logs for more information.';
        set out_status = 50;
    end if;
    
end

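The script above is a complete procedure for converting an existing table into a table partitioned on the year_no and month_no columns. It performs the following steps:

1) Create a new empty table, without partitions, using the existing table as a template

2) Collect all distinct year_no and month_no combinations and build the partition definition string (the existing data determines which partitions are needed)

3) Check whether any partition definitions were produced; if none were, create a default p_max partition to hold future data (strongly recommended)

4) Alter the empty table to add the partitioning strategy

5) Migrate the historical data into the partitioned table

6) Validate the migration by comparing row counts

7) Drop the old table (freeing up the table name)

8) Rename the new partitioned table to the original table name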
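The steps above can be tried on a small throwaway table. The sketch below assumes the procedure has been created in the current schema; the table `ai_result_demo`, its columns other than year_no and month_no, and the sample rows are hypothetical. Note that year_no and month_no are included in the primary key, otherwise the partitioning step would be rejected.

```sql
-- hypothetical table carrying the year_no / month_no columns the procedure expects
create table ai_result_demo (
    id bigint not null,
    year_no int not null,
    month_no int not null,
    payload varchar(255),
    primary key (id, year_no, month_no)
);
insert into ai_result_demo values (1, 2024, 11, 'a'), (2, 2024, 12, 'b');

call convert_table_to_partition('ai_result_demo', @status);
select @status;   -- 200 on success, 50 if the row counts differ

-- list the partitions the procedure generated
select partition_name, partition_description
from information_schema.partitions
where table_schema = database() and table_name = 'ai_result_demo'
order by partition_ordinal_position;
```

With the two sample rows, the listing should show p_2024_11, p_2024_12, and p_max.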

V. Converting tables to partitioned tables in batch

The following script converts regular tables to partitioned tables in batch:

create definer=`root`@`%` procedure `tables_convert_to_partition`()
begin
    declare done int default false;
    declare tbl_name varchar(64);
    declare convert_status int;
    declare cur cursor for
        select table_name
        from information_schema.tables
        where table_schema = database() and table_name like 'ai_result_%';
    declare continue handler for not found set done = true;

    open cur;

    read_loop: loop
        fetch cur into tbl_name;
        if done then
            leave read_loop;
        end if;
      
        -- Debug: print the table being converted
        select tbl_name,'converting...';

        call convert_table_to_partition(tbl_name,@status);
        
        set convert_status = @status;
        
        -- Handle the returned status
        case convert_status
            when 100 then
                -- Start status; can be ignored, since it is the expected initial state
                select tbl_name, 'started conversion.';
            when 200 then
                -- Completed successfully
                select tbl_name, 'conversion and data migration completed successfully.';
            when 50 then
                -- Failed
                select tbl_name, 'data migration failed. check the logs for more information.';
            else
                -- Unknown status
                select tbl_name, concat('unknown status: ', convert_status);
        end case;
    end loop;

    close cur;
end

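This example targets tables whose names start with "ai_result_" and converts all of them to partitioned tables. information_schema only lists objects the executing user has privileges on, so that user needs privileges on the target tables for their names to be found and converted.

This script is best treated as a one-off: avoid converting tables repeatedly, since that can lock them (this is why the table-name prefix is hard-coded; change it to suit your needs)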
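Because the batch procedure drops and recreates every matching table, it may help to preview what it will touch first. The query below is a sketch that uses the same name pattern as the procedure and reports which matches are already partitioned, relying on the `create_options` column of `information_schema.tables`, which contains the word partitioned for partitioned tables.

```sql
-- preview: which tables match the prefix, and are they already partitioned?
select t.table_name,
       t.table_rows,                                     -- approximate for innodb
       (t.create_options like '%partitioned%') as already_partitioned
from information_schema.tables t
where t.table_schema = database()
  and t.table_type = 'BASE TABLE'
  and t.table_name like 'ai_result_%';

-- run the one-off batch conversion once the list looks right
call tables_convert_to_partition();
```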

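VI. Partition maintenance: adding partitions

After the steps above, the historical data is partitioned and future data lands in the p_max partition. If p_max is never maintained, however, it will eventually grow too large as well. We therefore need to split p_max periodically and add new partitions, and this must be done before the data for the new period arrives. The script is as follows: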

create definer=`root`@`%` procedure `add_monthly_partitions`(in tbl_name varchar(64), in year_no int, in month_no int,out out_status int)
begin
    declare done int default false;
    
    -- Output status: 100 = started, 200 = succeeded, 50 = failed
    set out_status = 100;
    
    -- Check whether the partition to be added already exists
    set @partition_exists = exists (select 1 from information_schema.partitions 
                                    where table_schema = database() and table_name = tbl_name 
                                      and partition_name = concat('p_', year_no, '_', lpad(month_no, 2, '0')));
                                      
    if @partition_exists then
        -- The partition already exists: nothing to do
        -- select concat('partition p_', year_no, '_', lpad(month_no, 2, '0'), ' already exists. no action taken.') as message;
        set out_status = 200;
    else
      -- Check whether the table already has a p_max partition
      set @has_p_max = exists (select 1 from information_schema.partitions 
                               where table_schema = database() and table_name = tbl_name 
                                 and partition_name = 'p_max');

      -- Build the statement that adds the partition
      if @has_p_max then
          -- p_max exists: reorganize it, splitting p_max into the new partition and a new p_max
          set @reorganize_partition_sql = concat(
              'alter table ', tbl_name, ' reorganize partition p_max into (
                  partition p_', year_no, '_', lpad(month_no, 2, '0'), 
                  ' values less than (', 
                  case when month_no = 12 then year_no + 1 else year_no end, ', ', 
                  case when month_no = 12 then 1 else month_no + 1 end, '),
                  partition p_max values less than (maxvalue, maxvalue)
              )'
          );
          prepare stmt from @reorganize_partition_sql;
          execute stmt;
          deallocate prepare stmt;

          -- select concat('partition p_', year_no, '_', lpad(month_no, 2, '0'), ' and updated p_max added successfully.') as message;
          set out_status = 200;
      else
          -- p_max does not exist: add the new partition directly
          set @add_partition_sql = concat(
              'alter table ', tbl_name, ' add partition (
                  partition p_', year_no, '_', lpad(month_no, 2, '0'), 
                  ' values less than (', 
                  case when month_no = 12 then year_no + 1 else year_no end, ', ', 
                  case when month_no = 12 then 1 else month_no + 1 end, ')
              )'
          );
          prepare stmt from @add_partition_sql;
          execute stmt;
          deallocate prepare stmt;
          
          -- select concat('partition p_', year_no, '_', lpad(month_no, 2, '0'), ' added successfully.') as message;
          set out_status = 200;
      end if;
    end if;
end

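The script above runs as follows:

1) Check whether the partition to be added already exists (add it only if it does not)

2) Check whether the table has a p_max partition (if it does, split it; if it does not, create the new partition directly)

3) Split p_max into the new partition and a new p_max partition (this adjusts the range that p_max covers)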
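A usage sketch for the procedure above: it assumes a table named `ai_result_demo` (a hypothetical name) that is already partitioned by range columns(year_no, month_no) and has a p_max partition, and it prepares the partition for January 2025 ahead of time.

```sql
-- create the partition for January 2025 before any January data arrives
call add_monthly_partitions('ai_result_demo', 2025, 1, @status);
select @status;   -- 200 when the partition was added or already existed

-- confirm that p_2025_01 now sits just before p_max
select partition_name, partition_description, table_rows
from information_schema.partitions
where table_schema = database() and table_name = 'ai_result_demo'
order by partition_ordinal_position;
```

Splitting p_max while it is still empty keeps the reorganize step cheap, since no rows have to be moved between partitions.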

VII. Batch maintenance: adding partitions in batch

The following script adds partitions to all related tables in batch:

create definer=`root`@`%` procedure `tables_add_monthly_partition`(in tbl_prefix varchar(64), in year_no int, in month_no int,out out_status int)
begin
    declare done int default false;
    declare tbl_name varchar(64);
    declare cur cursor for
        select table_name
        from information_schema.tables
        where table_schema = database() and table_name like concat(tbl_prefix, '%');
    declare continue handler for not found set done = true;

    -- Output status: 100 = started, 200 = succeeded, 50 = failed
    set out_status = 100;

    -- Open the cursor
    open cur;

    read_loop: loop
        fetch cur into tbl_name;
        if done then
            leave read_loop;
        end if;

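        -- Check whether the table is already partitioned
        -- (a non-partitioned table also has a row in information_schema.partitions, with partition_name null)
        set @is_partitioned = exists (select 1 from information_schema.partitions 
                                      where table_schema = database() and table_name = tbl_name
                                        and partition_name is not null);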

        if not @is_partitioned then
            -- The table is not partitioned: convert it first with convert_table_to_partition
            call convert_table_to_partition(tbl_name,@status);
            
            -- After converting, check again that the table is now partitioned
            set @is_partitioned = exists (select 1 from information_schema.partitions 
                                          where table_schema = database() and table_name = tbl_name
                                            and partition_name is not null);
            
            if not @is_partitioned then
                -- Conversion failed: report it and skip the remaining steps for this table
                select concat('failed to convert table ', tbl_name, ' to partitioned. skipping.') as message;
                set out_status = 50;
                iterate read_loop;
            end if;
        end if;

        -- Call add_monthly_partitions to add the partition to the current table
        call add_monthly_partitions(tbl_name, year_no, month_no,@status);

        -- Optional: print the result (for debugging)
        -- select concat('processed table: ', tbl_name) as status;
    end loop;

    -- Close the cursor
    close cur;

    -- Report success unless some table failed to convert
    -- select concat('batch partition addition completed for tables with prefix "', tbl_prefix, '".') as message;
    if out_status <> 50 then
        set out_status = 200;
    end if;
end

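Adding partitions in batch requires the table-name prefix, such as "ai_result_" in the examples above. The script first converts any non-partitioned tables into partitioned tables and then adds the requested partition to each one. It runs as follows:

1) Fetch all related tables

2) For each table, check whether it is already partitioned

3) Convert non-partitioned tables into partitioned tables

4) Add the new partition to each partitioned table

Run this script with caution. As the comparison of regular and partitioned tables above shows, running it is easy (everything is automated in batch), but think carefully about the consequences before you do!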
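Since new partitions have to exist before the data for that month arrives, one option is to let the MySQL event scheduler make the call each month. The sketch below is an assumption rather than part of the original scripts: the event name `ev_add_next_month_partition` and the schedule are hypothetical, and the event scheduler must be enabled (event_scheduler = on) for it to fire.

```sql
-- hypothetical event: once a month, create next month's partition for every matching table
create event ev_add_next_month_partition
on schedule every 1 month
starts current_timestamp + interval 1 day   -- adjust so each run happens before the month ends
do
    call tables_add_monthly_partition(
        'ai_result_',
        year(current_date + interval 1 month),    -- year of the upcoming month
        month(current_date + interval 1 month),   -- number of the upcoming month
        @status
    );
```

Keep in mind that the batch procedure also converts any non-partitioned table matching the prefix, so a scheduled run carries the same risks described above.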