文章目录
一、collect_set()/collect_list():
在 hive 中想实现按某字段分组,对另外字段进行合并,可通过collect_list()或者collect_set()实现。
-
collect_set()函数与collect_list()函数:列转行专用函数,都是将分组中的某列转为一个数组返回。有时为了字段拼接效果,多和concat_ws()函数连用。
-
collect_set()与collect_list()的区别:
二、实际运用
1、创建测试表及插入数据 :
drop table test_1;
create table test_1(
id string,
cur_day string,
rule string
)
row format delimited fields terminated by ',';
insert into test_1 values
('a','20230809','501'),('a','20230811','502'),('a','20230812','503'),('a','20230812','501'),('a','20230813','512'),('b','20230809','511'),('b','20230811','512'),('b','20230812','513'),('b','20230812','511'),('b','20230813','512'),('b','20230809','511'),('c','20230811','512'),('c','20230812','513'),('c','20230812','511'),('c','20230813','512');
把同一分组的不同行的数据聚合成一个行
举例1:按照id,cur_day分组,取出每个id对应的所有rule(不去重)。
select id,cur_day,collect_list(rule) as rule_total from test_1 group by id,cur_day order by id,cur_day;
举例2:按照id,cur_day分组,取出每个id对应的所有rule(去重)。
select id,cur_day,collect_set(rule) as rule_total from test_1 group by id,cur_day order by id,cur_day;
- 用下标可以随机取某一个
select id,cur_day,collect_list(rule)[0] as rule_one from test_1 group by id,cur_day order by id,cur_day;
select id,cur_day,collect_set(rule)[0] as rule_one from test_1 group by id,cur_day order by id,cur_day;
- 聚合后的中的值用‘|’分隔开
select id,cur_day,concat_ws('|',collect_list(rule)) as rule_total from test_1 group by id,cur_day order by id,cur_day;
select id,cur_day,concat_ws('|',collect_set(rule)) as rule_total from test_1 group by id,cur_day order by id,cur_day;
例子
- spark-sql : collect_list里边字段起别名.(as等其他方式都用过,都报错.
最后用子查询来解决)
select fencecode,
collect_list(struct(vehicleno, platecolor, entertime, levaetime, trans)) as actinfos
from (
select fence_code as fencecode,
veh_no as vehicleno,
veh_color as platecolor,
enter_time as entertime,
out_time as levaetime,
trans
from mid.ct_fence_into_out_dt where dt = 20230911
) subquery
group by fencecode;
发表评论