mysql
两个表如下:
mysql> select * from t1 ; +----+--------+ | id | name | +----+--------+ | 2 | tim | | 3 | hannah | | 4 | samuel | | 1 | jacob | +----+--------+ mysql> select * from t2 ; +----+--------+ | id | name | +----+--------+ | 1 | hello | | 2 | jacob | | 3 | hi | | 4 | hannah | +----+--------+
inner join
inner join 产生的结果集,是t1和t2的交集。
mysql> select * from t1
-> inner join t2
-> on t1.name=t2.name;
+----+--------+----+--------+
| id | name | id | name |
+----+--------+----+--------+
| 1 | jacob | 2 | jacob |
| 3 | hannah | 4 | hannah |
+----+--------+----+--------+
join
join和inner join 结果貌似一致。
join 直接将2表进行笛卡尔积。
mysql> select *
-> from t1
-> join t2;
+----+--------+----+--------+
| id | name | id | name |
+----+--------+----+--------+
| 1 | jacob | 1 | hello |
| 2 | tim | 1 | hello |
| 3 | hannah | 1 | hello |
| 4 | samuel | 1 | hello |
| 1 | jacob | 2 | jacob |
| 2 | tim | 2 | jacob |
| 3 | hannah | 2 | jacob |
| 4 | samuel | 2 | jacob |
| 1 | jacob | 3 | hi |
| 2 | tim | 3 | hi |
| 3 | hannah | 3 | hi |
| 4 | samuel | 3 | hi |
| 1 | jacob | 4 | hannah |
| 2 | tim | 4 | hannah |
| 3 | hannah | 4 | hannah |
| 4 | samuel | 4 | hannah |
+----+--------+----+--------+
full [outer] join
实际上,mysql从来都不支持 full outer join,mysql根本就不识别outer关键字。
hive才支持
只有如下这一种情况,full join 可以执行,加上on条件,就会出现语法报错。
mysql> select *
-> from t1
-> full join t2;
+----+--------+----+--------+
| id | name | id | name |
+----+--------+----+--------+
| 1 | jacob | 1 | hello |
| 2 | tim | 1 | hello |
| 3 | hannah | 1 | hello |
| 4 | samuel | 1 | hello |
| 1 | jacob | 2 | jacob |
| 2 | tim | 2 | jacob |
| 3 | hannah | 2 | jacob |
| 4 | samuel | 2 | jacob |
| 1 | jacob | 3 | hi |
| 2 | tim | 3 | hi |
| 3 | hannah | 3 | hi |
| 4 | samuel | 3 | hi |
| 1 | jacob | 4 | hannah |
| 2 | tim | 4 | hannah |
| 3 | hannah | 4 | hannah |
| 4 | samuel | 4 | hannah |
+----+--------+----+--------+
left [outer] join
left outer join 产生表 t1 的完全集,而 t2 表中匹配的则有值,没有匹配的则以null值取代。
mysql> select *
-> from t1
-> left join t2
-> on t1.name=t2.name;
+----+--------+------+--------+
| id | name | id | name |
+----+--------+------+--------+
| 1 | jacob | 2 | jacob |
| 2 | tim | null | null |
| 3 | hannah | 4 | hannah |
| 4 | samuel | null | null |
+----+--------+------+--------+
right [outer] join
right join 和 left join 类似
mysql> select *
-> from t1
-> right outer join t2
-> on t1.name=t2.name;
+------+--------+----+--------+
| id | name | id | name |
+------+--------+----+--------+
| null | null | 1 | hello |
| 1 | jacob | 2 | jacob |
| null | null | 3 | hi |
| 3 | hannah | 4 | hannah |
+------+--------+----+--------+
union 与 union all
union 操作符用于合并两个或多个 select 语句的结果集。
mysql> select name from t1
-> union
-> select name from t2;
+--------+
| name |
+--------+
| jacob |
| tim |
| hannah |
| samuel |
| hello |
| hi |
+--------+
本来有8个名字,但有2个重复,所以去除掉重复的名字,结果只有6个。
mysql> select id,name from t1
-> union
-> select id,name from t2;
+----+--------+
| id | name |
+----+--------+
| 1 | jacob |
| 2 | tim |
| 3 | hannah |
| 4 | samuel |
| 1 | hello |
| 2 | jacob |
| 3 | hi |
| 4 | hannah |
+----+--------+
id和name作为一个整体,没有重复的,所以结果显示8条。
mysql> select name from t1
-> union all
-> select name from t2;
+--------+
| name |
+--------+
| jacob |
| tim |
| hannah |
| samuel |
| hello |
| jacob |
| hi |
| hannah |
+--------+
union all,将结果无脑堆在一起。(相当于两个查询语句)
注意:union 内部的 select 语句必须拥有相同数量的列。列也必须拥有相似的数据类型。同时,每条 select 语句中的列的顺序必须相同。union 只选取记录,而union all会列出所有记录。
hive
hive里面使用join时注意:
1、 只支持等值链接;多个条件,支持 and,不支持 or /大于/小于
例如:
select a.* from a join b on (a.id = b.id) select a.* from a join b on (a.id = b.id and a.department = b.department) 是正确的; 然而: select a.* from a join b on (a.id>b.id)是错误的。
2、 可以 join 多于 2 个表
例如:
select a.val, b.val, c.val from a join b on (a.key = b.key1) join c on (c.key = b.key2)
hive的join分成三类:
1、内连接 inner join 2、外连接 left outer join right outer join full outer join 3、半连接 left semi join
下面做实验:
2个表如下:
hive> select * from tablea; ok tablea.id tablea.name 1 huangbo 2 xuzheng 4 wangbaoqiang 6 huangxiaoming 7 fengjie 10 liudehua hive> select * from tableb; ok tableb.id tableb.age 2 20 4 50 7 80 10 22 12 33 15 44
inner join(内连接)(把符合两边连接条件的数据查询出来)
hive> select * from tablea a inner join tableb b on a.id=b.id; ok a.id a.name b.id b.age 2 xuzheng 2 20 4 wangbaoqiang 4 50 7 fengjie 7 80 10 liudehua 10 22
left join(左连接,等同于 left outer join)
- 1、以左表数据为匹配标准,左大右小
- 2、匹配不上的就是 null
- 3、返回的数据条数与左表相同
hive> select * from tablea a left join tableb b on a.id=b.id; ok a.id a.name b.id b.age 1 huangbo null null 2 xuzheng 2 20 4 wangbaoqiang 4 50 6 huangxiaoming null null 7 fengjie 7 80 10 liudehua 10 22
right join(右连接,等同于 right outer join)
- 1、以右表数据为匹配标准,左小右大
- 2、匹配不上的就是 null
- 3、返回的数据条数与右表相同
hive> select * from tablea a right join tableb b on a.id=b.id; ok a.id a.name b.id b.age 2 xuzheng 2 20 4 wangbaoqiang 4 50 7 fengjie 7 80 10 liudehua 10 22 null null 12 33 null null 15 44
left semi join(左半连接)
因为 hive 不支持 in/exists 操作(1.2.1 版本的 hive 支持 in 的操作),所以用该操作实现,并且是 in/exists 的高效实现。
hive> select * from tablea a left semi join tableb b on a.id=b.id; ok a.id a.name 2 xuzheng 4 wangbaoqiang 7 fengjie 10 liudehua
可以修改为 in 的那种写法:
hive> select * from tablea where a.id in(select b.id from b); ok tablea.id tablea.name 2 xuzheng 4 wangbaoqiang 7 fengjie 10 liudehua
虽然用in 也能实现,但不建议使用。一般用 left semi join 实现。
full outer join(完全外链接)
full outer join 产生a和b的并集。但是需要注意的是,对于没有匹配的记录,则会以null做为值。
hive> select * from tablea a full outer join tableb b on a.id=b.id; ok a.id a.name b.id b.age 1 huangbo null null 2 xuzheng 2 20 4 wangbaoqiang 4 50 6 huangxiaoming null null 7 fengjie 7 80 10 liudehua 10 22 null null 12 33 null null 15 44
总结
以上为个人经验,希望能给大家一个参考,也希望大家多多支持代码网。
发表评论