
Problems Encountered Using Sqoop to Import Data into Hive

May 26, 2024


Preconditions

1. The source database is MySQL.

2. The target is a Hive database; the Beeline version is 1.1.0.

3. The Hive target table was created with bucketing plus the STORED AS ORC clause. It was built this way deliberately, so that DELETE and similar operations would be supported.
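For context, the target table described above would have been created with DDL along these lines. This is a sketch, not the original statement: only the table name and the obj_id bucketing column appear in the article, and the other columns are borrowed from the generated DDL shown later; on this Hive version, a transactional (ACID) table must be both bucketed and stored as ORC, which is why all three clauses appear together.

```sql
-- Hypothetical reconstruction of the target table's DDL.
-- Bucketing + ORC + transactional=true is what enables DELETE/UPDATE
-- on this Hive version.
CREATE TABLE t_sb_zwyc_xl_ljd (
    obj_id STRING,
    dqtz   VARCHAR(6),
    dszll  DECIMAL(12,4)
    -- ... remaining business columns ...
)
CLUSTERED BY (obj_id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```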

Troubleshooting process

My first idea was to let Sqoop create the table and import the data in one step via HCatalog. The command looked like this:

sqoop import --connect jdbc:mysql://xx.xx.xx.xx:19100/pms_scyw --username root --password ******** \
--table t_sb_zwyc_xl --null-string '\\n' --null-non-string '\\n' \
--create-hcatalog-table \
--hcatalog-database test1 \
--hcatalog-table t_sb_zwyc_xl_521 \
--hcatalog-storage-stanza "clustered by (obj_id) into 16 buckets stored as orc tblproperties('transactional'='true')"
# From what I read, the --hcatalog-storage-stanza argument is appended to the generated CREATE TABLE statement, hence this clause.

I was hoping this would finish the whole job in one go, but instead it failed with the following error:

19/05/21 10:03:57 INFO hcat.SqoopHCatUtilities: Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1558404237184
19/05/21 10:04:00 INFO hcat.SqoopHCatUtilities: FAILED: IllegalArgumentException Decimal precision out of allowed range [1,38]
19/05/21 10:04:00 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: HCat exited with status 64
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.executeExternalHCatProgram(SqoopHCatUtilities.java:1148)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.launchHCatCli(SqoopHCatUtilities.java:1097)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.createHCatTable(SqoopHCatUtilities.java:644)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:340)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:802)
    at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

The message says a decimal precision exceeded the allowed range. How could that be? I checked the CREATE TABLE statement Sqoop had generated and found the culprit:

    `dqtz` varchar(6),
    `dszll` decimal(12,4),
    `dxmpyxkid` varchar(42),
    `dycs` decimal(65),

As you can see, one column came out as decimal(65), while Hive only allows precisions from 1 to 38, so it is clearly out of range. What now? Time for some searching, but after digging around I only found one relevant post:

Link: https://kb.informatica.com/solution/23/pages/64/524288.aspx

The post offers a solution:

Solution
To resolve this issue, we need to ensure that the precision and the scale of the number datatype changes from 0 to a valid value.
To achieve this, add the following arguments in the JDBC parameters of the Oracle connection:
catalogoptions=0;numberprecisionscale=1
Add the above to the connection string of the JDBC connection used for metadata access.
After modifying the JDBC connection string, re-import the metadata and verify if the right precision is getting imported.

In other words, append "catalogoptions=0;numberprecisionscale=1" to the JDBC connection string. But that fix is for an Oracle source, and mine is MySQL, so it was no use to me. With no better option, I had to spend some time creating the table by hand and then importing the data into it.
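As an aside, an alternative to creating the table by hand might be to override the inferred column type on the Sqoop side with --map-column-hive. The flag exists in Sqoop 1.4.x, but I have not tested it in this scenario, and whether it is honored for --create-hcatalog-table imports depends on your Sqoop version; the column name dycs comes from the generated DDL above.

```shell
# Hypothetical, untested alternative: force the offending column to a
# legal Hive precision instead of the inferred decimal(65).
# Note: the comma inside DECIMAL(38,0) can confuse Sqoop's argument
# parsing in some releases, so verify against your version.
sqoop import --connect jdbc:mysql://xx.xx.xx.xx:19100/pms_scyw \
  --username root --password ******** \
  --table t_sb_zwyc_xl \
  --map-column-hive dycs=DECIMAL(38,0) \
  --create-hcatalog-table \
  --hcatalog-database test1 \
  --hcatalog-table t_sb_zwyc_xl_521
```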

After some fiddling the table was built. OK, let's try the import again, this time with a slightly different command:

sqoop import --connect jdbc:mysql://xx.xx.xx.xx:19100/pms_scyw --username root --password ******** \
 --table t_sb_zwyc_xl --null-string '\\n' --null-non-string '\\n' \
 --fields-terminated-by '~@!' -m 2 \
 --hive-overwrite \
 --hcatalog-database test1 \
 --hcatalog-table t_sb_zwyc_xl_ljd \
 --verbose ;

It failed again, though fortunately not with the same error as before. Let's see what the problem is this time:

ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hive.hcatalog.common.HCatException : 2016 : Error operation not supported : Store into a partition with bucket definition from Pig/MapReduce is not supported
    at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:109)
    at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:346)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:768)
    at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)

At least this error is explicit: writing into a bucketed table is not supported. A bit of research confirmed that Sqoop currently cannot write to bucketed Hive tables, but the data can be staged through a temporary table. So I created a staging table, this time without the "clustered by (obj_id) into 16 buckets" clause, keeping only stored as orc tblproperties('transactional'='true') in the CREATE statement. Then it was just a matter of importing the data into the staging table with Sqoop and INSERTing it from there into the target table.
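The staging workflow described above can be sketched as follows. The staging table name is hypothetical (the article does not give it); the target table name is taken from the Sqoop command earlier, and the column list is assumed. Between the two statements, run the same sqoop import command as above, but with --hcatalog-table pointing at the staging table instead of the bucketed target.

```sql
-- 1. Staging table: identical columns to the target, but no CLUSTERED BY
--    clause, so Sqoop/HCatalog is able to write into it.
CREATE TABLE test1.t_sb_zwyc_xl_stage (   -- hypothetical name
    obj_id STRING
    -- ... remaining columns, matching the target table ...
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- 2. (Run the sqoop import into test1.t_sb_zwyc_xl_stage here.)

-- 3. Copy from staging into the bucketed target table;
--    Hive applies the bucketing itself during the INSERT.
INSERT INTO TABLE test1.t_sb_zwyc_xl_ljd
SELECT * FROM test1.t_sb_zwyc_xl_stage;
```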

After all that, I verified the data in the target table and everything checked out.

The goal was achieved, but one problem surfaced: when querying the target table with a LIMIT clause after SELECT, the Hive service crashes outright, and the logs contain nothing at ERROR level. Very strange; I will have to dig into the cause later.

Summary

The above is my personal experience; I hope it can serve as a useful reference.
