当前位置: 代码网 > it编程>编程语言>Java > Flink NoSuchFileException: /tmp/flink-netty-shuffle-xxxx/xxxx.channel.shuffle.data异常处理

Flink NoSuchFileException: /tmp/flink-netty-shuffle-xxxx/xxxx.channel.shuffle.data异常处理

2024年07月28日 Java 我要评论
Fllink 批任务运行一段时间后出现如下错误:java.nio.file.NoSuchFileException: /tmp/flink-netty-shuffle-c5222ebc-a7bb-4fa1-bfd2-c7b5c9bd9b67/3740ddaa0f56ec8bcce80927e4a05443.channel.shuffle.data详细信息如下:Flink任务按Batch模式执行时,配置taskmanager.tmp.dirs 不要使用/tmp目录

项目场景和问题描述:

fllink 批任务运行一段时间后出现如下错误:

java.nio.file.nosuchfileexception: /tmp/flink-netty-shuffle-c5222ebc-a7bb-4fa1-bfd2-c7b5c9bd9b67/3740ddaa0f56ec8bcce80927e4a05443.channel.shuffle.data

详细信息如下:

2024-07-18 14:27:54
org.apache.flink.runtime.jobexception: recovery is suppressed by norestartbackofftimestrategy
	at org.apache.flink.runtime.executiongraph.failover.flip1.executionfailurehandler.handlefailure(executionfailurehandler.java:139)
	at org.apache.flink.runtime.executiongraph.failover.flip1.executionfailurehandler.getfailurehandlingresult(executionfailurehandler.java:83)
	at org.apache.flink.runtime.scheduler.defaultscheduler.recordtaskfailure(defaultscheduler.java:256)
	at org.apache.flink.runtime.scheduler.defaultscheduler.handletaskfailure(defaultscheduler.java:247)
	at org.apache.flink.runtime.scheduler.defaultscheduler.ontaskfailed(defaultscheduler.java:240)
	at org.apache.flink.runtime.scheduler.schedulerbase.ontaskexecutionstateupdate(schedulerbase.java:738)
	at org.apache.flink.runtime.scheduler.schedulerbase.updatetaskexecutionstate(schedulerbase.java:715)
	at org.apache.flink.runtime.scheduler.schedulerng.updatetaskexecutionstate(schedulerng.java:78)
	at org.apache.flink.runtime.jobmaster.jobmaster.updatetaskexecutionstate(jobmaster.java:477)
	at sun.reflect.generatedmethodaccessor91.invoke(unknown source)
	at sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)
	at java.lang.reflect.method.invoke(method.java:498)
	at org.apache.flink.runtime.rpc.akka.akkarpcactor.lambda$handlerpcinvocation$1(akkarpcactor.java:309)
	at org.apache.flink.runtime.concurrent.akka.classloadingutils.runwithcontextclassloader(classloadingutils.java:83)
	at org.apache.flink.runtime.rpc.akka.akkarpcactor.handlerpcinvocation(akkarpcactor.java:307)
	at org.apache.flink.runtime.rpc.akka.akkarpcactor.handlerpcmessage(akkarpcactor.java:222)
	at org.apache.flink.runtime.rpc.akka.fencedakkarpcactor.handlerpcmessage(fencedakkarpcactor.java:84)
	at org.apache.flink.runtime.rpc.akka.akkarpcactor.handlemessage(akkarpcactor.java:168)
	at akka.japi.pf.unitcasestatement.apply(casestatements.scala:24)
	at akka.japi.pf.unitcasestatement.apply(casestatements.scala:20)
	at scala.partialfunction.applyorelse(partialfunction.scala:123)
	at scala.partialfunction.applyorelse$(partialfunction.scala:122)
	at akka.japi.pf.unitcasestatement.applyorelse(casestatements.scala:20)
	at scala.partialfunction$orelse.applyorelse(partialfunction.scala:171)
	at scala.partialfunction$orelse.applyorelse(partialfunction.scala:172)
	at scala.partialfunction$orelse.applyorelse(partialfunction.scala:172)
	at akka.actor.actor.aroundreceive(actor.scala:537)
	at akka.actor.actor.aroundreceive$(actor.scala:535)
	at akka.actor.abstractactor.aroundreceive(abstractactor.scala:220)
	at akka.actor.actorcell.receivemessage(actorcell.scala:580)
	at akka.actor.actorcell.invoke(actorcell.scala:548)
	at akka.dispatch.mailbox.processmailbox(mailbox.scala:270)
	at akka.dispatch.mailbox.run(mailbox.scala:231)
	at akka.dispatch.mailbox.exec(mailbox.scala:243)
	at java.util.concurrent.forkjointask.doexec(forkjointask.java:289)
	at java.util.concurrent.forkjoinpool$workqueue.runtask(forkjoinpool.java:1056)
	at java.util.concurrent.forkjoinpool.runworker(forkjoinpool.java:1692)
	at java.util.concurrent.forkjoinworkerthread.run(forkjoinworkerthread.java:157)
caused by: java.io.ioexception: failed to create file writer.
	at org.apache.flink.runtime.io.network.partition.sortmergeresultpartition.setupinternal(sortmergeresultpartition.java:185)
	at org.apache.flink.runtime.io.network.partition.resultpartition.setup(resultpartition.java:161)
	at org.apache.flink.runtime.taskmanager.task.setuppartitionsandgates(task.java:946)
	at org.apache.flink.runtime.taskmanager.task.dorun(task.java:639)
	at org.apache.flink.runtime.taskmanager.task.run(task.java:550)
	at java.lang.thread.run(thread.java:748)
caused by: java.nio.file.nosuchfileexception: /tmp/flink-netty-shuffle-45d4cbd5-f22e-47d0-98ff-7f11dd98a0b7/07391f7b184c5f8b8e234abee6649daa.channel.shuffle.data
	at sun.nio.fs.unixexception.translatetoioexception(unixexception.java:86)
	at sun.nio.fs.unixexception.rethrowasioexception(unixexception.java:102)
	at sun.nio.fs.unixexception.rethrowasioexception(unixexception.java:107)
	at sun.nio.fs.unixfilesystemprovider.newfilechannel(unixfilesystemprovider.java:177)
	at java.nio.channels.filechannel.open(filechannel.java:287)
	at java.nio.channels.filechannel.open(filechannel.java:335)
	at org.apache.flink.runtime.io.network.partition.partitionedfilewriter.openfilechannel(partitionedfilewriter.java:133)
	at org.apache.flink.runtime.io.network.partition.partitionedfilewriter.<init>(partitionedfilewriter.java:121)
	at org.apache.flink.runtime.io.network.partition.sortmergeresultpartition.setupinternal(sortmergeresultpartition.java:182)
	... 5 more


原因分析:

flink任务按batch模式执行时,中间结果数据回进行落地,默认放在/tmp/目录下。因/tmp目录linux系统一般默认每10天进行定期删除。故如果shuffle目录被删除后,再执行任务就会报java.nio.file.nosuchfileexception: /tmp/flink-netty-shuffle-c5222ebc-a7bb-4fa1-bfd2-c7b5c9bd9b67/3740ddaa0f56ec8bcce80927e4a05443.channel.shuffle.data的错误。


解决方案:

taskmanager.tmp.dirs:这个配置项允许你设置 taskmanager 使用的一个或多个临时文件目录列表。flink 会在这些目录之间均匀分配临时文件。

修改flink flink-conf.yaml配置文件,配置taskmanager.tmp.dirs 文件目录,避免使用/tmp目录,如:taskmanager.tmp.dirs: /opt/flink_shuffle/

(0)

相关文章:

版权声明:本文内容由互联网用户贡献,该文观点仅代表作者本人。本站仅提供信息存储服务,不拥有所有权,不承担相关法律责任。 如发现本站有涉嫌抄袭侵权/违法违规的内容, 请发送邮件至 2386932994@qq.com 举报,一经查实将立刻删除。

发表评论

验证码:
Copyright © 2017-2025  代码网 保留所有权利. 粤ICP备2024248653号
站长QQ:2386932994 | 联系邮箱:2386932994@qq.com