
Presto local development: querying Hive fails with "Unable to create input format org.apache.hadoop.mapred.TextInputFormat"


1. Background

For work, I set up a local development environment on a Mac with IntelliJ IDEA that can access Hive.

Running a query failed with:

Query 20220422_084012_00000_34axr failed: Unable to create input format org.apache.hadoop.mapred.TextInputFormat

The server log shows the full stack trace:

com.facebook.presto.spi.PrestoException: Unable to create input format org.apache.hadoop.mapred.TextInputFormat
	at com.facebook.presto.hive.HiveUtil.getInputFormat(HiveUtil.java:328)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:299)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:272)
	// ... (some frames omitted)
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at com.facebook.presto.hive.HiveUtil.getInputFormat(HiveUtil.java:325)
	... 11 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	... 14 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
	at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
	at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
	at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
	... 19 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
	... 21 more

From the stack trace, java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found is the root cause behind Unable to create input format org.apache.hadoop.mapred.TextInputFormat.
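In a chained Java stack trace, the causes are listed outermost-first, so the last "Caused by:" line is the root cause. A quick sketch for pulling it out of a saved log (the abbreviated trace and the /tmp path here are just for illustration):

```shell
# Save an abbreviated copy of the trace to a scratch file.
printf '%s\n' \
  'com.facebook.presto.spi.PrestoException: Unable to create input format org.apache.hadoop.mapred.TextInputFormat' \
  'Caused by: java.lang.RuntimeException: Error in configuring object' \
  'Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.' \
  'Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found' \
  > /tmp/presto-error.log

# The deepest (last) "Caused by:" is the root cause.
grep '^Caused by:' /tmp/presto-error.log | tail -n 1
```

This prints the ClassNotFoundException line, which is what the rest of this post fixes.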

2. Solution

2.1 Confirm the missing Maven dependency

From my earlier experience building ETL with Flink jar jobs, a ClassNotFoundException usually means the class cannot be found at runtime. IDEA's class-search shortcut quickly locates com.hadoop.compression.lzo.LzoCodec: the matching classes live in lzo-hadoop-1.0.5.jar and hadoop-lzo-0.4.15-cdh5.14.4.jar. Searching the Presto project directory with find shows that hadoop-lzo-0.4.15-cdh5.14.4.jar already exists on local disk. A reasonable guess, then: what is missing is the Maven dependency corresponding to hadoop-lzo-0.4.15-cdh5.14.4.jar.

2.2 The fix

From the mapping between jar file names and Maven coordinates, plus some searching online, the dependency is written as follows:

<dependency>
    <groupId>com.hadoop.gplcompression</groupId>
    <artifactId>hadoop-lzo</artifactId>
    <version>0.4.15-cdh5.14.4</version>
    <scope>runtime</scope>
</dependency>

Add this dependency to the pom.xml of the presto-hive-hadoop2 module.

At the same time, install the already-present hadoop-lzo-0.4.15-cdh5.14.4.jar into the local repository with mvn install:install-file. (This step is critical: do not simply copy and paste the jar into the repository directory!)

mvn install:install-file \
    -Dfile=/path_to_hadoop_lzo/hadoop-lzo-0.4.15-cdh5.14.4.jar \
    -DgroupId=com.hadoop.gplcompression \
    -DartifactId=hadoop-lzo \
    -Dversion=0.4.15-cdh5.14.4 \
    -Dpackaging=jar

If you don't have hadoop-lzo-0.4.15-cdh5.14.4.jar, message the author.

After a successful install, the installed files appear in the corresponding directory of the local repository.
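For reference, Maven derives the local-repository location of an artifact from its coordinates: dots in the groupId become directory separators, followed by the artifactId and version. A sketch of that mapping, assuming the default ~/.m2/repository location (install:install-file also generates a pom and repository metadata next to the jar, which is why copying the jar alone is not enough):

```shell
# Coordinates from the dependency above.
GROUP=com.hadoop.gplcompression
ARTIFACT=hadoop-lzo
VERSION=0.4.15-cdh5.14.4

# groupId dots -> directory separators; jar name is artifactId-version.jar.
echo "$HOME/.m2/repository/$(echo "$GROUP" | tr . /)/$ARTIFACT/$VERSION/$ARTIFACT-$VERSION.jar"
```

This prints the path where the jar should sit after a successful install, e.g. .../com/hadoop/gplcompression/hadoop-lzo/0.4.15-cdh5.14.4/hadoop-lzo-0.4.15-cdh5.14.4.jar.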

2.3 Rebuild the module and verify the fix

From the root directory of the presto-hive-hadoop2 module, rebuild it with Maven:

mvn clean install -DskipTests

# or resume the whole build from the presto-hive-hadoop2 module
mvn clean install -DskipTests -rf :presto-hive-hadoop2

When the build finishes, restart the Presto service.

First check the server log for the keyword hadoop-lzo to confirm that hadoop-lzo-0.4.15-cdh5.14.4.jar was actually loaded.
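A quick way to do that check from the command line; the log path here is a guess (a launcher-based Presto server typically writes under var/log/), so adjust it to wherever your IDEA run writes its log:

```shell
# Hypothetical log location; change to match your setup.
LOG=var/log/server.log

if [ -f "$LOG" ]; then
    # Any line mentioning the jar (e.g. plugin classpath output) confirms it was picked up.
    grep -n 'hadoop-lzo' "$LOG"
fi
```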

It was loaded successfully, and re-running the previously failing query now succeeds.

3. Asides

3.1 How did I pick the module?

Don't ask how I knew to modify this particular module's pom file; it came from a colleague's tip plus repeated trial and error.

Looking back, the compiled Presto distribution's plugin directory contains hadoop-lzo-0.4.15-cdh5.14.4.jar.

Combined with the fileSet configuration used at build time, it is clear that the hive-hadoop2 directory corresponds to the presto-hive-hadoop2 module:

<fileSet>
    <directory>${project.build.directory}/dependency/presto-hive-hadoop2-${project.version}</directory>
    <outputDirectory>plugin/hive-hadoop2</outputDirectory>
</fileSet>

3.2 Why the runtime dependency scope?

At first I did not specify a scope, so the default compile scope applied.

The build then reported:

[INFO] --- maven-dependency-plugin:3.1.1:analyze-only (default) @ presto-hive-hadoop2 ---
[WARNING] Unused declared dependencies found:
[WARNING]    com.hadoop.gplcompression:hadoop-lzo:jar:0.4.15-cdh5.14.4:compile

This shows the dependency is not needed at compile time, so I changed the scope to runtime.

3.3 Why install the jar with mvn install:install-file?

This mvn install:install-file part was actually added to the post afterwards.

After I switched branches, a rebuild failed with:

[ERROR] Failed to execute goal on project <project_name>: Could not resolve dependencies for project com.hadoop.gplcompression:hadoop-lzo:jar:0.4.15-cdh5.14.4: Failure to find com.hadoop.gplcompression:hadoop-lzo:jar:0.4.15-cdh5.14.4 in http://xxx.com/repository/maven-public/ was cached in the local repository, resolution will not be reattempted until the update interval of fintech has elapsed or updates are forced -> [Help 1]

Searching around led to a post on fixing this Maven build error; it described exactly my situation: a stale xxx.lastUpdated file in the local repository.

Following the post, I deleted the hadoop-lzo-0.4.15-cdh5.14.4.jar.lastUpdated file and copied a fresh jar from another directory straight into the local repository.
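Deleting the stale marker can be done in one step; the repository path below assumes the default ~/.m2 location, so adjust it if your settings.xml points elsewhere:

```shell
# Hypothetical default local-repository path for this artifact.
REPO="$HOME/.m2/repository/com/hadoop/gplcompression/hadoop-lzo"

# Remove stale *.lastUpdated markers so Maven re-attempts resolution
# instead of honoring the cached failure.
if [ -d "$REPO" ]; then
    find "$REPO" -name '*.lastUpdated' -delete
fi
```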

After that, the build no longer failed.

But then something strange happened: when the Presto service restarted, hadoop-lzo-0.4.15-cdh5.14.4.jar was not loaded at all.

I panicked and double-checked with find: the jar was present both in the local repository and in the presto-hive-hadoop2 module's target/ directory.

Then I suspected file permissions and changed every copy of hadoop-lzo-0.4.15-cdh5.14.4.jar to 0755 (-rwxr-xr-x).

Restarting the Presto service and rebuilding Presto made no difference.

Later, carefully comparing against the local repository on the production server, I found a difference.

Then it clicked: a jar cannot simply be copied into the local repository; it has to be installed with mvn install:install-file.

After deleting the corresponding directory and re-running mvn install:install-file, the directory's contents matched the production server.

After rebuilding the presto-hive-hadoop2 module and restarting the service, hadoop-lzo-0.4.15-cdh5.14.4.jar loaded successfully and the query no longer failed. 😄

Thanks to the post "maven如何将本地jar安装到本地仓库" (how to install a local jar into the local Maven repository).

3.4 The "Could not load native gpl library" error (unresolved; does not affect queries)

Although the query succeeds, the execution log still contains a Could not load native gpl library error:

2022-04-22T17:33:30.362+0800	ERROR	hive-hive-0	com.hadoop.compression.lzo.GPLNativeCodeLoader	Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
	at java.lang.Runtime.loadLibrary0(Runtime.java:870)
	at java.lang.System.loadLibrary(System.java:1122)
	at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
	at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
	at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
	at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
	at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at com.facebook.presto.hive.HiveUtil.getInputFormat(HiveUtil.java:325)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:299)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:272)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:102)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:201)
	at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
	at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
	at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
	at com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2022-04-22T17:33:30.363+0800	ERROR	hive-hive-0	com.hadoop.compression.lzo.LzoCodec	Cannot load native-lzo without native-hadoop

Comparing against the production environment's configuration, the cause appears to be a missing compression-related native library.

I tried supplying the library path in jvm.config or as a JVM startup argument, but it did not fix the problem:

-Djava.library.path=/path_to/hadoop-gpl-compression

I also followed the post "Hadoop和Spark报com.hadoop.compression.lzo.LzoCodec not found错误集锦" and added the libhadoop- and libsnappy-related native libraries, again without success.

Other posts I found suggest much the same libraries:

pyspark ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
Presto的坑记录 (notes on Presto pitfalls)


