irpas技术客

大数据问题排查系列 - HIVE踩坑记-HIVE-15642_明哥的IT随笔

网络 2620

?

前言

大家好,我是明哥!

本片博文是“大数据线上问题排查系列”大类别之一,讲述前段时间我司某产品在某券商遇到的一个问题及解决方案,其背后涉及到 hive 的一个 BUG,在 hive 3.0 才修复。以下是正文。

问题现象

cdh6.2.1中,开启 kerberos 和 sentry 的hive 中,使用 dml 语句 insert overwrite 插入数据到分区表的新分区时,会报错。同时注意到:

查看底层 hdfs 发现对应的目录和数据都有了;

使用 show partitions xxx 查看不到对应的新分区;

使用 select 语句也查不到该新分区下的数据;

只要是插入数据到新的分区,不管使用静态分区还是动态分区模式,都会报错;

问题分析

为确定问题,首先尝试了不同场景,发现如下现象:

当使用 insert overwrite 插入数据到分区表的已有分区时,不会报错;(不管该分区下有没有数据);

使用 insert into 插入数据到分区表的新分区时,不会报错;

对于非分区表,insert overwrite 和 insert into 都不会报错;

当使用insert overwrite 插入数据到分区表的新分区报错后,可以使用 msck repair test0317 修复hive metastore中相关元数据,修复后 select, show partitions 等语句可以查询到分区和该分区的底层数据,没有问题;

为进一步分析问题,常看了相关日志,包括客户端日志,hiveserver2 日志和 hive metastore 日志分别如下所示。

客户端日志(beeline) INFO : Loading data to table apollo_ods_jzfix.test0317 partition (ptdate=1) from hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1/.hive-staging_hive_2021-03-17_15-09-13_232_1543365768355672834-7333/-ext-10000 ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event INFO : MapReduce Jobs Launched: INFO : Stage-Stage-1: Map: 1 Cumulative CPU: 2.56 sec HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS INFO : Total MapReduce CPU Time Spent: 2 seconds 560 msec INFO : Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event (state=08S01,code=1) hive metastore 日志 2021-03-17 15:09:30,039 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM ip=10.2.91.100 cmd=get_table : db=apollo_ods_jzfix tbl=test0317 2021-03-17 15:09:30,044 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-9-thread-122]: 121: get_partition : db=apollo_ods_jzfix tbl=test0317[1] 2021-03-17 15:09:30,044 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM ip=10.2.91.100 cmd=get_partition : db=apollo_ods_jzfix tbl=test0317[1] 2021-03-17 15:09:30,053 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-122]: NoSuchObjectException(message:partition values=[1]) at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003) at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) at com.sun.proxy.$Proxy26.getPartition(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553) at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737) at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2021-03-17 15:09:30,054 ERROR org.apache.thrift.ProcessFunction: [pool-9-thread-122]: Internal error processing fire_listener_event org.apache.hadoop.hive.metastore.api.NoSuchObjectException: partition values=[1] at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at com.sun.proxy.$Proxy26.getPartition(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?] at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181] hiveserver2 日志 2021-03-17 15:09:30,016 INFO org.apache.hadoop.hive.ql.exec.MoveTask: [HiveServer2-Background-Pool: Thread-183022]: Partition is: {ptdate=1} 2021-03-17 15:09:30,033 INFO org.apache.hadoop.hive.common.FileUtils: [HiveServer2-Background-Pool: Thread-183022]: Creating directory if it doesn't exist: hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1 2021-03-17 15:09:30,057 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: [HiveServer2-Background-Pool: Thread-183022]: MetaStoreClient lost connection. Attempting to reconnect. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?] at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?] at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181] 2021-03-17 15:09:31,063 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Closed a connection to metastore, current connections: 9231 2021-03-17 15:09:31,063 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Trying to connect to metastore with URI thrift://dev-dw-dn01:9083 2021-03-17 15:09:31,065 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Opened a connection to metastore, current connections: 9232 2021-03-17 15:09:31,065 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Connected to metastore. 2021-03-17 15:09:31,196 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_lis tener_event2021-03-17 15:09:31,196 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: MapReduce Jobs Launched: 2021-03-17 15:09:31,197 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Stage-Stage-1: Map: 1 Cumulative CPU: 2.56 sec HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS 2021-03-17 15:09:31,197 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Total MapReduce CPU Time Spent: 2 seconds 560 msec 2021-03-17 15:09:31,197 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds 2021-03-17 15:09:31,206 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-183022]: Error running hive query: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:258) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2433) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] ... 11 more Caused by: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?] at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?] at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1] 问题临时解决方法

通过以上不同场景下的尝试和相关日志,可以确定这应该是某些配置项引起的 HIVE 的一个 BUG,但笔者当时并没有发现相关的JIRA。为了快速推进项目,笔者当时提出了以下 workaround:

业务测问题解决方法一:在 insert overwrite 后 (会报错),增加脚本执行命令 msck repair test0317 修复hive metastore 中相关元数据: 该方法的缺点是不太方便嵌入到调度流程中,因insert overwrite报错后,调度工具就不会继续调度执行后续流程了,无法做到业务代码不侵入;

业务测问题解决方法二:在执行正常业务流程前,添加一个前置流程,该前置流程的内容是提前创建好正常流程中需要的新分区,其内容实例如下:alter table test0317 add partition (ptdate=10); (ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec;

最终为快速推进项目,我们采用了上述方案2临时解决了问题。

问题解决方法

在有了临时解决方案,确保了项目能顺利推进的同时,为彻底了解背后技术细节,笔者也在 cloudera 社区发了帖子寻求支持,但社区的相应速度没有那么及时,前几日才收到回馈,截图如下:

JIRA截图

根据 cloudra 官方的回复,参考上述 jira 链接,并对照源码,确认这是 HIVE 的一个BUG,且在 HIVE 3.0 才修复完毕,其主要问题概括如下:

在 hive.metastore.dml.events 配置为 true时,向不存在的分区插入数据,比如向新分区 insert overwrite,或以动态分区模式插入数据到新分区时,hive原先的处理有问题,会因为 metastore 找不到分区而断开链接并报错。

所以,在上述业务侧可以采取的方案二之外,还可以采取如下解决方案:

配置 hive.metastore.dml.events 为false

或升级到hive3.0.

!关注不迷路~ 各种福利、资源定期分享!欢迎小伙伴们扫码添加明哥微信,后台加群交流学习。


1.本站遵循行业规范,任何转载的稿件都会明确标注作者和来源;2.本站的原创文章,会注明原创字样,如未注明都非原创,如有侵权请联系删除!;3.作者投稿可能会经我们编辑修改或补充;4.本站不提供任何储存功能只提供收集或者投稿人的网盘链接。

标签: #大数据问题排查系列 #hive #的一个 #BUG #