
Hive Jobs Fail After Enabling Kerberos Authentication on a Big Data Cluster



This article is reproduced from the WeChat public account 「明哥的IT隨筆」, by IT明哥. Please contact the 明哥的IT隨筆 account for permission to reprint.

1 Preface

Hello everyone, I am 明哥!

This post is one of the Kerberos entries in my big data troubleshooting series. It explains the root cause of Hive job failures after a big data cluster enables Kerberos security authentication, how to fix them, and the principles and mechanisms behind the failure. The main text follows.

2 Symptoms

After Kerberos security authentication was enabled on the big data cluster, Hive on Spark jobs began to fail. Submitting a job through the beeline client reports that the Spark client could not be created:

  Failed to create spark client for spark session xxx: java.util.concurrent.TimeoutException: client xxx timedout waiting for connection from the remote spark driver

or:

  Failed to create spark client for spark session xxx: java.lang.RuntimeException: spark-submit

The beeline error messages looked like this:

(screenshot: error-msg-beeline1)

(screenshot: error-msg-beeline2)
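
For context, the failure can be reproduced from any gateway node along these lines. This is a minimal sketch; the host name, realm, and table name are illustrative placeholders, not values from the original environment:

  kinit xyz@CDH.COM                        # authenticate as the end user first
  beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@CDH.COM" \
    -e "set hive.execution.engine=spark;" \
    -e "select count(*) from some_table;"  # forces HiveServer2 to create a Spark session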

3 Analysis

Following the usual troubleshooting routine, we first examine the hiveserver2 log and find the key error "Error while waiting for Remote Spark Driver to connect back to HiveServer2". The full relevant hiveserver2 log reads:

  2021-09-02 11:01:29,496 ERROR org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-135]: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
  java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
      at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:122) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
      at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_201]
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.3.2.jar:?]
      at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_201]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
  Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 1 more
  2021-09-02 11:01:29,505 ERROR org.apache.hadoop.hive.ql.exec.spark.SparkTask: [HiveServer2-Background-Pool: Thread-135]: Failed to execute Spark task "Stage-1"
  org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session f43a158c-168a-4117-8993-8f1780913715_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:286) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:135) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:122) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
      at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_201]
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.3.2.jar:?]
      at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_201]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
  Caused by: java.lang.RuntimeException: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
      at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:124) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 22 more
  Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
      at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 22 more
  Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 1 more
  2021-09-02 11:01:29,506 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-135]: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session f43a158c-168a-4117-8993-8f1780913715_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
  2021-09-02 11:01:29,507 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-135]: Completed executing command(queryId=hive_20210902110125_ca2ab819-fb9c-4540-8690-2a1ed303186d); Time taken: 3.722 seconds
  2021-09-02 11:01:29,526 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-135]: Error running hive query:
  org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session f43a158c-168a-4117-8993-8f1780913715_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329) ~[hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:258) ~[hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) ~[hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
      at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_201]
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.3.2.jar:?]
      at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_201]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session f43a158c-168a-4117-8993-8f1780913715_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:286) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:135) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:122) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 11 more
  Caused by: java.lang.RuntimeException: Error while waiting for Remote Spark Driver to connect back to HiveServer2.
      at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:124) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:122) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 11 more
  Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41) ~[netty-common-4.1.17.Final.jar:4.1.17.Final]
      at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:103) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:90) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:104) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:100) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:77) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:132) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:122) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 11 more
  Caused by: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
      at org.apache.hive.spark.client.SparkClientImpl$2.run(SparkClientImpl.java:495) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
      ... 1 more
  2021-09-02 11:01:29,552 INFO  org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Handler-Pool: Thread-128]: Using the default value passed in for log id: f43a158c-168a-4117-8993-8f1780913715

However, the root cause behind "Error while waiting for Remote Spark Driver to connect back to HiveServer2", and in turn "Failed to create spark client for spark session xxx", is relatively hard to pin down: the hiveserver2 log, the Hive on Spark job log (viewed with yarn logs -applicationId xx), and even the YARN service logs contain no obvious clue.
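
When chasing an error like this, it helps to pull the three logs side by side. A sketch of the commands involved; the hiveserver2 log path is a typical CDH default and may differ on your cluster:

  # hiveserver2 side (log path is a typical CDH default, adjust as needed)
  grep -n "Remote Spark Driver" /var/log/hive/hadoop-cmf-hive-HIVESERVER2-*.log.out
  # YARN side: look for an application id, if the driver ever registered one
  yarn application -list -appStates ALL | grep -i hive
  # job side: aggregated container logs, if the application started at all
  yarn logs -applicationId <application_id>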

4 Root Cause

Digging further requires understanding how the job executes under the hood, then hypothesizing boldly and verifying carefully.

A HIVE job executes as follows:

  • When an end user such as xyz submits a SQL job to HIVESERVER2, HIVESERVER2 parses, compiles, and optimizes it and usually generates MR/TEZ/SPARK tasks ("usually" because some SQL statements execute directly inside HIVESERVER2 and never become distributed MR/TEZ/SPARK tasks); when those MR/TEZ/SPARK tasks eventually access the underlying infrastructure, HDFS and YARN, they too must pass Kerberos authentication;
  • When Hive impersonation is enabled (hive.server2.enable.doAs=true), the MR/TEZ/SPARK tasks underlying a HIVE SQL job submitted by end user xyz are authenticated by HDFS/YARN as user xyz (subsequent HDFS/YARN authorization checks likewise run against xyz's permissions);
  • When Hive impersonation is disabled (hive.server2.enable.doAs=false), the MR/TEZ/SPARK tasks underlying a submitted HIVE SQL job are authenticated by HDFS/YARN as the user the hiveserver2 service runs as, i.e. hive (subsequent HDFS/YARN authorization checks likewise run against the hive user's permissions). A quick way to observe which identity is in effect is sketched after this list.
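
As promised above, one way to see the effective identity is to run a throwaway query and check who owns the files it writes; a sketch, with $JDBC_URL, the database, and the table name as illustrative placeholders:

  # create a probe table, then inspect the owner of its files in the warehouse
  beeline -u "$JDBC_URL" -e "create table scratch.doas_probe as select 1;"
  hdfs dfs -ls /user/hive/warehouse/scratch.db/ | grep doas_probe
  # hive.server2.enable.doAs=true  -> files owned by the end user, e.g. xyz
  # hive.server2.enable.doAs=false -> files owned by hive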

With that, the picture becomes clear:

  • In the cluster in question, the CDH administrator had enabled Kerberos authentication, so use of hdfs/yarn/hive/spark/kafka and the other cluster services all requires Kerberos authentication;
  • When hiveserver2 executes a SQL job submitted by a business user who has configured Spark as the execution engine, hiveserver2 must first create a Spark cluster on that user's behalf;
  • The administrator had also set hive.server2.enable.doAs=true, so when hiveserver2 launches the Spark cluster and the Spark driver requests resources from YARN, YARN authenticates the request as user xyz;
  • Since hiveserver2 provides no mechanism to pass business user xyz's principal and corresponding keytab through to YARN, YARN's authentication of xyz fails and it never responds to the resource request; the Spark driver therefore cannot obtain YARN resources and never starts, so it never connects back to its client, i.e. hiveserver2. This is exactly what the errors "Failed to create spark client for spark session xxx" and "Error while waiting for Remote Spark Driver to connect back to HiveServer2" report. The sketch after this list shows how to verify the hypothesis by hand.
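
The hypothesis can be checked by replaying hiveserver2's launch sequence manually: kinit as the hive service user, then spark-submit with --proxy-user set to the end user, mirroring the launch log shown further below. A sketch; the keytab, principal, and example-jar path are assumptions modeled on this CDH 6.3.2 cluster:

  # authenticate as the hive service user, exactly as hiveserver2 does
  kinit -kt hive.keytab hive/uf30-1@CDH.COM
  # submit any trivial application to YARN while impersonating the end user
  /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/bin/spark-submit \
    --master yarn --proxy-user xyz \
    --class org.apache.spark.examples.SparkPi \
    /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/examples/jars/spark-examples*.jar 10

Run interactively, this surfaces on stderr the authentication failure that the hiveserver2 log collapses into "spark-submit process failed with exit code 1 and error ?".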

In fact, attentive readers can find entries in the hiveserver2 log showing the securityManager verifying the user's identity.

Likewise, attentive readers can find in the hiveserver2 log the entry where hive launches the spark on yarn cluster:

  2021-09-02 14:19:10,067 INFO  org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-110]: Running client driver with argv: kinit hive/uf30-1@CDH.COM -k -t hive.keytab; /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/bin/spark-submit --executor-cores 4 --executor-memory 6442450944b --proxy-user dap --jars /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar --properties-file /tmp/spark-submit.7174671910364719325.properties --class org.apache.hive.spark.client.RemoteDriver /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar --remote-host uf30-1 --remote-port 39677 --remote-driver-conf hive.spark.client.future.timeout=60000 --remote-driver-conf hive.spark.client.connect.timeout=1000 --remote-driver-conf hive.spark.client.server.connect.timeout=90000 --remote-driver-conf hive.spark.client.channel.log.level=null --remote-driver-conf hive.spark.client.rpc.max.size=52428800 --remote-driver-conf hive.spark.client.rpc.threads=8 --remote-driver-conf hive.spark.client.secret.bits=256 --remote-driver-conf hive.spark.client.rpc.server.address=null --remote-driver-conf hive.spark.client.rpc.server.port=null
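
These launch lines are easy to miss among the stack traces; grepping for them directly is quicker (the log path is again a typical CDH default):

  grep "Running client driver with argv" /var/log/hive/hadoop-cmf-hive-HIVESERVER2-*.log.out

Note the --proxy-user flag in the command above (dap, the end user in this environment): with impersonation enabled it is the only piece of the end user's identity that hiveserver2 hands to spark-submit, while the kinit that precedes it uses the hive service keytab.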

5 Solution

With the root cause known, the fix follows naturally. There are two options:

  • Disable Kerberos authentication on the cluster. YARN then no longer authenticates users requesting resources, so whatever identity a HIVE SQL job ultimately runs under, there is no authentication problem (authorization, of course, is a separate matter);
  • Keep Kerberos authentication on the cluster but disable Hive impersonation, i.e. set hive.server2.enable.doAs=false. Hive can still be configured with any front-end authentication method (hive.server2.authentication=none/ldap/kerberos), and business users keep submitting HIVE SQL jobs to HIVESERVER2 with any of the MR/TEZ/SPARK execution engines. Once HIVESERVER2 has parsed, compiled, and optimized a job into MR/TEZ/SPARK tasks, it interacts and authenticates with yarn/hdfs as the hive user; since the cluster already ships with the hive user's configuration (under the hood, hive-site.xml carries the hive user's principal and the corresponding keytab file, so hive's interaction and authentication with hdfs/yarn work), HIVE SQL jobs submit and execute normally again. The new setting can be verified as sketched below.
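
After applying the second option (in CDH this is HiveServer2's impersonation switch; on a plain Apache deployment, edit hive-site.xml and restart hiveserver2), the effective value can be read back from beeline. A sketch, with an illustrative connection string:

  beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@CDH.COM" \
    -e "set hive.server2.enable.doAs;"
  # expected output: hive.server2.enable.doAs=false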

6 Key Takeaways

  • hive supports several authentication methods (hive.server2.authentication=none/ldap/kerberos);
  • hive supports several execution engines (hive.execution.engine=mr/tez/spark);
  • hive has an impersonation feature that can be turned on or off via hive.server2.enable.doAs=true/false: "Setting this property to true will have HiveServer2 execute Hive operations as the user making the calls to it." (some security plugins such as SENTRY/RANGER require it to be off);
  • a SQL job submitted to HIVESERVER2 by an end user such as xyz is parsed, compiled, and optimized by HIVESERVER2 and usually becomes MR/TEZ/SPARK tasks ("usually" because some SQL executes directly inside HIVESERVER2 and generates no distributed MR/TEZ/SPARK tasks); when those tasks finally access the underlying infrastructure, HDFS and YARN, they must pass hdfs/yarn security authentication;
  • with impersonation enabled (hive.server2.enable.doAs=true), HDFS/YARN authenticate the underlying MR/TEZ/SPARK tasks as the end user, e.g. xyz (subsequent HDFS/YARN authorization checks likewise run against xyz's permissions);
  • with impersonation disabled (hive.server2.enable.doAs=false), HDFS/YARN authenticate the underlying MR/TEZ/SPARK tasks as the user the hiveserver2 service runs as, i.e. hive (subsequent HDFS/YARN authorization checks likewise run against the hive user's permissions);
  • when we speak of enabling Kerberos security authentication on a big data cluster, we normally mean enabling it for every service across the cluster: once the underlying hdfs/yarn infrastructure requires Kerberos, every component that talks to hdfs/yarn must authenticate through Kerberos;
  • distribution-packaged clusters such as CDH generally ship with the hive user's Kerberos configuration in place; under the hood, hive-site.xml carries the hive user's principal and the corresponding keytab file, so the hive user's interaction and authentication with hdfs/yarn just work. A sketch of inspecting that principal and keytab follows.
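
To see what that prebuilt configuration amounts to, inspect the hive principal and keytab directly; the keytab path below is illustrative, since it varies per deployment:

  klist -kt /path/to/hive.keytab                        # list the principals the keytab holds
  kinit -kt /path/to/hive.keytab hive/uf30-1@CDH.COM    # obtain a TGT as the hive service user
  klist                                                 # confirm the ticket cache now holds hive's TGT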

