
Conversation

@turboFei
Member

@turboFei turboFei commented May 13, 2023

Why are the changes needed?

Currently we check `sparkContext.hadoopConfiguration` to determine whether to apply `HIVE_DELEGATION_TOKEN`.

Here is the method that creates the sparkContext `hadoopConfiguration`. It adds `__spark_hadoop_conf__.xml` as a Hadoop configuration resource:

```
  /**
   * Return an appropriate (subclass) of Configuration. Creating config can initialize some Hadoop
   * subsystems.
   */
  def newConfiguration(conf: SparkConf): Configuration = {
    val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
    hadoopConf.addResource(SparkHadoopUtil.SPARK_HADOOP_CONF_FILE)
    hadoopConf
  }
```

```
  /**
   * Name of the file containing the gateway's Hadoop configuration, to be overlayed on top of the
   * cluster's Hadoop config. It is up to the Spark code launching the application to create
   * this file if it's desired. If the file doesn't exist, it will just be ignored.
   */
  private[spark] val SPARK_HADOOP_CONF_FILE = "__spark_hadoop_conf__.xml"
```
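
As the doc comment notes, a resource added by name is silently skipped when it is absent from the classpath (Hadoop `Configuration` defaults to quiet mode). A minimal sketch of that behavior, assuming a K8s driver where `__spark_hadoop_conf__.xml` was never created:

```
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// No-op if the resource is absent from the classpath (quiet mode default),
// so the gateway's Hive settings never reach this configuration on K8s.
conf.addResource("__spark_hadoop_conf__.xml")
// Expected to print null when the overlay file does not exist.
println(conf.get("hive.metastore.uris"))
```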

Per the code, this file is only created in the YARN module.

Spark on YARN

After unzipping `__spark_conf__.zip` in the Spark staging dir, there is a file named `__spark_hadoop_conf__.xml`:

```
 grep hive.metastore.uris  __spark_hadoop_conf__.xml
<property><name>hive.metastore.uris</name><value>thrift://*******:9083</value><source>programatically</source></property>
```

Spark on K8S

It seems that for Spark on K8s there is no file named `__spark_hadoop_conf__.xml`.

We need to check the `hiveConf` instead of the `hadoopConf`.
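
A minimal sketch of the intended check (the helper name and the reliance on `hive.metastore.uris` are illustrative, not the exact Kyuubi code): `HiveConf` loads `hive-site.xml` from the classpath, so it sees the metastore URIs even when `sparkContext.hadoopConfiguration` does not.

```
import org.apache.hadoop.hive.conf.HiveConf

// Hypothetical helper: decide whether a HIVE_DELEGATION_TOKEN is needed by
// consulting HiveConf (which loads hive-site.xml) rather than hadoopConf.
def needHiveDelegationToken(): Boolean = {
  val hiveConf = new HiveConf()
  hiveConf.getTrimmed("hive.metastore.uris", "").nonEmpty
}
```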

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

@turboFei turboFei self-assigned this May 13, 2023
@turboFei turboFei requested review from pan3793 and yaooqinn May 13, 2023 04:54
@turboFei turboFei changed the title Using hive conf to check whether HIVE_DELEGATION_TOKEN is needed [K8S] Using hive conf to check whether HIVE_DELEGATION_TOKEN is needed May 13, 2023
@turboFei turboFei changed the title [K8S] Using hive conf to check whether HIVE_DELEGATION_TOKEN is needed [K8S] Using hive conf to check whether to need HIVE_DELEGATION_TOKEN May 13, 2023
@turboFei turboFei changed the title [K8S] Using hive conf to check whether to need HIVE_DELEGATION_TOKEN [K8S] Using hive conf to check whether to apply HIVE_DELEGATION_TOKEN May 13, 2023
@codecov-commenter

Codecov Report

Merging #4835 (7657cbb) into master (1e310a0) will increase coverage by 0.03%.
The diff coverage is 80.00%.

```
@@             Coverage Diff              @@
##             master    #4835      +/-   ##
============================================
+ Coverage     58.03%   58.07%   +0.03%
  Complexity       13       13
============================================
  Files           583      583
  Lines         32561    32570       +9
  Branches       4318     4320       +2
============================================
+ Hits          18897    18914      +17
+ Misses        11845    11836       -9
- Partials       1819     1820       +1
```

| Impacted Files | Coverage Δ |
| --- | --- |
| ...ubi/engine/spark/SparkTBinaryFrontendService.scala | 81.63% <80.00%> (-0.39%) ⬇️ |

... and 6 files with indirect coverage changes


@pan3793
Member

pan3793 commented May 14, 2023

> For Spark on K8s, the `hive-site.xml` is placed into the working directory after the Spark context is launched. See details in apache/spark#37417

Sorry, I don't get your point. With apache/spark#37417, Spark downloads and adds those files during the spark-submit phase, which should happen before SparkContext initialization.

@turboFei
Member Author

turboFei commented May 15, 2023

Not sure why `hive-site.xml` is not loaded.

Anyway, I think `hiveConf` is the correct one to check. @pan3793

@turboFei
Member Author

turboFei commented May 15, 2023

```
  /**
   * Return an appropriate (subclass) of Configuration. Creating config can initialize some Hadoop
   * subsystems.
   */
  def newConfiguration(conf: SparkConf): Configuration = {
    val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
    hadoopConf.addResource(SparkHadoopUtil.SPARK_HADOOP_CONF_FILE)
    hadoopConf
  }
```

```
  /**
   * Name of the file containing the gateway's Hadoop configuration, to be overlayed on top of the
   * cluster's Hadoop config. It is up to the Spark code launching the application to create
   * this file if it's desired. If the file doesn't exist, it will just be ignored.
   */
  private[spark] val SPARK_HADOOP_CONF_FILE = "__spark_hadoop_conf__.xml"
```

It seems that for Spark on K8s there is no file named `__spark_hadoop_conf__.xml`, while on YARN it contains the metastore URIs:

```
 grep hive.metastore.uris  __spark_hadoop_conf__.xml
<property><name>hive.metastore.uris</name><value>thrift://*******:9083</value><source>programatically</source></property>
```
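
A quick way to observe the discrepancy, sketched under the assumption that the driver has `hive-site.xml` on its classpath (hypothetical snippet for `spark-shell`):

```
// Built from SPARK_HADOOP_CONF_FILE, which only exists on YARN,
// so this may be null on K8s.
val fromHadoopConf = spark.sparkContext.hadoopConfiguration.get("hive.metastore.uris")

// HiveConf loads hive-site.xml from the classpath, so this should be set.
val fromHiveConf = new org.apache.hadoop.hive.conf.HiveConf().get("hive.metastore.uris")

println(s"hadoopConf: $fromHadoopConf, hiveConf: $fromHiveConf")
```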

@turboFei
Member Author

@pan3793 I have updated the PR description.

@turboFei turboFei closed this in 6b5c138 May 15, 2023
turboFei added a commit that referenced this pull request May 15, 2023
…ELEGATION_TOKEN

### _Why are the changes needed?_

Currently we check `sparkContext.hadoopConfiguration` to determine whether to apply `HIVE_DELEGATION_TOKEN`.

Here is the method that creates the sparkContext `hadoopConfiguration`. It adds `__spark_hadoop_conf__.xml` as a Hadoop configuration resource:
```
  /**
   * Return an appropriate (subclass) of Configuration. Creating config can initialize some Hadoop
   * subsystems.
   */
  def newConfiguration(conf: SparkConf): Configuration = {
    val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
    hadoopConf.addResource(SparkHadoopUtil.SPARK_HADOOP_CONF_FILE)
    hadoopConf
  }
```

```
  /**
   * Name of the file containing the gateway's Hadoop configuration, to be overlayed on top of the
   * cluster's Hadoop config. It is up to the Spark code launching the application to create
   * this file if it's desired. If the file doesn't exist, it will just be ignored.
   */
  private[spark] val SPARK_HADOOP_CONF_FILE = "__spark_hadoop_conf__.xml"
```
<img width="1091" alt="image" src="https://github.com/apache/kyuubi/assets/6757692/f2a87a23-4565-4164-9eaa-5f7e166519de">

Per the code, this file is only created in the YARN module.

#### Spark on YARN
After unzipping `__spark_conf__.zip` in the Spark staging dir, there is a file named `__spark_hadoop_conf__.xml`:
```
 grep hive.metastore.uris  __spark_hadoop_conf__.xml
<property><name>hive.metastore.uris</name><value>thrift://*******:9083</value><source>programatically</source></property>
```

#### Spark on K8S
It seems that for Spark on K8s there is no file named `__spark_hadoop_conf__.xml`.
<img width="1580" alt="image" src="https://github.com/apache/kyuubi/assets/6757692/99de73d0-3519-4af3-8f0a-90967949ec0e">

<img width="875" alt="image" src="https://github.com/apache/kyuubi/assets/6757692/f7c477a5-23ca-4b25-8638-4b040b78899d">

We need to check the `hiveConf` instead of `hadoopConf`.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4835 from turboFei/hive_token.

Closes #4835

7657cbb [fwang12] hive conf
7c0af67 [fwang12] save

Authored-by: fwang12 <[email protected]>
Signed-off-by: fwang12 <[email protected]>
(cherry picked from commit 6b5c138)
Signed-off-by: fwang12 <[email protected]>
@pan3793 pan3793 added this to the v1.7.2 milestone May 16, 2023