[SPARK-46237][SQL][TESTS] Make HiveDDLSuite independently testable (#44153)
Conversation
```
val jar = spark.asInstanceOf[TestHiveSparkSession].getHiveFile(jarName).toURI.toString
spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
-  .foreach(spark.sparkContext.addedJars("default").remove)
+  .foreach(k => spark.sparkContext.addedJars.get("default") match {
```
I noticed that this logic was added in #41495. If I'm not mistaken, this is a defensive deletion to avoid leaving `TestUDTF.jar` registered under `default` during testing? However, I found that running only this test:
```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```
throws an error:
```
[info] - SPARK-34261: Avoid side effect if create exists temporary function *** FAILED *** (4 milliseconds)
[info] java.util.NoSuchElementException: key not found: default
[info] at scala.collection.MapOps.default(Map.scala:274)
[info] at scala.collection.MapOps.default$(Map.scala:273)
[info] at scala.collection.AbstractMap.default(Map.scala:405)
[info] at scala.collection.MapOps.apply(Map.scala:176)
[info] at scala.collection.MapOps.apply$(Map.scala:175)
[info] at scala.collection.AbstractMap.apply(Map.scala:405)
[info] at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
[info] at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
[info] at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
[info] at org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
[info] at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
```
I manually printed the contents of `spark.sparkContext.addedJars`, and it is an empty `Map`.
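The failure comes down to `Map.apply` vs `Map.get` semantics on an empty map; a minimal standalone illustration (plain Scala, not the suite code):

```scala
val addedJars = Map.empty[String, Map[String, Long]]

// apply throws when the key is absent and no default value is defined:
// addedJars("default")  // java.util.NoSuchElementException: key not found: default

// get returns an Option, so the lookup can degrade to a no-op instead:
addedJars.get("default")                             // None
addedJars.get("default").foreach(_.foreach(println)) // prints nothing
```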
But when I execute:
```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.SQLQuerySuite org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```
all tests pass, and the content of `spark.sparkContext.addedJars` is:
```
Map(default -> Map(spark://localhost:54875/jars/SPARK-21101-1.0.jar -> 1701676986594, spark://localhost:54875/jars/hive-contrib-2.3.9.jar -> 1701676944590, spark://localhost:54875/jars/TestUDTF.jar -> 1701676921340))
```
In the GitHub Actions tests, `SQLQuerySuite` does indeed execute before `HiveDDLSuite`, so the failure is not reproduced there.
So in this PR, I added a case match so that the `remove` is only executed when the `default` entry is present, as sketched below.
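For reference, a sketch of that guard; the `Some`/`None` branch bodies are a reconstruction of the truncated diff above rather than the exact patch, and `spark`/`jarName` are the names from the excerpt:

```scala
spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
  .foreach(k => spark.sparkContext.addedJars.get("default") match {
    case Some(jars) => jars.remove(k) // drop only the matching test jar
    case None       => // nothing was ever added, so skip the cleanup
  })
```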
In this PR I didn't directly clear all the contents of `spark.sparkContext.allAddedJars` after `SQLQuerySuite`, because I'm not sure about the impact of that behavior.
Do you think this is ok? Or do you have any better suggestions?
The change seems fine. Can we do as below though?
```
foreach(k => spark.sparkContext.addedJars.get("default").foreach(_.remove(k)))
```
fine to me ~
dongjoon-hyun left a comment:
+1, LGTM.
Merged to master.

Thanks @dongjoon-hyun @HyukjinKwon
### What changes were proposed in this pull request?
When I test `HiveDDLSuite` with
```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```
This test throws an error:
```
[info] - SPARK-34261: Avoid side effect if create exists temporary function *** FAILED *** (4 milliseconds)
[info] java.util.NoSuchElementException: key not found: default
[info] at scala.collection.MapOps.default(Map.scala:274)
[info] at scala.collection.MapOps.default$(Map.scala:273)
[info] at scala.collection.AbstractMap.default(Map.scala:405)
[info] at scala.collection.MapOps.apply(Map.scala:176)
[info] at scala.collection.MapOps.apply$(Map.scala:175)
[info] at scala.collection.AbstractMap.apply(Map.scala:405)
[info] at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
[info] at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
[info] at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
[info] at org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
[info] at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
```
I manually printed the content of `spark.sparkContext.addedJars`, which is an empty `Map`.
However, when I execute
```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.SQLQuerySuite org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```
All tests pass, and the content of `spark.sparkContext.addedJars` is
```
Map(default -> Map(spark://localhost:54875/jars/SPARK-21101-1.0.jar -> 1701676986594, spark://localhost:54875/jars/hive-contrib-2.3.9.jar -> 1701676944590, spark://localhost:54875/jars/TestUDTF.jar -> 1701676921340))
```
This failure is not reproduced in the GitHub Actions tests because `SQLQuerySuite` does indeed execute before `HiveDDLSuite` there.
So in the current PR, I change the cleanup to use `.get("default").foreach(_.remove(k))`, so that the remove operation is only performed when `.get("default")` is not `None` (see the sketch below).
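For illustration, a before/after sketch of the cleanup, reusing `spark` and `jarName` from the test code in the diff:

```scala
// Before: apply("default") throws NoSuchElementException when the suite
// runs alone and no jar has been registered yet.
spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
  .foreach(spark.sparkContext.addedJars("default").remove)

// After: Option-based lookup; the remove becomes a no-op when the
// "default" entry is absent.
spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
  .foreach(k => spark.sparkContext.addedJars.get("default").foreach(_.remove(k)))
```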
### Why are the changes needed?
Make `HiveDDLSuite` independently testable.
### Does this PR introduce _any_ user-facing change?
No, just for tests.
### How was this patch tested?
- Pass GitHub Actions
- Manually checked `HiveDDLSuite` with this PR; all tests passed
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#44153 from LuciferYang/HiveDDLSuite.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>