Contributor
@LuciferYang LuciferYang commented Dec 4, 2023

What changes were proposed in this pull request?

When I test `HiveDDLSuite` with

```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```

the test fails with:

```
[info] - SPARK-34261: Avoid side effect if create exists temporary function *** FAILED *** (4 milliseconds)
[info]   java.util.NoSuchElementException: key not found: default
[info]   at scala.collection.MapOps.default(Map.scala:274)
[info]   at scala.collection.MapOps.default$(Map.scala:273)
[info]   at scala.collection.AbstractMap.default(Map.scala:405)
[info]   at scala.collection.MapOps.apply(Map.scala:176)
[info]   at scala.collection.MapOps.apply$(Map.scala:175)
[info]   at scala.collection.AbstractMap.apply(Map.scala:405)
[info]   at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
[info]   at org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
[info]   at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
```

I manually printed the content of `spark.sparkContext.addedJars`, which is an empty `Map`.

However, when I execute

```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.SQLQuerySuite org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```

all tests pass, and the content of `spark.sparkContext.addedJars` is

```
Map(default -> Map(spark://localhost:54875/jars/SPARK-21101-1.0.jar -> 1701676986594, spark://localhost:54875/jars/hive-contrib-2.3.9.jar -> 1701676944590, spark://localhost:54875/jars/TestUDTF.jar -> 1701676921340))
```

This failure is not reproduced in the GitHub Actions tests because `SQLQuerySuite` happens to run before `HiveDDLSuite` there.

So in the current PR, I change the code to use `.get("default").foreach(_.remove(k))`, so that the remove is only performed when `.get("default")` is not `None`.
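
To illustrate the difference, here is a minimal, self-contained Scala sketch; the plain `mutable.Map` is a stand-in chosen for illustration, not Spark's actual internal type:

```
import scala.collection.mutable

// Stand-in for spark.sparkContext.addedJars when HiveDDLSuite runs alone:
// no earlier suite has registered jars, so the outer map is empty.
val addedJars = mutable.Map.empty[String, mutable.Map[String, Long]]
val k = "spark://localhost:54875/jars/TestUDTF.jar"

// Old code path: Map#apply throws on a missing key.
// addedJars("default").remove(k) // java.util.NoSuchElementException: key not found: default

// New code path: Map#get returns an Option, so the remove becomes a no-op.
addedJars.get("default").foreach(_.remove(k)) // does nothing when "default" is absent
```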

Why are the changes needed?

Make `HiveDDLSuite` independently testable.

Does this PR introduce any user-facing change?

No, this change is test-only.

How was this patch tested?

  • Pass GitHub Actions
  • Manually ran `HiveDDLSuite` with this PR; all tests passed

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft December 4, 2023 08:27
@github-actions github-actions bot added the SQL label Dec 4, 2023
```
  val jar = spark.asInstanceOf[TestHiveSparkSession].getHiveFile(jarName).toURI.toString
  spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
-   .foreach(spark.sparkContext.addedJars("default").remove)
+   .foreach(k => spark.sparkContext.addedJars.get("default") match {
```
Contributor Author

@LuciferYang LuciferYang Dec 4, 2023

@HyukjinKwon

I noticed that this logic was added in #41495. If I'm not mistaken, this is a defensive deletion to avoid including `TestUDTF.jar` in `default` during testing? However, I found that running only this test:

```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```

throws an error:

```
[info] - SPARK-34261: Avoid side effect if create exists temporary function *** FAILED *** (4 milliseconds)
[info]   java.util.NoSuchElementException: key not found: default
[info]   at scala.collection.MapOps.default(Map.scala:274)
[info]   at scala.collection.MapOps.default$(Map.scala:273)
[info]   at scala.collection.AbstractMap.default(Map.scala:405)
[info]   at scala.collection.MapOps.apply(Map.scala:176)
[info]   at scala.collection.MapOps.apply$(Map.scala:175)
[info]   at scala.collection.AbstractMap.apply(Map.scala:405)
[info]   at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
[info]   at org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
[info]   at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
```

I manually printed the contents of `spark.sparkContext.addedJars` and it's an empty `Map`.

But when I execute:

```
build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.SQLQuerySuite org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
```

all tests pass, and the content of `spark.sparkContext.addedJars` is:

```
Map(default -> Map(spark://localhost:54875/jars/SPARK-21101-1.0.jar -> 1701676986594, spark://localhost:54875/jars/hive-contrib-2.3.9.jar -> 1701676944590, spark://localhost:54875/jars/TestUDTF.jar -> 1701676921340))
```

In the GitHub Actions tests, `SQLQuerySuite` does indeed execute before `HiveDDLSuite`, so the failure is not reproduced.

So in this PR, I added a `match` so that `remove` is only executed when the `default` key is present.
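
The diff hunk above is cut off at the added line; a plausible reconstruction of this match-based version (the `case` bodies here are my assumption, for illustration only) is:

```
spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
  .foreach(k => spark.sparkContext.addedJars.get("default") match {
    case Some(jars) => jars.remove(k) // clean up the defensively added test jar
    case None => // nothing registered under "default", so nothing to remove
  })
```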

I didn't directly clear all the contents of `spark.sparkContext.allAddedJars` after `SQLQuerySuite` in this PR, because I'm not sure what impact that would have.

Do you think this is ok? Or do you have any better suggestions?

Member

The change seems fine. Can we do as below though?

```
foreach(k => spark.sparkContext.addedJars.get("default").foreach(_.remove(k)))
```
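
For context, a sketch of how this suggestion would read when spliced into the chain from the diff above:

```
spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
  .foreach(k => spark.sparkContext.addedJars.get("default").foreach(_.remove(k)))
```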

Contributor Author

fine to me ~

@LuciferYang LuciferYang changed the title [SPARK-46237][SQL][TESTS] Fix test failed of HiveDDLSuite [SPARK-46237][SQL][TESTS] Make HiveDDLSuite independently testable Dec 4, 2023
@LuciferYang LuciferYang marked this pull request as ready for review December 4, 2023 08:51
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@dongjoon-hyun
Member

Merged to master.

@LuciferYang
Contributor Author

Thanks @dongjoon-hyun @HyukjinKwon

asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
Closes apache#44153 from LuciferYang/HiveDDLSuite.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023