Conversation

MaxGekk (Member) commented Nov 14, 2020

What changes were proposed in this pull request?

  1. Move the SHOW PARTITIONS parsing tests to ShowPartitionsParserSuite.
  2. Move the Hive tests for SHOW PARTITIONS from HiveCommandSuite to the base test suite v1.ShowPartitionsSuiteBase. This allows running the tests both with and without Hive.

The changes follow the approach of #30287.
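The base-suite pattern in point 2 can be sketched as follows. This is a minimal illustration of the idea, not the actual diff: `createDateTable` appears in the PR, but the class bodies and test names here are assumptions.

```scala
// Illustrative sketch of the shared-suite pattern: common SHOW PARTITIONS
// tests live in a base trait, and each catalog implementation mixes it in
// with its own table-creation logic. Bodies below are hypothetical.
trait ShowPartitionsSuiteBase {
  // Each concrete suite supplies its own way of creating a partitioned table.
  protected def createDateTable(table: String): Unit

  // Tests defined once here run against every concrete suite.
  protected def commonTestNames: Seq[String] = Seq(
    "show partitions of a partitioned table",
    "show partitions of a non-partitioned table"
  )
}

// In-memory (non-Hive) datasource v1 suite.
class ShowPartitionsSuite extends ShowPartitionsSuiteBase {
  protected def createDateTable(table: String): Unit = {
    // e.g. CREATE TABLE ... USING parquet PARTITIONED BY (...)
  }
}

// Hive-backed suite reuses the same common tests.
class HiveShowPartitionsSuite extends ShowPartitionsSuiteBase {
  protected def createDateTable(table: String): Unit = {
    // e.g. CREATE TABLE ... stored with a Hive SerDe
  }
}
```

The same trait can later back a DSv2 suite, which is what makes the DSv1/DSv2 comparison below possible.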

Why are the changes needed?

  • Unification allows running the common SHOW PARTITIONS tests for DSv1 (with and without Hive) and DSv2.
  • We can detect missing features and differences between the DSv1 and DSv2 implementations.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running:

  • the new test suites: `build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowPartitionsSuite"`
  • the existing suite: `build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.execution.HiveCommandSuite"`

@github-actions github-actions bot added the SQL label Nov 14, 2020
MaxGekk (Member Author) commented Nov 14, 2020

@cloud-fan @HyukjinKwon May I ask you to take a look at this PR, please?

SparkQA commented Nov 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35698/

SparkQA commented Nov 14, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35698/

}
}

test("show partitions of not partitioned table") {
Review comment:
"non-partitioned" sounds a bit more natural.

MaxGekk (Member Author) replied:
Thank you. I will address your comment together with others. @HyukjinKwon @cloud-fan Do you have any comments for this PR?

MaxGekk (Member Author):
There are a few places with not partitioned:

$ find . -name '*.scala' -print0|xargs -0 grep -i -n 'not partitioned'
./core/src/test/scala/org/apache/spark/rdd/SortingSuite.scala:138:  test("get a range of elements in an array not partitioned by a range partitioner") {
./mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala:925:   * Note that the term "rating block" is a bit of a misnomer, as the ratings are not partitioned by
./streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala:134:          // If the RDD is not partitioned the right way, let us repartition it using the
./sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:556:    // One side of join is not partitioned in the desired way. Need to shuffle one side.
./sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:590:    // One side of join is not partitioned in the desired way. Since the number of partitions of
./sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:591:    // the side that has already partitioned is smaller than the side that is not partitioned,
./sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala:308:  test("OverwritePartitions: overwrite all rows if not partitioned") {
./sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala:130:  test("show partitions of not partitioned table") {
./sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala:137:      assert(errMsg.contains("not allowed on a table that is not partitioned"))
./sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala:1982:        // not supported since the table is not partitioned
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala:73:  /** Schema of the partitioning columns, or the empty schema if the table is not partitioned. */
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:79:      rows += toCatalystRow("Not partitioned", "", "")
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala:533:            failAnalysis(s"Insert into a partition is not allowed because $l is not partitioned.")
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala:149:      // This dataset is not partitioned.
./sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:462:        s"for tables that are not partitioned: $tableIdentWithDB")
./sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:989:     * 1. If the table is not partitioned.
./sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:999:        s"SHOW PARTITIONS is not allowed on a table that is not partitioned: $tableIdentWithDB")
./sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala:1755:        // not supported since the table is not partitioned

I will change the title of this test, but I am not sure about the other places; I will leave them as is for now. @janekdb If you want, you can open a PR and fix them where it makes sense.
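For context, the renamed test would look roughly like this. This is a hedged sketch: `withTable` and `intercept` are standard Spark/ScalaTest test utilities, and the asserted message is taken from tables.scala:999 in the grep output above, but the exact suite code in the PR may differ.

```scala
// Sketch only; the actual test body in ShowPartitionsSuite may differ.
test("show partitions of non-partitioned table") {
  withTable("not_part_table") {
    sql("CREATE TABLE not_part_table (col1 int) USING parquet")
    val errMsg = intercept[AnalysisException] {
      sql("SHOW PARTITIONS not_part_table")
    }.getMessage
    // Message produced by ShowPartitionsCommand in tables.scala (see grep above).
    assert(errMsg.contains("not allowed on a table that is not partitioned"))
  }
}
```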

SparkQA commented Nov 14, 2020

Test build #131095 has finished for PR 30377 at commit 8fe17a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.partitionBy("a")
.format("parquet")
.mode(SaveMode.Overwrite)
.saveAsTable("part_datasrc")
Contributor review comment:

this seems like testing the DataFrameWriter API not the SHOW PARTITIONS command.

Contributor:

ah the test was already there. Let's keep it then.


override protected def createDateTable(table: String): Unit = {
sql(s"""
|CREATE TABLE $table (price int, qty int)
Contributor review comment:

CREATE TABLE ... USING hive PARTITIONED BY (...) doesn't work?

MaxGekk (Member Author):

Let me check that. I just didn't want to change the original test.

MaxGekk (Member Author):

I removed the functions from Hive's suite.
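For reference, the reviewer's suggested `CREATE TABLE ... USING hive` variant would look roughly like this. This is an assumption about the intended shape, not code from the final diff; note that with the `USING` clause the partition columns are declared in the main column list and `PARTITIONED BY` only names them, unlike the Hive-format syntax, which declares them with types in the `PARTITIONED BY` clause.

```scala
// Hypothetical variant of createDateTable using the USING hive syntax.
sql(s"""
  |CREATE TABLE $table (price int, qty int, year int, month int)
  |USING hive
  |PARTITIONED BY (year, month)
  |""".stripMargin)
```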

SparkQA commented Nov 16, 2020

Test build #131148 has finished for PR 30377 at commit 47925f2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35755/

SparkQA commented Nov 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35755/

SparkQA commented Nov 16, 2020

Test build #131152 has finished for PR 30377 at commit 59f2b38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:
thanks, merging to master!

@cloud-fan cloud-fan closed this in 6883f29 Nov 16, 2020
@MaxGekk MaxGekk deleted the unify-dsv1_v2-show-partitions-tests branch February 19, 2021 15:03