```diff
@@ -1,6 +1,6 @@
 <!-- Automatically generated by ExpressionsSchemaSuite -->
 ## Summary
-- Number of queries: 338
+- Number of queries: 339
 - Number of expressions that missing example: 34
 - Expressions missing examples: and,string,tinyint,double,smallint,date,decimal,boolean,float,binary,bigint,int,timestamp,struct,cume_dist,dense_rank,input_file_block_length,input_file_block_start,input_file_name,lag,lead,monotonically_increasing_id,ntile,!,not,or,percent_rank,rank,row_number,spark_partition_id,version,window,positive,count_min_sketch
 ## Schema of Built-in Functions
```
```diff
@@ -152,23 +152,38 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {

     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size

       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-        s"${outputSize + headerSize}. Try regenerate the result files.")
+        s"${outputSize + headerSize}. Try regenerating the result files.")

-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      assert(numberOfQueries == expectedOutputs.size,
```
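Each non-header line of the golden file is a markdown table row that `split('|')` breaks into segments; index 0 is the empty string before the leading `|`, which is why the fields start at `segments(1)`. A minimal standalone sketch of that row parsing, using a hypothetical row (the class name and schema shown here are illustrative, not taken from the PR):

```scala
// Parse one markdown table row of the golden file into a QueryOutput record.
// segments(0) is "" because the row starts with '|'.
case class QueryOutput(className: String, funcName: String, sql: String, schema: String)

val row = "| org.apache.spark.sql.catalyst.expressions.Abs | abs | SELECT abs(-1) | struct<abs(-1):int> |"
val segments = row.split('|')
val parsed = QueryOutput(
  className = segments(1).trim,
  funcName = segments(2).trim,
  sql = segments(3).trim,
  schema = segments(4).trim)
```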
**Contributor:**
I think these asserts should be put at line 163.

**@beliefer** (Contributor, Sep 3, 2020):
`numberOfQueries == outputSize` — ah, I got it. But if a new function is added here, `expectedOutputs.size` will not necessarily equal `numberOfQueries`; in fact, `expectedOutputs.size == outputSize`.

**@LuciferYang** (Contributor Author):
@beliefer This assert is placed here to verify the consistency of the file contents, to guard against inconsistencies introduced by manual modification. I think the line 189 assert `expectedOutputs.size == outputSize` achieves the same goal as `numberOfQueries == outputSize`. Is this acceptable?

**@beliefer** (Contributor, Sep 3, 2020):
You can try to mock a new function with comments and test this suite.

**@LuciferYang** (Contributor Author, Sep 3, 2020):
> You can try to mock a new function with comments and test this suite.

You are right; SPARK-24884 already triggered this problem, caused by an incomplete manual modification.

For the scenario where a new function is added, I think the right way is to run this suite with `SPARK_GENERATE_GOLDEN_FILES=1` to automatically regenerate the correct `sql-expression-schema.md`, since the header of `sql-expression-schema.md` says it is automatically generated by ExpressionsSchemaSuite.

**@LuciferYang** (Contributor Author):
Therefore, if the file is manually modified instead of automatically generated, I think an assertion failure caused by an incorrect modification should be expected.

Do we need to tell users explicitly, e.g. "Try regenerating the result files with the environment variable `SPARK_GENERATE_GOLDEN_FILES=1`"?

**@beliefer** (Contributor, Sep 3, 2020):
I got it, thanks! Could you put these asserts at line 163, so that it reads more clearly?

**@LuciferYang** (Contributor Author):
Addressed in 6e6489b: reordered the assertions.

```diff
+        s"expected outputs size: ${expectedOutputs.size} not same as numberOfQueries: " +
+        s"$numberOfQueries record in result file. Try regenerating the result files.")
+
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val expectedMissingExamples = lines(4).split(":")(1).trim.split(",")
+
+      assert(numberOfMissingExamples == expectedMissingExamples.size,
+        s"expected missing examples size: ${expectedMissingExamples.size} not same as " +
+        s"numberOfMissingExamples: $numberOfMissingExamples " +
+        "record in result file. Try regenerating the result files.")
+
+      (expectedMissingExamples, expectedOutputs)
     }

     // Compare results.
```
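The `lines(2)`, `lines(3)`, and `lines(4)` indexing reads the counts straight out of the golden file's `## Summary` header: each count sits after a colon on a fixed line. A minimal standalone sketch of that parsing (the header lines below are an abbreviated illustration, not the full generated file):

```scala
// Sketch of the Summary-header parsing: counts are positional, keyed by line
// number and extracted from the text after the colon.
val goldenHeader = Seq(
  "<!-- Automatically generated by ExpressionsSchemaSuite -->",
  "## Summary",
  "- Number of queries: 339",
  "- Number of expressions that missing example: 34",
  "- Expressions missing examples: lag,lead,ntile")

val numberOfQueries = goldenHeader(2).split(":")(1).trim.toInt
val missingCount = goldenHeader(3).split(":")(1).trim.toInt
val missingNames = goldenHeader(4).split(":")(1).trim.split(",")
```

The positional parsing is exactly why hand-editing the file can desynchronize the header counts from the table body and trip these asserts, and why regenerating with `SPARK_GENERATE_GOLDEN_FILES=1` is the safer route.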
```diff
@@ -179,5 +194,13 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
       assert(expected.sql == output.sql, "SQL query did not match")
       assert(expected.schema == output.schema, s"Schema did not match for query ${expected.sql}")
     }
+
+    // Compare expressions missing examples
+    assert(expectedMissingExamples.length == missingExamples.size,
+      "The number of missing examples not equals the number of expected missing examples.")
+
+    missingExamples.zip(expectedMissingExamples).foreach { case (output, expected) =>
+      assert(expected == output, "Missing example expression not match")
+    }
   }
 }
```
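The length assert just before the `zip` loop is doing real work: `zip` truncates to the shorter collection, so the element-wise comparison alone would silently pass when the golden file lists fewer (or more) missing examples than the suite found. A small sketch of the pitfall (the expression names are illustrative):

```scala
// zip truncates to the shorter collection, so the pairwise comparison alone
// cannot detect a trailing extra entry; the explicit length check catches it.
val actual = Seq("lag", "lead", "ntile")
val expected = Seq("lag", "lead")

val pairwiseOk = actual.zip(expected).forall { case (a, e) => a == e } // passes despite the mismatch
val lengthsOk = actual.length == expected.length                       // this is what fails
```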