-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md #29608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Test build #128142 has finished for PR 29608 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable
| val outputSize = outputs.size | ||
| val headerSize = header.size | ||
| val expectedOutputs: Seq[QueryOutput] = { | ||
| val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big deal, but the types can just be added to the two tuple elements, instead of declaring them separately as a type for the whole tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First changed to val (expectedMissingExamples: Array[String], expectedOutputs: Seq[QueryOutput]), but I feel the type declaration here is redundant, so changed to val (expectedMissingExamples, expectedOutputs). Is this acceptable?
|
|
||
| Seq.tabulate(outputSize) { i => | ||
| val numberOfQueries = lines(2).split(":")(1).trim.toInt | ||
| val numberOfMissExample = lines(3).split(":")(1).trim.toInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miss -> missed or missing, in most cases in this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
| // Ensure consistency of the result file. | ||
| assert(numberOfQueries == expectedOutputs.size, | ||
| s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " + | ||
| "record in result file. Try regenerate the result files.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regenerate -> regenerating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
|
Test build #128176 has finished for PR 29608 at commit
|
|
Test build #128178 has finished for PR 29608 at commit
|
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay cc: @cloud-fan @beliefer
|
|
||
| Seq.tabulate(outputSize) { i => | ||
| val numberOfQueries = lines(2).split(":")(1).trim.toInt | ||
| val numberOfMissingExample = lines(3).split(":")(1).trim.toInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Example -> Examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
|
The failed check cause by |
| } | ||
|
|
||
| // Ensure consistency of the result file. | ||
| assert(numberOfQueries == expectedOutputs.size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think should put these assert on line 163
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberOfQueries == outputSize
Ah, I got it. But if a new function here, expectedOutputs.size must not equal to numberOfQueries.
expectedOutputs.size == outputSize in fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beliefer This assert place here to verify the consistency of the file contents to avoid inconsistency caused by manual modification, and I think line 189 assert expectedOutputs.size == outputSize achieves the same goal as numberOfQueries == outputSize. Is this acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try to mock a new function with comments and test this suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try to mock a new function with comments and test this suite.
You are right, Spark-24884 already trigger this problem cause by incomplete manual modification.
With the new function add scene, I think the right way is run this case with SPARK_GENERATE_GOLDEN_FILES = 1 to automatically regenerate the correct sql-expression-schema.md becasuse sql-expression-schema.md header said Automatically generated by ExpressionsSchemaSuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, if the file is manually modified instead of automatically generated, I think the assertion failure caused by incorrect modification should be expected.
Do we need to tell users clearly with Try regenerating the result files with sys env SPARK_GENERATE_GOLDEN_FILES = 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it. Thanks! Could you put these assert on line 163, so that it looks clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address 6e6489b reorder the assertions.
| } | ||
|
|
||
| // Ensure consistency of the result file. | ||
| assert(numberOfQueries == expectedOutputs.size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberOfQueries == outputSize
Ah, I got it. But if a new function here, expectedOutputs.size must not equal to numberOfQueries.
expectedOutputs.size == outputSize in fact.
| Seq.tabulate(outputSize) { i => | ||
| val numberOfQueries = lines(2).split(":")(1).trim.toInt | ||
| val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt | ||
| val missingExamples = lines(4).split(":")(1).trim.split(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expectedMissingExamples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
| s"numberOfMissingExamples: $numberOfMissingExamples " + | ||
| "record in result file. Try regenerating the result files.") | ||
|
|
||
| (missingExamples, expectedOutputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
|
Good catch! Except for some minor issues. |
|
Test build #128221 has finished for PR 29608 at commit
|
|
Test build #128236 has finished for PR 29608 at commit
|
|
Test build #128222 has finished for PR 29608 at commit
|
|
LGTM! |
|
Test build #128249 has finished for PR 29608 at commit
|
|
@maropu |
All the tests in GitHub Actions passed, so this PR looks fine. |
|
Thx ~ @maropu |
|
Thanks! Merged to master. |
What changes were proposed in this pull request?
sql-expression-schema.mdautomatically generated byExpressionsSchemaSuite, but only expressions entries are checked inExpressionsSchemaSuite. So if we manually modify the contents of the file,ExpressionsSchemaSuitedoes not necessarily guarantee the correctness of the it some times. For example, Spark-24884 addedregexp_extract_allexpression support, and manually modify thesql-expression-schema.mdbut not change the content ofNumber of queriescause file content inconsistency.Some additional checks have been added to
ExpressionsSchemaSuiteto improve the correctness guarantee ofsql-expression-schema.mdas follow:Number of queriesshould equals size ofexpressions entriesinsql-expression-schema.mdNumber of expressions that missing exampleshould equals size ofExpressions missing examplesinsql-expression-schema.mdMissExamplesfrom case should same asexpectedMissingExamplesfromsql-expression-schema.mdWhy are the changes needed?
Ensure the correctness of
sql-expression-schema.mdcontent.Does this PR introduce any user-facing change?
No
How was this patch tested?
Enhanced ExpressionsSchemaSuite