-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-41206][SQL] Rename the error class _LEGACY_ERROR_TEMP_1233 to COLUMN_ALREADY_EXISTS
#38685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
_LEGACY_ERROR_TEMP_1233 to COLUMN_ALREADY_EXISTS_LEGACY_ERROR_TEMP_1233 to COLUMN_ALREADY_EXISTS
_LEGACY_ERROR_TEMP_1233 to COLUMN_ALREADY_EXISTS_LEGACY_ERROR_TEMP_1233 to COLUMN_ALREADY_EXISTS
|
@srielau @LuciferYang @panbingkun @itholic @cloud-fan Could you review this PR, please. |
|
I've some doubts about COLUMN_ALREADY_EXISTS when a column is duplicated within a new list. I think we would like DUPLICATE_COLUMN or COLUMN_DUPLICATED, ... ? for these case and also include the table name. |
@srielau I assumed that we will provide a query context which should point out to the problematic part. |
Sure, but will every tool look at it? By that token we don't need most of the payload for various errors. Should this say: MAP_KEY_ALREADY_EXISTS I just checked CTE and table alias. Neither enforce unique names, which is curious. So I suppose the question boils down to CREATE TABLE and CREATE VIEW. |
Don't think this is a significant issue. A column might already exist in a constructor, partition spec itself.
Yep, we can say that key already exist in the provided map. If we introduce one more error class like I would just follow the existing convention, and name the error class as |
|
Merging to master. Thank you, @srielau and @cloud-fan for review. |
| checkError( | ||
| exception = e, | ||
| errorClass = "COLUMN_ALREADY_EXISTS", | ||
| parameters = Map("columnName" -> "`column1`")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several tests fixed here seem flaky because of the order in the map, e.g.:
- SPARK-8072: Better Exception for Duplicate Columns *** FAILED *** (42 milliseconds)
Map("columnName" -> "`column3`") did not equal Map("columnName" -> "`column1`") (SparkFunSuite.scala:317)
Analysis:
JavaCollectionWrappers$JMapWrapper(columnName: `column3` -> `column1`)
org.scalatest.exceptions.TestFailedException:
at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
at org.apache.spark.SparkFunSuite.checkError(SparkFunSuite.scala:317)
at org.apache.spark.sql.DataFrameSuite.$anonfun$new$368(DataFrameSuite.scala:1781)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207)
at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
https://github.com/apache/spark/actions/runs/3525051044/jobs/5911287739 and https://github.com/apache/spark/actions/runs/3526328003 which happens in a different JDK or Scala version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…o `COLUMN_ALREADY_EXISTS` ### What changes were proposed in this pull request? In the PR, I propose to assign the proper name `COLUMN_ALREADY_EXISTS ` to the legacy error class `_LEGACY_ERROR_TEMP_1233 `, and modify test suite to use `checkError()` which checks the error class name, context and etc. Also this PR improves the error message. ### Why are the changes needed? Proper name improves user experience w/ Spark SQL. ### Does this PR introduce _any_ user-facing change? Yes, the PR changes an user-facing error message. ### How was this patch tested? By running the modified test suites: ``` $ $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt -Phive-2.3 "testOnly *HiveSQLInsertTestSuite" ``` Closes apache#38685 from MaxGekk/columns-already-exist. Authored-by: Max Gekk <[email protected]> Signed-off-by: Max Gekk <[email protected]>
…o `COLUMN_ALREADY_EXISTS` ### What changes were proposed in this pull request? In the PR, I propose to assign the proper name `COLUMN_ALREADY_EXISTS ` to the legacy error class `_LEGACY_ERROR_TEMP_1233 `, and modify test suite to use `checkError()` which checks the error class name, context and etc. Also this PR improves the error message. ### Why are the changes needed? Proper name improves user experience w/ Spark SQL. ### Does this PR introduce _any_ user-facing change? Yes, the PR changes an user-facing error message. ### How was this patch tested? By running the modified test suites: ``` $ $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt -Phive-2.3 "testOnly *HiveSQLInsertTestSuite" ``` Closes apache#38685 from MaxGekk/columns-already-exist. Authored-by: Max Gekk <[email protected]> Signed-off-by: Max Gekk <[email protected]>
…o `COLUMN_ALREADY_EXISTS` ### What changes were proposed in this pull request? In the PR, I propose to assign the proper name `COLUMN_ALREADY_EXISTS ` to the legacy error class `_LEGACY_ERROR_TEMP_1233 `, and modify test suite to use `checkError()` which checks the error class name, context and etc. Also this PR improves the error message. ### Why are the changes needed? Proper name improves user experience w/ Spark SQL. ### Does this PR introduce _any_ user-facing change? Yes, the PR changes an user-facing error message. ### How was this patch tested? By running the modified test suites: ``` $ $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt -Phive-2.3 "testOnly *HiveSQLInsertTestSuite" ``` Closes apache#38685 from MaxGekk/columns-already-exist. Authored-by: Max Gekk <[email protected]> Signed-off-by: Max Gekk <[email protected]>
What changes were proposed in this pull request?
In the PR, I propose to assign the proper name
COLUMN_ALREADY_EXISTSto the legacy error class_LEGACY_ERROR_TEMP_1233, and modify test suite to usecheckError()which checks the error class name, context and etc. Also this PR improves the error message.Why are the changes needed?
Proper name improves user experience w/ Spark SQL.
Does this PR introduce any user-facing change?
Yes, the PR changes an user-facing error message.
How was this patch tested?
By running the modified test suites: