Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Dec 27, 2023

What changes were proposed in this pull request?

The pr aims to:

  • Clear unused error classes from error-classes.json.
  • Delete unused methods dataSourceAlreadyExists in QueryCompilationErrors.scala
  • Fix an outdated comment.

Why are the changes needed?

Make code clear.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.
  • Manually test.

Was this patch authored or co-authored using generative AI tooling?

No.

@panbingkun
Copy link
Contributor Author

How to find unused error classes in Spark code repo
1.Generate the names of all error classes based on error classes. json.
2.Use a shell script to locate each of the above error classes in the Spark code repo and identify any unused error classes.
3.Manually check again.

During the search process above, some unused error classes were also discovered, as follows:

  • INTERNAL_ERROR_BROADCAST
  • INTERNAL_ERROR_MEMORY
  • INTERNAL_ERROR_STORAGE
  • INTERNAL_ERROR_SHUFFLE
  • INTERNAL_ERROR_EXECUTOR

These error classes did not appear in the Spark code repo, but they are temporarily retained as they are internal errors.

/**
* Object for grouping error messages from (most) exceptions thrown during query execution.
* This does not include exceptions thrown during the eager execution of commands, which are
* grouped into [[QueryCompilationErrors]].
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the sql/api module, QueryCompilationErrors has been renamed to CompilationErrors`, so we will fix it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have both, AFAIK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the perspective of module dependency, it seems to express the meaning of CompilationErrors in the sql/api module, instead of QueryCompilationErrors in the sql/catalyst module.

Or we can write it as: [[CompilationErrors]] and [[QueryCompilationErrors]] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's leave the ref to CompilationErrors only.

@panbingkun
Copy link
Contributor Author

cc @MaxGekk

@panbingkun panbingkun marked this pull request as ready for review December 27, 2023 03:31
@panbingkun panbingkun marked this pull request as draft December 27, 2023 05:40
@panbingkun
Copy link
Contributor Author

@MaxGekk
Additionally, I have found two error classes that only appear in UT. In my opinion, both of these error classes should be deleted. Is this appropriate?

1._LEGACY_ERROR_TEMP_3066

2._LEGACY_ERROR_TEMP_3078

(_: LogicalPlan) => throw new AnalysisException("_LEGACY_ERROR_TEMP_3078", Map.empty))

errorClass = "_LEGACY_ERROR_TEMP_3078", messageParameters = Map.empty)

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error classes did not appear in the Spark code repo, but they are temporarily retained as they are internal errors.

FYI, we have a conception of category for internal error, see https://github.com/apache/spark/pull/40978/files#diff-41229d3f8af21d2eb979e6d8c5b52058b1c460508f1786a6efa8dd6dbbc8c218R79. We form the names dynamically based on the caller side. This should allow to quickly identify from which sub-system the error comes from and who is responsible for it (to handle it).

@MaxGekk
Copy link
Member

MaxGekk commented Dec 27, 2023

both of these error classes should be deleted

If it is possible to delete them while preserving test logic, let's delete them.

case _ =>
throw new AnalysisException(
errorClass = "_LEGACY_ERROR_TEMP_3078", messageParameters = Map.empty)
case _ => assert(false, "Can not match ParquetTable in the query.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to this:

case _ => assert(false, "Can not match OrcTable in the query.")

throw new AnalysisException(
errorClass = "_LEGACY_ERROR_TEMP_3066",
messageParameters = Map("msg" -> ex.getMessage))
errorClass = "_LEGACY_ERROR_TEMP_2193",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to this:

case ex: MetaException =>
throw QueryExecutionErrors.getPartitionMetadataByFilterError(ex)

def getPartitionMetadataByFilterError(e: Exception): SparkRuntimeException = {
new SparkRuntimeException(
errorClass = "_LEGACY_ERROR_TEMP_2193",
messageParameters = Map(
"hiveMetastorePartitionPruningFallbackOnException" ->
SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ON_EXCEPTION.key),
cause = e)
}

// Throw an AnalysisException - this should be captured.
spark.experimental.extraStrategies = Seq[SparkStrategy](
(_: LogicalPlan) => throw new AnalysisException("_LEGACY_ERROR_TEMP_3078", Map.empty))
(_: LogicalPlan) => throw new AnalysisException(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UT can actually handle any type of AnalysisException. I have chosen an error class (UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY) that is closer to semantics to replace it.

@panbingkun panbingkun marked this pull request as ready for review December 27, 2023 10:53
@panbingkun panbingkun requested a review from MaxGekk December 27, 2023 10:53
},
"sqlState" : "42K0B"
},
"INCORRECT_END_OFFSET" : {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reasons there is still a reference to the error class:

$ find . -type f  -print0|xargs -0 grep INCORRECT_END_OFFSET
./docs/sql-error-conditions-sqlstates.md:  <td><a href="arithmetic-overflow-error-class.md">ARITHMETIC_OVERFLOW</a>, <a href="sql-error-conditions.html#cast_overflow">CAST_OVERFLOW</a>, <a href="sql-error-conditions.html#cast_overflow_in_table_insert">CAST_OVERFLOW_IN_TABLE_INSERT</a>, <a href="sql-error-conditions.html#decimal_precision_exceeds_max_precision">DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION</a>, <a href="sql-error-conditions.html#invalid_index_of_zero">INVALID_INDEX_OF_ZERO</a>, <a href="sql-error-conditions.html#incorrect_end_offset">INCORRECT_END_OFFSET</a>, <a href="sql-error-conditions.html#incorrect_ramp_up_rate">INCORRECT_RAMP_UP_RATE</a>, <a href="invalid-array-index-error-class.md">INVALID_ARRAY_INDEX</a>, <a href="invalid-array-index-in-element-at-error-class.md">INVALID_ARRAY_INDEX_IN_ELEMENT_AT</a>, <a href="sql-error-conditions.html#numeric_out_of_supported_range">NUMERIC_OUT_OF_SUPPORTED_RANGE</a>, <a href="sql-error-conditions.html#numeric_value_out_of_range">NUMERIC_VALUE_OUT_OF_RANGE</a>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, remove the reference to the deleted error class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let me delete it first, and then I'll try to investigate the root cause again to see if we can automate its discovery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* Object for grouping error messages from (most) exceptions thrown during query execution.
* This does not include exceptions thrown during the eager execution of commands, which are
* grouped into [[QueryCompilationErrors]].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's leave the ref to CompilationErrors only.

},
"sqlState" : "42K0B"
},
"INCORRECT_END_OFFSET" : {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, remove the reference to the deleted error class.

@panbingkun panbingkun requested a review from MaxGekk December 28, 2023 01:20
@MaxGekk
Copy link
Member

MaxGekk commented Dec 28, 2023

+1, LGTM. Merging to master.
Thank you, @panbingkun.

@MaxGekk MaxGekk closed this in bb497eb Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants