Conversation

@jonvex (Contributor) commented Feb 28, 2024

Change Logs

All schema-related issues should throw exceptions that are HoodieSchemaExceptions. Exceptions raised while converting from Avro to the Spark row format are classified as schema compatibility exceptions, because they are caused either by an illegal schema or by records that are incompatible with the provided schema.

Additionally, introduce a new exception type, HoodieRecordCreationException, which is thrown when converting an engine-specific record into a HoodieRecord fails.
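For readers following along, a minimal sketch of the classification idea described above. The helper below is hypothetical and not the code in this PR; only the exception type HoodieRecordCreationException and the HoodieRecord model class come from the project, and the (message, cause) constructor is an assumption.

import org.apache.hudi.common.model.HoodieRecord
import org.apache.hudi.exception.HoodieRecordCreationException

// Hypothetical helper: any failure while building a HoodieRecord from an
// engine-specific record surfaces as a HoodieRecordCreationException, so
// callers can tell record-creation problems apart from other failures.
// Assumes HoodieRecordCreationException has a (String message, Throwable cause) constructor.
def wrapRecordCreation[R, T](engineRecord: R)(convert: R => HoodieRecord[T]): HoodieRecord[T] = {
  try {
    convert(engineRecord)
  } catch {
    case e: Exception =>
      throw new HoodieRecordCreationException("Failed to create HoodieRecord from engine-specific record", e)
  }
}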

Impact

Exceptions can be classified more easily.

Risk level (write none, low, medium, or high below)

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Feb 28, 2024
@jonvex jonvex changed the title from "make exceptions more specific" to "Classify schema exceptions when converting from avro to spark row representation" on Mar 6, 2024
@jonvex jonvex changed the title from "Classify schema exceptions when converting from avro to spark row representation" to "[HUDI-7486] Classify schema exceptions when converting from avro to spark row representation" on Mar 6, 2024
@jonvex jonvex marked this pull request as ready for review March 6, 2024 22:09
@danny0405 danny0405 self-assigned this Mar 8, 2024
@danny0405 danny0405 added the area:sql SQL interfaces label Mar 8, 2024
@apache apache deleted a comment from hudi-bot Mar 11, 2024
@jonvex jonvex requested a review from yihua March 13, 2024 18:48
@jonvex jonvex requested a review from yihua March 27, 2024 16:18

def maybeWrapDataFrameWithException(df: DataFrame, exceptionClass: String, msg: String, shouldWrap: Boolean): DataFrame = {
  if (shouldWrap) {
    HoodieUnsafeUtils.createDataFrameFromRDD(df.sparkSession, injectSQLConf(df.queryExecution.toRdd.mapPartitions {
Contributor commented:

Is SQLConf injection necessary here?

Contributor (Author) replied:

In #6352, alexey says that "Yes, it will propagate to all RDDs in the execution chain (up to a shuffling point)". So I guess the question is whether there is a shuffling point?

Contributor replied:


My doubt was whether wrapping the DataFrame with the exception handling needs the SQLConf injection. So you're saying it does need SQLConf injection so that the configs can still be propagated to the executors.

Contributor replied:

Let's keep an eye on the DAG and Spark execution when this is rolled out. I think this is OK for now.
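For context on the exchange above: the concern is whether the driver-side SQL configuration remains visible inside the executor-side mapPartitions closure. Below is a rough sketch of the general propagation pattern using Spark's public SQLConf API; Hudi's injectSQLConf helper is not shown in this PR excerpt and may work differently (for example, by wrapping the RDD itself).

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.internal.SQLConf

// Sketch only: copy the active SQLConf on the driver and re-activate it inside
// each partition so executor-side code that calls SQLConf.get (e.g. Catalyst
// converters) sees the session's settings.
def mapPartitionsWithConf(rdd: RDD[InternalRow])(f: Iterator[InternalRow] => Iterator[InternalRow]): RDD[InternalRow] = {
  val conf = new SQLConf() // SQLConf is Serializable, so it can be captured by the closure
  SQLConf.get.getAllConfs.foreach { case (k, v) => if (v != null) conf.setConfString(k, v) }
  rdd.mapPartitions { iter =>
    // Caveat: if f returns a lazy iterator, reads of SQLConf.get during later
    // iteration can fall outside this scope; wrapping the RDD itself avoids that.
    SQLConf.withExistingConf(conf)(f(iter))
  }
}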

@jonvex jonvex requested a review from yihua March 28, 2024 16:56

@yihua yihua (Contributor) left a comment:

Blocking merge until a couple of comments on the docs are addressed.

@github-actions github-actions bot removed the size:M PR with lines of changes in (100, 300] label Apr 1, 2024
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Apr 1, 2024
@jonvex jonvex requested a review from yihua April 1, 2024 14:07
@github-actions github-actions bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels Apr 1, 2024
package org.apache.hudi.util

import org.apache.hudi.common.util.ReflectionUtils

Contributor commented:

Docs here?
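The reviewer is asking for class-level documentation on this new utility, whose body is not shown in this excerpt. As a hypothetical sketch based only on the visible package and import (the class name, members, and the ReflectionUtils.loadClass call are assumptions, not the exact PR code), the documented wrapper might look roughly like:

package org.apache.hudi.util

import org.apache.hudi.common.util.ReflectionUtils

/**
 * Sketch of an iterator that delegates to `in` and rethrows any failure as an
 * instance of `exceptionClass` (a fully qualified class name), instantiated
 * reflectively with a (String message, Throwable cause) constructor.
 * Names and members here are illustrative only.
 */
class ExceptionWrappingIterator[T](in: Iterator[T], exceptionClass: String, msg: String) extends Iterator[T] {

  override def hasNext: Boolean = doWrapped(in.hasNext)

  override def next(): T = doWrapped(in.next())

  private def doWrapped[O](op: => O): O = try {
    op
  } catch {
    case t: Throwable =>
      // Assumes ReflectionUtils.loadClass(fqcn, constructorArgTypes, args...) is available
      // and that the target exception type has a (String, Throwable) constructor.
      throw ReflectionUtils.loadClass(exceptionClass,
        Array[Class[_]](classOf[String], classOf[Throwable]), msg, t).asInstanceOf[Throwable]
  }
}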

@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Apr 3, 2024
@hudi-bot (Collaborator) commented Apr 3, 2024

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@yihua yihua (Contributor) left a comment:

LGTM

@jonvex jonvex merged commit bf723f5 into apache:master Apr 3, 2024
yihua added a commit that referenced this pull request May 14, 2024
[HUDI-7486] Classify schema exceptions when converting from avro to spark row representation (#10778)

* make exceptions more specific

* use hudi avro exception

* Address review comments

* fix unnecessary changes

* add exception wrapping

* style

* address review comments

* remove . from config

* address review comments

* fix merge

* fix checkstyle

* Update hudi-common/src/main/java/org/apache/hudi/exception/HoodieRecordCreationException.java

Co-authored-by: Y Ethan Guo <[email protected]>

* Update hudi-common/src/main/java/org/apache/hudi/exception/HoodieAvroSchemaException.java

Co-authored-by: Y Ethan Guo <[email protected]>

* add javadoc to exception wrapper

---------

Co-authored-by: Jonathan Vexler <=>
Co-authored-by: Y Ethan Guo <[email protected]>

Labels

area:sql (SQL interfaces), release-0.15.0, size:L (PR with lines of changes in (300, 1000])

Projects

Status: ✅ Done


5 participants