-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed write #2903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2903 +/- ##
=============================================
- Coverage 47.62% 27.41% -20.22%
+ Complexity 5502 1285 -4217
=============================================
Files 930 381 -549
Lines 41268 15107 -26161
Branches 4137 1304 -2833
=============================================
- Hits 19655 4141 -15514
+ Misses 19865 10667 -9198
+ Partials 1748 299 -1449
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
Outdated
Show resolved
Hide resolved
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
Outdated
Show resolved
Hide resolved
7f8c4ef to
fa044aa
Compare
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what the right behavior here should be. Should we error out or return an empty dataframe?
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
Outdated
Show resolved
Hide resolved
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
Outdated
Show resolved
Hide resolved
If someone tries to read a hudi table from a path that does not even exist, we get this exception as of now. So, I Guess the current fix should be fine. WDYT. |
297875e to
fcb7ab7
Compare
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
Outdated
Show resolved
Hide resolved
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
Outdated
Show resolved
Hide resolved
fcb7ab7 to
f474fba
Compare
|
my bad. I have no idea why I complicated it so much. Fixed it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsivabalan why are stopping the spark session here? is it not shared outside of a single test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we initialize at the beginning of each test. I know we have to fix this entire class for reuse in general. @hmit is working on this refactoring.
|
SQL DML extension tests are failing. I need time to check those out. Will update once I am able to make CI happy. |
f474fba to
0b48bd5
Compare
|
@nsivabalan moving this back to ready for review. Please update when the tests are fixed |
0b48bd5 to
712b446
Compare
712b446 to
2984599
Compare
|
This patch needs to be redone a bit. Since w/ sql dml, create relation will be called upfront, the empty table check has to be moved to sql dml layer. I will sync up with @pengzhiwei2018 on how to go about this. |
|
Guess we can remove the release blocker and critical label. Don't think this is very critical. I understand its nice to have. |
Why should we throw an exception for query empty table? I think return an empty list of rows is more reasonable. When user create table and query the table, it is not friendly to throws an exception. Other data format in spark, like parquet, delta, query empty table also return empty rows. |
|
@pengzhiwei2018 : yes, you are right. we should return empty rows. Would you mind taking it up since this involves changes in sql dml. Or I need to look into the code to see where to fit this in. I will reach out to you if I need any help. |
83ca285 to
fcafe23
Compare
|
@xushiyan : patch is good to review. |
|
@YannByron : Can you review the patch when you get a chance. should be small one. |
fcafe23 to
295deba
Compare
it's a very simple and forcible solution. There should have been |
5da3cac to
6fa140c
Compare
|
@YannByron : there are some failures in sql-dml related tests. TestMergeInfo etc. after we create the table, first merge into fails bcoz, with the proposed fix, we return an empty relation which return NIL schema. So, I guess we might need some fix on sql-dml classes. |
|
@nsivabalan All the failed DML is for an empty table but with failed write first? If the table is created by Spark-SQL, the right schema can be returned correctly. If not, throwing an exception is right and can't fix it. |
|
two TODO we need to consider:
|
|
@YannByron : yeah. feel free to put up a patch with all fixes required. once you have your patch, we can close this one. good to think about the unified schema. but wondering, for an empty table, does it really require to unify the schemas. why can't sql-dml layer intercept empty table and take action appropriately. |
|
@nsivabalan
�spark-sql also calls the The pr I submitted to your For table created by spark-sql, there is the right schema persisted in |
|
awesome, thanks a ton. I will work on it and update the patch. |
f0f4cba to
ab45189
Compare
|
@YannByron : can you review the patch. |
LGTM. |
What is the purpose of the pull request
*Fixed read of an empty table (failed write) with proper information.
Brief change log
Verify this pull request
Stacktrace before the fix:
find in the attached jira.
Result after the fix:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.