-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-53625][SS] Propagate metadata columns through projections to address ApplyCharTypePadding incompatibility #52375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great in overall - I have slight concern with the config name but others are great. Thanks for fixing this!
@cloud-fan Would you mind taking a look?
| .booleanConf | ||
| .createWithDefault(true) | ||
|
|
||
| val STREAMING_PROJECT_METADATA_COLS_ENABLED = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too general and could mislead the impact of the config. We'd need to mention DSv1 and getBatch (or microbatch plan for the source) in the config name to scope it correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we just remove the config. It's a bug fix and we get error anyway without this fix. It can't be worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. @liviazhu Let's remove this config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala
Outdated
Show resolved
Hide resolved
| ) | ||
|
|
||
| checkAnswer( | ||
| newDF.where(s"$METADATA_FILE_SIZE > 0").select(METADATA_FILE_SIZE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.where(s"$METADATA_FILE_SIZE > 0")
I guess this is just a sanity check, right? Is this ever possible where a row is mapped to some file while the file has size 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just a sanity check
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
The test failure is unrelated - it only failed with SDP test suite which is due to unavailability of 'yaml' library.
|
I discussed this with @cloud-fan yesterday and he was OK with the fix. I'm going to merge. Thanks! Merging to master/4.0. |
…ddress ApplyCharTypePadding incompatibility ### What changes were proposed in this pull request? Modify streaming MicrobatchExecution to propagate metadata columns through projections to resolve an incompatibility with the ApplyCharTypePadding rule which is applied by default in serverless which previous resulted in an `assertion failed: Invalid batch: ACTV_IND#130290,_metadata#130291 != ACTV_IND#130307` error. ### Why are the changes needed? Bug fix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #52375 from liviazhu/liviazhu-db/col-metadata. Authored-by: Livia Zhu <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]> (cherry picked from commit a8bb8b0) Signed-off-by: Jungtaek Lim <[email protected]>
…ddress ApplyCharTypePadding incompatibility ### What changes were proposed in this pull request? Modify streaming MicrobatchExecution to propagate metadata columns through projections to resolve an incompatibility with the ApplyCharTypePadding rule which is applied by default in serverless which previous resulted in an `assertion failed: Invalid batch: ACTV_IND#130290,_metadata#130291 != ACTV_IND#130307` error. ### Why are the changes needed? Bug fix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#52375 from liviazhu/liviazhu-db/col-metadata. Authored-by: Livia Zhu <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]> (cherry picked from commit f8ee085) Signed-off-by: Jungtaek Lim <[email protected]>
…ddress ApplyCharTypePadding incompatibility ### What changes were proposed in this pull request? Modify streaming MicrobatchExecution to propagate metadata columns through projections to resolve an incompatibility with the ApplyCharTypePadding rule which is applied by default in serverless which previous resulted in an `assertion failed: Invalid batch: ACTV_IND#130290,_metadata#130291 != ACTV_IND#130307` error. ### Why are the changes needed? Bug fix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#52375 from liviazhu/liviazhu-db/col-metadata. Authored-by: Livia Zhu <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
What changes were proposed in this pull request?
Modify streaming MicrobatchExecution to propagate metadata columns through projections to resolve an incompatibility with the ApplyCharTypePadding rule which is applied by default in serverless which previous resulted in an
assertion failed: Invalid batch: ACTV_IND#130290,_metadata#130291 != ACTV_IND#130307error.Why are the changes needed?
Bug fix
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No