Contributor

@liviazhu liviazhu commented Sep 17, 2025

What changes were proposed in this pull request?

Modify streaming MicroBatchExecution to propagate metadata columns through projections. This resolves an incompatibility with the ApplyCharTypePadding rule, which is applied by default in serverless and previously resulted in an `assertion failed: Invalid batch: ACTV_IND#130290,_metadata#130291 != ACTV_IND#130307` error.
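
To illustrate the mismatch described above, here is a minimal toy sketch in plain Scala. It is not Spark code: `Attr`, `Relation`, `Project`, `propagateMetadata`, and `sameShape` are hypothetical stand-ins invented for illustration, and the name comparison is only a simplified approximation of whatever check produces the `Invalid batch` assertion. The sketch assumes the failure arises when a projection placed above the source relation drops the `_metadata` column that the streaming plan still expects.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class Attr(name: String, exprId: Long) {
  override def toString: String = s"$name#$exprId"
}

sealed trait Plan { def output: Seq[Attr] }

// A leaf relation exposing its data columns plus its metadata columns.
final case class Relation(data: Seq[Attr], metadata: Seq[Attr]) extends Plan {
  def output: Seq[Attr] = data ++ metadata
}

// A projection outputs exactly the attributes in its project list.
final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = projectList
}

object MetadataPropagation {
  // Assumed shape of the fix: re-add metadata columns that a projection dropped.
  def propagateMetadata(plan: Plan): Plan = plan match {
    case Project(list, rel: Relation) =>
      val missing = rel.metadata.filterNot(m => list.contains(m))
      Project(list ++ missing, rel)
    case other => other
  }

  // Simplified stand-in for the batch check: compare column names in order.
  def sameShape(expected: Seq[Attr], actual: Seq[Attr]): Boolean =
    expected.map(_.name) == actual.map(_.name)

  // Returns (matches before propagation, matches after propagation).
  def demo(): (Boolean, Boolean) = {
    val actvInd = Attr("ACTV_IND", 1L)
    val meta = Attr("_metadata", 2L)
    val rel = Relation(Seq(actvInd), Seq(meta))
    // Assume some rule wrapped the relation in a projection without _metadata.
    val projected = Project(Seq(actvInd), rel)
    (sameShape(rel.output, projected.output),
      sameShape(rel.output, propagateMetadata(projected).output))
  }
}
```

In this model the projected plan fails the shape check until the dropped metadata column is appended back to the project list, which is the behavior the change aims for.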

Why are the changes needed?

Bug fix

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

Contributor

@HeartSaVioR HeartSaVioR left a comment

Looks great overall. I have a slight concern about the config name, but everything else is great. Thanks for fixing this!

@cloud-fan Would you mind taking a look?

.booleanConf
.createWithDefault(true)

val STREAMING_PROJECT_METADATA_COLS_ENABLED =
Contributor

This name is too general and could mislead readers about the impact of the config. We'd need to mention DSv1 and getBatch (or the microbatch plan for the source) in the config name to scope it correctly.

Contributor

Or we just remove the config. It's a bug fix, and we get an error anyway without this fix, so it can't be worse.

Contributor

Makes sense to me. @liviazhu Let's remove this config.

Contributor Author

Done.

)

checkAnswer(
newDF.where(s"$METADATA_FILE_SIZE > 0").select(METADATA_FILE_SIZE),
Contributor

.where(s"$METADATA_FILE_SIZE > 0")

I guess this is just a sanity check, right? Is it ever possible for a row to be mapped to a file whose size is 0?

Contributor Author

Yeah just a sanity check

@HeartSaVioR HeartSaVioR changed the title from "[SPARK-53625] [SS] Propagate metadata columns through projections to address ApplyCharTypePadding incompatibility" to "[SPARK-53625][SS] Propagate metadata columns through projections to address ApplyCharTypePadding incompatibility" on Sep 18, 2025
Contributor

@HeartSaVioR HeartSaVioR left a comment

+1

The test failure is unrelated: only the SDP test suite failed, and that is due to the 'yaml' library being unavailable.

@HeartSaVioR
Contributor

I discussed this with @cloud-fan yesterday and he was OK with the fix. I'm going to merge.

Thanks! Merging to master/4.0.

HeartSaVioR pushed a commit that referenced this pull request Sep 19, 2025
…ddress ApplyCharTypePadding incompatibility

### What changes were proposed in this pull request?

Modify streaming MicroBatchExecution to propagate metadata columns through projections. This resolves an incompatibility with the ApplyCharTypePadding rule, which is applied by default in serverless and previously resulted in an `assertion failed: Invalid batch: ACTV_IND#130290,_metadata#130291 != ACTV_IND#130307` error.

### Why are the changes needed?

Bug fix

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52375 from liviazhu/liviazhu-db/col-metadata.

Authored-by: Livia Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit a8bb8b0)
Signed-off-by: Jungtaek Lim <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
(cherry picked from commit f8ee085)
Signed-off-by: Jungtaek Lim <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025