@Yaohua628

What changes were proposed in this pull request?

Cherry-picked from #37905

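Streaming metrics (`processedRowsPerSecond`, etc.) are all reported as 0 when the query selects the `_metadata` column, because the logical plan from the batch and the actual planned logical plan do not match. As a result, [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L348) we cannot find the plan, so the metrics are not collected correctly.

This PR fixes this by replacing the initial `LogicalPlan` with the `LogicalPlan` containing the metadata column.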
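
To illustrate the kind of mismatch described above, here is a minimal toy model in Python. It is not Spark code: `Leaf`, `rows_per_source`, and the column names are hypothetical stand-ins, and the sketch assumes that row counts are attributed to a source by looking up each planned leaf among the plans registered for the batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a logical plan leaf; equality includes the output columns."""
    source: str
    output: tuple


def rows_per_source(registered, planned_leaves, rows_by_leaf):
    """Attribute row counts to sources by looking up each planned leaf."""
    leaf_to_source = {leaf: leaf.source for leaf in registered}
    totals = {}
    for leaf in planned_leaves:
        source = leaf_to_source.get(leaf)
        if source is not None:  # a leaf that is not found contributes nothing
            totals[source] = totals.get(source, 0) + rows_by_leaf[leaf]
    return totals


# Plan recorded for the batch (assumed: no metadata column yet).
batch_plan = Leaf("file-source", ("value",))
# Plan that was actually planned once `_metadata` is selected.
planned = Leaf("file-source", ("value", "_metadata"))
rows = {planned: 100}

# Mismatched plans: the lookup misses, so no rows are attributed.
before = rows_per_source([batch_plan], [planned], rows)
# Registering the plan that contains the metadata column makes the lookup succeed.
after = rows_per_source([planned], [planned], rows)
```

In this sketch `before` is empty while `after` attributes all 100 rows to the source, which mirrors the idea of replacing the initial plan with the one that contains the metadata column.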

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing + New UTs

Closes apache#37905 from Yaohua628/spark-40460.

Authored-by: yaohua <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
@Yaohua628

Yaohua628 commented Sep 19, 2022

cc @HeartSaVioR
Resolved the conflict here: c58dcb0


@HeartSaVioR HeartSaVioR left a comment

+1 pending build.

@HyukjinKwon

Merged to branch-3.3.

HyukjinKwon pushed a commit that referenced this pull request Sep 20, 2022
### What changes were proposed in this pull request?

Cherry-picked from #37905

Streaming metrics (`processedRowsPerSecond`, etc.) are all reported as 0 when the query selects the `_metadata` column, because the logical plan from the batch and the actual planned logical plan do not match. As a result, [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L348) we cannot find the plan, so the metrics are not collected correctly.

This PR fixes this by replacing the initial `LogicalPlan` with the `LogicalPlan` containing the metadata column.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing + New UTs

Closes #37932 from Yaohua628/spark-40460-3-3.

Authored-by: yaohua <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>