[prism] Fix top for unfused execution. Move to register. #27585

Merged: lostluck merged 2 commits into apache:master on Jul 21, 2023

lostluck
Contributor

Some fixes for transforms/top WRT portable runners.

  • Enable top to run in non-fusing runners.
    • Top is very clever with its accumulator encoding/decoding, doing it just in time.
    • This change fixes a bug where, if no merges happen in the merge accumulators stage, the type and encoder are unknown. This caused a failure whenever the runner didn't fuse that stage with the downstream extract output stage.
    • In particular, we now check whether the data bundle exists when we have 0 elements, and if so, re-encode using the existing bundle.
  • Move to the generic Register package instead of the legacy code generator variant.
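
A minimal sketch of the pass-through described above, not the actual top.go code. It assumes a simplified accumulator with an encoded `data` field and a decoded `list` field (names taken from the review snippet), and it uses `encoding/json` as a stand-in for Beam's element coders. The helper names `encodeInts`, `decodeInts`, and `marshal` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// accum is a simplified, hypothetical stand-in for the top accumulator.
// data holds bundles that are still encoded; list holds decoded values.
type accum struct {
	data [][]byte
	list []int
}

// encodeInts stands in for the element encoder (assumption: JSON, not a Beam coder).
func encodeInts(vals []int) ([]byte, error) {
	return json.Marshal(vals)
}

// decodeInts is the matching stand-in decoder.
func decodeInts(b []byte) ([]int, error) {
	var vals []int
	err := json.Unmarshal(b, &vals)
	return vals, err
}

// marshal returns the encoded bundles for the accumulator.
// If nothing was decoded (no merge happened) but encoded bundles exist,
// those bundles are passed through unchanged instead of being dropped.
func (a *accum) marshal() ([][]byte, error) {
	if len(a.list) == 0 && len(a.data) > 0 {
		return a.data, nil
	}
	b, err := encodeInts(a.list)
	if err != nil {
		return nil, err
	}
	return [][]byte{b}, nil
}

func main() {
	upstream, err := encodeInts([]int{3, 1, 2})
	if err != nil {
		panic(err)
	}
	// The accumulator arrived encoded and was never merged, so list is empty.
	a := &accum{data: [][]byte{upstream}}
	out, err := a.marshal()
	if err != nil {
		panic(err)
	}
	vals, err := decodeInts(out[0])
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out), vals)
}
```

Without the first branch in `marshal`, an unmerged accumulator would be written out as an empty list, which is the kind of data loss a non-fusing runner would expose.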

See #27550 and #24789.


@github-actions github-actions bot added the go label Jul 20, 2023

codecov bot commented Jul 20, 2023

Codecov Report

Merging #27585 (bd183f8) into master (b2e00ef) will decrease coverage by 0.02%.
The diff coverage is 16.66%.

@@            Coverage Diff             @@
##           master   #27585      +/-   ##
==========================================
- Coverage   71.14%   71.12%   -0.02%     
==========================================
  Files         861      859       -2     
  Lines      104560   104495      -65     
==========================================
- Hits        74389    74324      -65     
+ Misses      28621    28613       -8     
- Partials     1550     1558       +8     
Flag Coverage Δ
go 53.57% <16.66%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
sdks/go/pkg/beam/transforms/top/top.go 74.10% <16.66%> (-1.99%) ⬇️

... and 16 files with indirect coverage changes


@github-actions

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @riteshghorse for label go.



@riteshghorse riteshghorse left a comment

LGTM, thanks!

Comment on lines +161 to +163
if len(a.list) == 0 && len(a.data) > 0 {
	values = a.data
}
Contributor

More of an understanding question: could a.data also be set to nil, as we do for a.list after this if block?

Contributor Author

That's a very good question.

My instinct is that since this function is supposed to write and encode data, it shouldn't be mutating the value at all.

Which means technically it shouldn't be nil'ing out that list field at all either.

In principle, the encoding will only happen when emitting the values downstream, which should only happen once per given value, at which point the value itself should be garbage collected away anyway.

So I'm going to do the opposite: remove the line that sets the list to nil on encode.
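
To illustrate the reasoning, here is a small sketch contrasting an encode that clears the list with one that only reads it. The `acc` type and both method names are hypothetical, and `encoding/json` stands in for the real coder. It is not the code from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// acc is a hypothetical accumulator holding only decoded values.
type acc struct {
	list []int
}

// encodeMutating encodes the list and then clears it,
// so a second call no longer sees the original values.
func (a *acc) encodeMutating() ([]byte, error) {
	b, err := json.Marshal(a.list)
	a.list = nil
	return b, err
}

// encodePure only reads the accumulator, so repeated calls agree.
func (a *acc) encodePure() ([]byte, error) {
	return json.Marshal(a.list)
}

func main() {
	m := &acc{list: []int{1, 2}}
	first, _ := m.encodeMutating()
	second, _ := m.encodeMutating()
	fmt.Println(string(first), string(second)) // prints: [1,2] null

	p := &acc{list: []int{1, 2}}
	first, _ = p.encodePure()
	second, _ = p.encodePure()
	fmt.Println(string(first), string(second)) // prints: [1,2] [1,2]
}
```

If each value really is encoded only once before being garbage collected, both variants behave the same in practice; the read-only version simply avoids a surprise if that assumption ever stops holding.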

@lostluck lostluck merged commit f0f6d3b into apache:master Jul 21, 2023
6 of 7 checks passed