Remove more legacy Runner v1 cruft. #27512

Merged
merged 3 commits into from
Aug 9, 2023

robertwb
Contributor


@robertwb
Contributor Author

R: @tvalentyn

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@codecov
codecov bot commented Jul 14, 2023

Codecov Report

Merging #27512 (c65c703) into master (41e6628) will decrease coverage by 0.56%.
Report is 304 commits behind head on master.
The diff coverage is 76.00%.

@@            Coverage Diff             @@
##           master   #27512      +/-   ##
==========================================
- Coverage   71.16%   70.61%   -0.56%     
==========================================
  Files         861      860       -1     
  Lines      104547   103875     -672     
==========================================
- Hits        74401    73350    -1051     
- Misses      28597    28976     +379     
  Partials     1549     1549              
Flag Coverage Δ
python 79.61% <76.00%> (-0.76%) ⬇️

Files Changed Coverage Δ
...on/apache_beam/runners/portability/flink_runner.py 59.61% <50.00%> (ø)
...on/apache_beam/runners/portability/spark_runner.py 67.34% <50.00%> (ø)
...apache_beam/runners/portability/portable_runner.py 74.74% <71.42%> (-1.11%) ⬇️
sdks/python/apache_beam/transforms/environments.py 87.70% <72.00%> (-0.67%) ⬇️
sdks/python/apache_beam/runners/runner.py 85.41% <93.75%> (+30.81%) ⬆️
sdks/python/apache_beam/portability/python_urns.py 100.00% <100.00%> (ø)
...ache_beam/runners/portability/expansion_service.py 91.83% <100.00%> (ø)

... and 49 files with indirect coverage changes

@robertwb robertwb force-pushed the no-v1-runner branch 4 times, most recently from 2587133 to d59749f on July 18, 2023 20:23
@robertwb
Contributor Author

Run Python_PVR_Flink PreCommit

Contributor

@tvalentyn tvalentyn left a comment

noting that some tests & lint failed on latest snapshot.

@Environment.register_urn(python_urns.EMBEDDED_PYTHON_LOOPBACK, None)
class PythonLoopbackEnvironment(EmbeddedPythonEnvironment):
  """Used as a stub when the loopback worker has not yet been started."""
  def to_runner_api_parameter(self, context):
Contributor

Should we add the typehint? I think it might be: # type: (PipelineContext) -> typing.Tuple[str, message.Message]
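
For reference, a minimal sketch of the comment-style annotation suggested above. `PipelineContext` and `Message` here are hypothetical stand-ins defined locally, not Beam's real classes, and the URN string is invented for illustration; the actual return type used in the PR may differ.

```python
import typing


class PipelineContext:
    """Hypothetical stand-in for the pipeline context type."""


class Message:
    """Hypothetical stand-in for a protobuf message type."""


class LoopbackStub:
    """Illustrative class only; not the PR's PythonLoopbackEnvironment."""
    def to_runner_api_parameter(self, context):
        # type: (PipelineContext) -> typing.Tuple[str, Message]
        # PEP 484 type comment: it sits directly under the def line and
        # omits `self` from the argument list.
        return 'hypothetical:urn:loopback', Message()
```

Type comments are ignored at runtime; only static checkers such as mypy read them.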

Contributor Author

Done.

)

# TODO: https://github.com/apache/beam/issues/19168
# portable runner specific default
Contributor

we can and plan to make this a default for dataflow as well: #26996

Contributor Author

+1

Contributor

@tvalentyn tvalentyn Aug 9, 2023

to follow up, Dataflow runner no longer stages SDK from pypi and expects containers to have it.
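
The defaulting rule under discussion can be illustrated with a small sketch. `SetupOptionsStub` and `apply_portable_default` are hypothetical names made up for this example; only the `'default'` and `'container'` values come from the diff in this PR.

```python
from dataclasses import dataclass


@dataclass
class SetupOptionsStub:
    """Hypothetical stand-in for SetupOptions; only sdk_location is modeled."""
    sdk_location: str = 'default'


def apply_portable_default(setup_options):
    # Rewrite only the 'default' sentinel so that an explicit user choice
    # (for example a path to an SDK tarball) is left untouched.
    if setup_options.sdk_location == 'default':
        setup_options.sdk_location = 'container'
    return setup_options
```

Under this rule the SDK is expected to be present in the container image unless the user says otherwise.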

if options.view_as(SetupOptions).sdk_location == 'default':
  options.view_as(SetupOptions).sdk_location = 'container'

return self.run_full_pipeline(
Contributor

What is the semantic distinction between run_pipeline and run_full_pipeline? It sounds like run_pipeline could execute subgraphs, but it calls into run_full_pipeline, which is supposed to run the entire graph.

Contributor Author

Mostly it's just a type distinction, but for backwards compatibility and the fact that only names (not type signatures) are used to resolve methods in Python I needed to call it something different. (IIRC, the old version could execute subgraphs at some point, I don't know if anyone uses that capability anymore.)
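
The naming constraint described above can be sketched as follows. Python binds methods by name only, so a proto-taking entry point cannot share the name `run_pipeline` with the Pipeline-object-taking one. All classes below are hypothetical stand-ins, not Beam's actual implementation, and a plain dict stands in for the pipeline proto.

```python
class Pipeline:
    """Hypothetical stand-in for a Pipeline object."""
    def __init__(self, steps):
        self.steps = steps

    def to_proto(self):
        # A dict stands in for the real pipeline proto.
        return {'steps': list(self.steps)}


class Runner:
    def run_pipeline(self, pipeline, options):
        # Default implementation: convert the object and delegate to the
        # proto-taking method, which needs a different name.
        return self.run_full_pipeline(pipeline.to_proto(), options)

    def run_full_pipeline(self, pipeline_proto, options):
        raise NotImplementedError


class LegacyRunner(Runner):
    # A legacy runner can still override the object-taking method.
    def run_pipeline(self, pipeline, options):
        return 'legacy ran %d steps' % len(pipeline.steps)


class ProtoRunner(Runner):
    # A newer runner implements only the proto-taking method.
    def run_full_pipeline(self, pipeline_proto, options):
        return 'proto ran %d steps' % len(pipeline_proto['steps'])
```

Both kinds of runner are then invoked the same way, through `run_pipeline(pipeline, options)`.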

Contributor

i see, thanks. the only alternative that comes to mind is run_portable_pipeline(), but not sure if that would be a better name.

Contributor

Sorry for crashing, but I got tripped up by this already when writing code on top of these changes. I think any of run_portable_pipeline / run_pipeline_proto / run_pipeline_from_proto would be a bit clearer

Contributor

cc: @robertwb

Legacy runners can still override the Pipeline-object-taking run_pipeline
method, but this now has a default implementation.

As part of this it was necessary to refactor environments to make loopback
less of a special case.
@robertwb
Contributor Author

robertwb commented Aug 4, 2023

Ping on this @tvalentyn

@tvalentyn
Contributor

Run Python_Integration PreCommit 3.11

@robertwb robertwb merged commit 1755dd5 into apache:master Aug 9, 2023
76 checks passed