-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove more legacy Runner v1 cruft. #27512
Conversation
R: @tvalentyn |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Codecov Report
@@ Coverage Diff @@
## master #27512 +/- ##
==========================================
- Coverage 71.16% 70.61% -0.56%
==========================================
Files 861 860 -1
Lines 104547 103875 -672
==========================================
- Hits 74401 73350 -1051
- Misses 28597 28976 +379
Partials 1549 1549
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 49 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
2587133
to
d59749f
Compare
Run Python_PVR_Flink PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noting that some tests & lint failed on latest snapshot.
@Environment.register_urn(python_urns.EMBEDDED_PYTHON_LOOPBACK, None) | ||
class PythonLoopbackEnvironment(EmbeddedPythonEnvironment): | ||
"""Used as a stub when the loopback worker has not yet been started.""" | ||
def to_runner_api_parameter(self, context): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the typehint? I think it might be: # type: (PipelineContext) -> typing.Tuple[str, message.Message]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
) | ||
|
||
# TODO: https://github.com/apache/beam/issues/19168 | ||
# portable runner specific default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can and plan to make this a default for dataflow as well: #26996
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to follow up, Dataflow runner no longer stages SDK from pypi and expects containers to have it.
if options.view_as(SetupOptions).sdk_location == 'default': | ||
options.view_as(SetupOptions).sdk_location = 'container' | ||
|
||
return self.run_full_pipeline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the semantic distinction between run_pipeline
vs run_full_pipeline
? It sounds like run_pipeline
could run exectute subgraphs, but it calls into run_full_pipeline
, which is supposed to run the entire graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly it's just a type distinction, but for backwards compatibility and the fact that only names (not type signatures) are used to resolve methods in Python I needed to call it something different. (IIRC, the old version could execute subgraphs at some point, I don't know if anyone uses that capability anymore.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, thanks. the only alternative that comes to mind is run_portable_pipeline()
, but not sure if that would be a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for crashing, but I got tripped up by this already when writing code on top of these changes. I think any of run_portable_pipeline
/ run_pipeline_proto
/ run_pipeline_from_proto
would be a bit clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @robertwb
Legacy runners can still override the Pipeline-object-taking run_pipeline method, but this now has a default implementation. As part of this it was necessary to refactor environments to make loopback less of a special case.
Ping on this @tvalentyn |
Run Python_Integration PreCommit 3.11 |
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.