-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Go SDK]: Implement fileio.MatchContinuously transform #26188
[Go SDK]: Implement fileio.MatchContinuously transform #26188
Conversation
// - DuplicateAllow: allow emitting matches that have already been observed. Defaults to false | ||
// - DuplicateSkip: skip emitting matches that have already been observed. Defaults to true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to also enable emitting a file only if it has been modified since the last observation, but it requires introduction of a new method to the filesystems for retrieving a file's last modified timestamp first (#26187) + attaching this info to the fileio.FileMetadata
in fileio.matchFn
, so I will do it at a later stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review PR 26192 first then.
Codecov Report
@@ Coverage Diff @@
## master #26188 +/- ##
==========================================
- Coverage 81.06% 71.13% -9.93%
==========================================
Files 469 787 +318
Lines 67187 103365 +36178
==========================================
+ Hits 54463 73531 +19068
- Misses 12724 28352 +15628
- Partials 0 1482 +1482
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 330 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Assigning reviewers. If you would like to opt out of this review, comment R: @riteshghorse for label go. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Reminder, please take a look at this pr: @riteshghorse @pabloem |
81bd33f
to
b81f261
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks for your contribution!
R: @lostluck |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Ah foiled! We can fix the lack of tests a bit once state is added to Prism, since that's what's currently missing here. |
Happy to merge this in if there are no further question/disccusion (after one of us fixes the conflict in CHANGES.md of course!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Happy to merge this in, unless you want to have the files modified bit come in first/same time. Let me know, and I'll merge as appropriate.
I don't know if anything from the Java/Python equivalents require timers though, which are almost ready to go in.
b81f261
to
102ad04
Compare
Thank you both for reviewing! The Java and Python equivalents don't use timers. Happy too add tests for this one later on when state is supported. I can open a separate PR for the modified files part once #26192 has been merged so please go ahead and merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. I'm probably over-indexing on the importance of garbage collecting state. It's unlikely to be a problem for any but the most egregious number of files.
Fixes #26186 and adds a
fileio.MatchContinuously
transform to the Go SDK.I have tested the transform manually on Dataflow and Flink, which seem to be the only runners that currently support all of watermark estimation, self-checkpointing and state provisioning, that this transform relies on. I have not added any unit tests for the same reason, and there were limitations with running integration tests with some of these features for Dataflow and Flink so I have skipped that as well.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.