-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/dead letter queue core #29164
Feature/dead letter queue core #29164
Conversation
# Conflicts: # CHANGES.md
Create gradle task and github actions config for GCS using this.
…tests Feature/automate performance tests
…tests fix call to gradle
…tests run on hosted runner for testing
…tests add additional checkout
…tests add destination for triggered tests
…tests move env variables to correct location
…tests try uploading against separate dataset
…tests try without a user
…tests update branch checkout, try to view the failure log
…tests run on failure
…tests update to use correct BigQuery instance
…tests convert to matrix
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/errorhandling/ErrorHandler.java
Show resolved
Hide resolved
Run Java PreCommit |
public abstract static class Record implements Serializable { | ||
|
||
/** The failing record, encoded as JSON. Will be null if serialization as JSON fails. */ | ||
public abstract @Nullable String getJsonRecord(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, thinking about this, we don't expect all records to be (faithfully) represented in json, but that's not the point. The point is to give the user some idea what the record is without having to scrutinize the bytes.
Should we lengthen this to humanReadableJsonRecord to better capture its intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, I'll make that change
throws Exception { | ||
if (exception != null) { | ||
throw exception; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably we should be throwing a generic exception if one is not provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something more directly like "Encountered bad record." Might be good to put (some of?) the record itself in the message as it won't be stored elsewhere.
Preconditions.checkArgumentNotNull(record); | ||
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter(); | ||
|
||
// Build up record information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this logic would be better placed as factory methods on BadRecord itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the whole block down to line 92.
|
||
@Override | ||
public void close() { | ||
closed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be an error (or no-op?) to close this twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolutely yes
closed = true; | ||
PCollection<ErrorT> flattened; | ||
if (errorCollections.isEmpty()) { | ||
LOG.warn("Empty list of error pcollections passed to ErrorHandler."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warn is probably too strong, and likely not very actionable.
* configured an error handler or not. | ||
*/ | ||
@Internal | ||
class NoOpErrorHandler<ErrorT, OutputT extends POutput> implements ErrorHandler<ErrorT, OutputT> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoOp seems to imply that we do nothing with the errors (like /dev/null). What we want here is something that we can call addErrorCollection on when we know that the passed error collection will be empty, but should never be used in as an actual error handler. In particular, if there happens to be something in those collections, that's a bug. (Should this be asserted? Actually, in this case it's fine to call close() and all. It's a NoneShallPassErrorHandler.)
I just can't shake the feeling that this'd all be so much easier if we finish pushing error handling into ParDo itself...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename this. I agree that I'll probably want to rework this once we update the ParDo itself, but that will take quite a lot of design that we don't have in place yet.
BigEndianIntegerCoder.of(), | ||
new RuntimeException(), | ||
"Integer was odd", | ||
"NoOpDoFn"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd have to extract this from the DoFn Context object. (Or, more hacky, leverage the same globals that Counters and logging do.)
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/errorhandling/ErrorHandlerTest.java
Show resolved
Hide resolved
This should be good to go. The pulsar tests are know to be quite flaky |
…er-queue-core # Conflicts: # CHANGES.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good. Just some minor comments.
throws Exception { | ||
if (exception != null) { | ||
throw exception; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something more directly like "Encountered bad record." Might be good to put (some of?) the record itself in the message as it won't be stored elsewhere.
Preconditions.checkArgumentNotNull(record); | ||
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter(); | ||
|
||
// Build up record information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the whole block down to line 92.
public void testErrorHandlerWithBRHTransform() throws Exception { | ||
PCollection<Integer> record = pipeline.apply(Create.of(1, 2, 3, 4)); | ||
DummySinkTransform<BadRecord> transform = new DummySinkTransform<>(); | ||
ErrorHandler<BadRecord, PCollection<BadRecord>> eh = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps use a try/autoclosable context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lets us manually get access to the output collection, which is useful in the test context
Add the base utilities for creating error handlers and adding DLQs to Java IOs
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.