-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove avro dependency from runners-core-construction #30235
Conversation
R: @chamikaramj since you have context on the series of adjustments |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
It looks like there is a problem with shading again :-( It seems that the coder translator registered in the Avro extension is not considered the correct type. Probably because some classes it depends on are in a different, shaded, namespace. |
12ccbf8
to
415b138
Compare
green! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just had a few comments.
...ction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
Show resolved
Hide resolved
@@ -40,9 +40,6 @@ applyJavaNature( | |||
include(project(path: it, configuration: "shadow")) | |||
} | |||
} | |||
relocate "org.apache.beam.runners.core", getJavaRelocatedPath("runners.core") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind explain a little bit this change? (what was the purposes of relocation here and why now it is no longer necessary, if possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is a lot to put in a comment but let me know what you think should go in the code;
See the comment on line 22. It is complicated and actually wrong...
- Java-to-proto translations can only be registered once or they will cause a failure.
- The worry was: two runners could have registered different Java-to-proto translations and they would conflict, so the direct runner would relocate all of its registrations so the two runners could translate differently
- it doesn't actually work this way (even with relocation you can't actually have two different translations really)
- it would not have caused a conflict because the registrations in runners-core-construction would be the same AutoService class and not conflict
- proto translation really should not exist differently per runner, but there are some hacks needed because of how we didn't put URNs in the core SDK, so runners have "translators" that are just there to give a URN to a non-serializable transform
Another annoying thing is that org.apache.beam.runners.core
is execution-time utilities, but org.apache.beam.runners.core.construction
was not. I think that partly this is accidentally catching it just because it has to be a package prefix. Honestly we could have just called it construction
instead of core.construction
and not had that problem.
Now the reason we must get rid of it: after relocation, the registered AvroCoderTranslatorRegistrar
is a subclass or org.apache.beam.core.construction.CoderTranslatorRegistrar
but it is not a subclass of org.apache.beam.runners.direct.relocated...org.apache.beam.core.construction.CoderTranslatorRegistrar
.
A possible alternative might have been to also include the avro extension and relocation into the direct runner shadowClosure but I don't want to add a dependency on avro.
Plus, this is temporary because after this PR it can all collapse into the core SDK and there is no more core-construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I really would love to not have any shadowClosure / relocation except for the vendored released artifacts. It builds faster and is way easier to debug and also intellij understands it better, etc.
@@ -132,9 +131,6 @@ public static class DirectTransformsRegistrar implements TransformPayloadTransla | |||
.put( | |||
DirectTestStream.class, | |||
TransformPayloadTranslator.NotSerializable.forUrn(DIRECT_TEST_STREAM_URN)) | |||
.put( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood correctly, this is because we now have an AutoService in runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java, so no loner explicit register SplittableParDoViaKeyedWorkItems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. Without the relocation from the other long comment, this one and the one in the flink runner conflict. But they register the same thing, and it doesn't do anything, and it is totally standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks for details, good to learn.
Splitting this independently useful change from #29924 because it also has code changes and is a little more than just a move.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.