Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove avro dependency from runners-core-construction #30235

Merged
merged 2 commits into from
Feb 13, 2024

Conversation

kennknowles
Copy link
Member

Splitting this independently useful change from #29924 because it also has code changes and is a little more than just a move.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@kennknowles
Copy link
Member Author

R: @chamikaramj since you have context on the series of adjustments

Copy link
Contributor

github-actions bot commented Feb 6, 2024

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@kennknowles
Copy link
Member Author

It looks like there is a problem with shading again :-(

It seems that the coder translator registered in the Avro extension is not considered the correct type. Probably because some classes it depends on are in a different, shaded, namespace.

@kennknowles
Copy link
Member Author

green!

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just had a few comments.

@@ -40,9 +40,6 @@ applyJavaNature(
include(project(path: it, configuration: "shadow"))
}
}
relocate "org.apache.beam.runners.core", getJavaRelocatedPath("runners.core")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind explain a little bit this change? (what was the purposes of relocation here and why now it is no longer necessary, if possible)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is a lot to put in a comment but let me know what you think should go in the code;

See the comment on line 22. It is complicated and actually wrong...

  • Java-to-proto translations can only be registered once or they will cause a failure.
  • The worry was: two runners could have registered different Java-to-proto translations and they would conflict, so the direct runner would relocate all of its registrations so the two runners could translate differently
    • it doesn't actually work this way (even with relocation you can't actually have two different translations really)
    • it would not have caused a conflict because the registrations in runners-core-construction would be the same AutoService class and not conflict
    • proto translation really should not exist differently per runner, but there are some hacks needed because of how we didn't put URNs in the core SDK, so runners have "translators" that are just there to give a URN to a non-serializable transform

Another annoying thing is that org.apache.beam.runners.core is execution-time utilities, but org.apache.beam.runners.core.construction was not. I think that partly this is accidentally catching it just because it has to be a package prefix. Honestly we could have just called it construction instead of core.construction and not had that problem.

Now the reason we must get rid of it: after relocation, the registered AvroCoderTranslatorRegistrar is a subclass or org.apache.beam.core.construction.CoderTranslatorRegistrar but it is not a subclass of org.apache.beam.runners.direct.relocated...org.apache.beam.core.construction.CoderTranslatorRegistrar.

A possible alternative might have been to also include the avro extension and relocation into the direct runner shadowClosure but I don't want to add a dependency on avro.

Plus, this is temporary because after this PR it can all collapse into the core SDK and there is no more core-construction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I really would love to not have any shadowClosure / relocation except for the vendored released artifacts. It builds faster and is way easier to debug and also intellij understands it better, etc.

@@ -132,9 +131,6 @@ public static class DirectTransformsRegistrar implements TransformPayloadTransla
.put(
DirectTestStream.class,
TransformPayloadTranslator.NotSerializable.forUrn(DIRECT_TEST_STREAM_URN))
.put(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, this is because we now have an AutoService in runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java, so no loner explicit register SplittableParDoViaKeyedWorkItems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Without the relocation from the other long comment, this one and the one in the flink runner conflict. But they register the same thing, and it doesn't do anything, and it is totally standard.

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks for details, good to learn.

@kennknowles kennknowles merged commit b5cfd95 into apache:master Feb 13, 2024
27 checks passed
@kennknowles kennknowles deleted the avro branch February 13, 2024 15:28
@Abacn Abacn mentioned this pull request Feb 27, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants