[1568] Fixing spark3 bundles #2625
Conversation
Codecov Report
@@             Coverage Diff              @@
##             master    #2625      +/-  ##
============================================
+ Coverage     50.94%   51.76%   +0.81%
- Complexity     3169     3596     +427
============================================
  Files           433      475      +42
  Lines         19814    22553    +2739
  Branches       2034     2406     +372
============================================
+ Hits          10095    11674    +1579
- Misses         8900     9861     +961
- Partials        819     1018     +199
Flags with carried forward coverage won't be shown.
@nsivabalan thanks for this. I think generating thrice is okay; I don't see any other way to avoid it. Whenever we can, we should avoid breaking changes, so can we keep the existing spark2 bundle name unchanged? i.e., having sparkbundle.version is okay, but can we set it through the spark3 maven profile? That way, if you run a build for spark3, the bundle is called
So this is what users who write spark jobs depend on today. We should keep this as-is? They will ultimately compile their application against the right spark version anyway.
vinothchandar left a comment:
LGTM
What is the purpose of the pull request
Looks like we don't generate spark3 bundles at all (all artifacts in maven are for spark2). This patch fixes that.
As of now, we are required to pass command-line args while packaging.
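For context, this is a sketch of the kind of packaging invocation meant here; the profile flags shown are illustrative assumptions, not necessarily the project's exact profile names:

```shell
# Illustrative only: pass the scala/spark profiles on the command line
# when packaging. Check the project README for the actual flag names.
mvn clean package -DskipTests -Dscala-2.12           # spark2 / scala 2.12 bundle
mvn clean package -DskipTests -Dscala-2.12 -Dspark3  # spark3 / scala 2.12 bundle
```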
But the issue with this approach is that the spark bundle that gets created for both spark3_2.12 and spark2_2.12 has the same name, so we can't differentiate them in maven.
This patch modifies the bundle names depending on the spark version, for spark2 and spark3 respectively.
So, during the release process, the RM might have to deploy artifacts thrice.
Running the deploy thrice should be fine, because even today we run it twice: once for scala 2.11 and once for scala 2.12. Common artifacts (which do not depend on the scala version) get overridden on the second run, but they are identical to the first run's output anyway. Only the artifacts with the _2.12 suffix are added as new artifacts when deploy is called the second time.
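The release flow described above can be sketched as one deploy per (scala, spark) combination; the profile flags below are hypothetical placeholders:

```shell
# Hypothetical sketch: the RM runs deploy once per supported (scala, spark) pair.
# Profile flags are illustrative; the project's actual flags may differ.
DEPLOY_RUNS="scala-2.11:spark2 scala-2.12:spark2 scala-2.12:spark3"
COUNT=0
for RUN in $DEPLOY_RUNS; do
  SCALA=${RUN%%:*}   # part before the colon
  SPARK=${RUN##*:}   # part after the colon
  echo "mvn deploy -DskipTests -D${SCALA} -D${SPARK}"
  COUNT=$((COUNT + 1))
done
```

Each run re-uploads the scala-agnostic artifacts unchanged and adds only the artifacts whose names differ for that combination.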
To be discussed:
We use sparkbundle.version to inject "2" or "3" into the bundle name. If we were to resort to spark.version instead, the generated artifact name would embed the full spark version; not sure we really need to reflect the actual spark version in the artifact name. Also, do we still need hudi-spark_2.12-0.7.0.jar, since we also upload hudi-spark2_2.12-0.7.0.jar and hudi-spark3_2.12-0.7.0.jar? Basically, we have all of these for 0.7.0 in maven right now:
hudi-spark_2.11-0.7.0.jar
hudi-spark_2.12-0.7.0.jar
hudi-spark2_2.11-0.7.0.jar
hudi-spark2_2.12-0.7.0.jar
hudi-spark3_2.12-0.7.0.jar
Proposal is to avoid uploading the first two in the above list.
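To make the naming scheme concrete, here is a minimal sketch of how a sparkbundle.version of "2" or "3" would be interpolated into the artifact name; the variable names are illustrative, not the actual maven property wiring:

```shell
# Illustrative sketch of the proposed artifact naming.
HUDI_VERSION="0.7.0"
SCALA_VERSION="2.12"
SPARKBUNDLE_VERSION="3"   # "2" or "3", set by the active maven profile

# Interpolate sparkbundle.version into the bundle artifact name.
BUNDLE="hudi-spark${SPARKBUNDLE_VERSION}_${SCALA_VERSION}-${HUDI_VERSION}.jar"
echo "$BUNDLE"   # hudi-spark3_2.12-0.7.0.jar
```

With spark.version instead, the interpolated value would be the full version string, which is the extra specificity the comment above questions.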
Brief change log
Verify this pull request
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.