HBASE-25326 Allow running and building hbase-connectors with Apache Spark 3.0 #75
Conversation
💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
saintstack left a comment:
LGTM
Spark2 still works?
Thanks for working on this.
The failure seems to be because of this...
Thanks @saintstack for looking at this.
(force-pushed from 76d7c48 to 31e8f9c)
💔 -1 overall
This message was automatically generated.
(force-pushed from 31e8f9c to b1d2458)
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Yes, Spark 2 still works. BTW, it is worth mentioning that there will have to be separate releases of the Spark connector for each supported Scala version (notably for Scala 2.11 and 2.12) and that Spark 3.0 only supports Scala 2.12.
symat left a comment:
LGTM, thanks for the patch @LucaCanali!
I just found this patch after I got compilation errors when trying to build hbase-connectors with Spark 3.2 and Scala 2.12. After applying this patch, the following command works for me:
mvn clean install -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.0 -Dspark.version=3.1.0 -Dscala.version=2.12.10 -Dscala.binary.version=2.12
I also verified the patch with Spark 2, using:
mvn clean install -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.0 -Dspark.version=2.4.5 -Dscala.version=2.11.12 -Dscala.binary.version=2.11
Compilation and unit tests passed for both commands above. CI also seems to be green now, as far as I can tell.
@saintstack what do you think?
@meszibalu can you take a look?
(I haven't actually tested the hbase connector with spark 3.1 just yet, but that will be my next step. Still, this patch seems to me a valuable contribution in its current form.)
I just tested the hbase-connectors built with this patch on a cluster. I was able to use both Spark 2 and Spark 3 with the following setup:
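(Not the actual setup from the comment above, but as a rough illustration only: a minimal smoke-test job against the connector could look like the Scala sketch below. The application, table, and column-family names are made up; the same code is expected to build and run against either a Spark 2 or a Spark 3 build of the connector.)

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object ConnectorSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-connector-smoke-test").getOrCreate()

    // HBaseContext distributes the HBase configuration (ZooKeeper quorum, etc.)
    // to the executors; the connector's RDD and DataFrame operations rely on it.
    val hbaseConf = HBaseConfiguration.create()
    val hbaseContext = new HBaseContext(spark.sparkContext, hbaseConf)

    // Write a few rows to a (hypothetical) pre-created table 'smoke_test'
    // with column family 'cf'.
    val rows = spark.sparkContext.parallelize(1 to 10).map(i => s"row$i")
    hbaseContext.bulkPut[String](
      rows,
      TableName.valueOf("smoke_test"),
      rowKey => new Put(Bytes.toBytes(rowKey))
        .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes(rowKey)))

    spark.stop()
  }
}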
Seems fine to me; admittedly, I'm not a Spark/Scala expert in any sense :) Glancing at the Yetus personality, it doesn't look like we have anything to test both Spark 2 and Spark 3 during precommit. Should we also add that?
wchevreuil left a comment:
LGTM
LGTM. Leaving open while Josh's question is outstanding.
I'm OK to track this as a follow-on. I trust that Maté and the author have done adequate testing already.
Merged it then @joshelser ... We can do build infra for Spark 2 vs Spark 3 in a follow-on as you suggest.
Currently the hbase-spark connector only works with Spark 2.x. Apache Spark 3.0 was released in June 2020.
This adds the changes needed to run the connector with Spark 3.0 and to compile the connector using Spark 3.0 as a dependency.
This has been manually tested with Apache Spark 3.0.1 and HBase 2.2.4.
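As a minimal sketch of how the connector is typically exercised (the table, namespace, and column names below are hypothetical), a DataFrame read through the hbase-spark data source looks roughly like this; with this change the same code is expected to compile and run against Spark 3.0 as well:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hbase-spark3-read").getOrCreate()

// The data source resolves its HBase connection through a shared HBaseContext,
// so create one up front from the configuration found on the classpath.
new HBaseContext(spark.sparkContext, HBaseConfiguration.create())

// Catalog describing a hypothetical table 'person' with column family 'info'.
val catalog =
  """{
    |  "table": {"namespace": "default", "name": "person"},
    |  "rowkey": "key",
    |  "columns": {
    |    "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
    |    "name": {"cf": "info",   "col": "name", "type": "string"}
    |  }
    |}""".stripMargin

val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option(HBaseTableCatalog.tableCatalog, catalog)
  .load()

df.show()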