@LucaCanali
Contributor

Currently the hbase-spark connector only works with Spark 2.x. Apache Spark 3.0 was released in June 2020.
This PR makes the changes needed to run the connector with Spark 3.0 and to compile it with Spark 3.0 as a dependency.
This has been manually tested with Apache Spark 3.0.1 and HBase 2.2.4.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-0 ⚠️ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 1m 24s master passed
-1 ❌ compile 0m 24s spark in master failed.
+1 💚 scaladoc 0m 17s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 4s Maven dependency ordering for patch
+1 💚 mvninstall 0m 44s the patch passed
-1 ❌ compile 0m 23s spark in the patch failed.
-1 ❌ scalac 0m 23s spark in the patch failed.
-1 ❌ whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 scaladoc 0m 17s the patch passed
_ Other Tests _
-1 ❌ unit 0m 27s spark in the patch failed.
+1 💚 unit 3m 55s hbase-spark in the patch passed.
10m 40s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #75
Optional Tests dupname markdownlint scalac scaladoc unit compile
uname Linux 1e4cb11fc2f4 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / b9706c8
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/artifact/yetus-precommit-check/output/branch-compile-spark.txt
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/artifact/yetus-precommit-check/output/patch-compile-spark.txt
scalac https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/artifact/yetus-precommit-check/output/patch-compile-spark.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/artifact/yetus-precommit-check/output/whitespace-eol.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/artifact/yetus-precommit-check/output/patch-unit-spark.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/testReport/
Max. process+thread count 917 (vs. ulimit of 12500)
modules C: spark spark/hbase-spark U: spark
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/1/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 43s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-0 ⚠️ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 1m 16s master passed
-1 ❌ compile 0m 24s spark in master failed.
+1 💚 scaladoc 0m 16s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 3s Maven dependency ordering for patch
+1 💚 mvninstall 0m 43s the patch passed
-1 ❌ compile 0m 23s spark in the patch failed.
-1 ❌ scalac 0m 23s spark in the patch failed.
-1 ❌ whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 scaladoc 0m 17s the patch passed
_ Other Tests _
-1 ❌ unit 0m 23s spark in the patch failed.
+1 💚 unit 3m 57s hbase-spark in the patch passed.
10m 1s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #75
Optional Tests dupname markdownlint scalac scaladoc unit compile
uname Linux f471a924f7d9 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / b9706c8
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/artifact/yetus-precommit-check/output/branch-compile-spark.txt
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/artifact/yetus-precommit-check/output/patch-compile-spark.txt
scalac https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/artifact/yetus-precommit-check/output/patch-compile-spark.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/artifact/yetus-precommit-check/output/whitespace-eol.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/artifact/yetus-precommit-check/output/patch-unit-spark.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/testReport/
Max. process+thread count 916 (vs. ulimit of 12500)
modules C: spark spark/hbase-spark U: spark
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/2/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@saintstack saintstack left a comment

LGTM

Spark2 still works?

Thanks for working on this.

@saintstack
Contributor

The failure seems to be caused by this...

[WARNING] [Warn] : there were 18 deprecation warnings; re-run with -deprecation for details
[WARNING] one warning found
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:33: package org.apache.hadoop.hbase.spark.protobuf.generated does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:190: package SparkFilterProtos does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:192: package SparkFilterProtos does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:215: package SparkFilterProtos does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:249: package SparkFilterProtos.SQLPredicatePushDownFilter does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:250: package SparkFilterProtos does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:252: package SparkFilterProtos.SQLPredicatePushDownCellToColumnMapping does not exist
[ERROR] [Error] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java:253: package SparkFilterProtos does not exist
[INFO] [Info] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkLoadExample.java:-1: /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkLoadExample.java uses unchecked or unsafe operations.
[INFO] [Info] /home/jenkins/jenkins-home/workspace/HBase-Connectors-PreCommit_PR-75/yetus-precommit-check/src/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkLoadExample.java:-1: Recompile with -Xlint:unchecked for details.
[INFO] ------------------------------------------------------------------------

@LucaCanali
Contributor Author

Thanks @saintstack for looking at this.
I do not yet understand how the build errors relate to the patch. Could it be an issue with Jenkins, or am I missing something?

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-0 ⚠️ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for branch
+1 💚 mvninstall 1m 57s master passed
-1 ❌ compile 0m 30s spark in master failed.
+1 💚 scaladoc 0m 18s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 4s Maven dependency ordering for patch
+1 💚 mvninstall 0m 46s the patch passed
-1 ❌ compile 0m 24s spark in the patch failed.
-1 ❌ scalac 0m 24s spark in the patch failed.
-1 ❌ whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 scaladoc 0m 18s the patch passed
_ Other Tests _
-1 ❌ unit 0m 24s spark in the patch failed.
+1 💚 unit 4m 12s hbase-spark in the patch passed.
12m 12s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #75
Optional Tests dupname markdownlint scalac scaladoc unit compile
uname Linux 460db9dff185 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / b9706c8
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/artifact/yetus-precommit-check/output/branch-compile-spark.txt
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/artifact/yetus-precommit-check/output/patch-compile-spark.txt
scalac https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/artifact/yetus-precommit-check/output/patch-compile-spark.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/artifact/yetus-precommit-check/output/whitespace-eol.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/artifact/yetus-precommit-check/output/patch-unit-spark.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/testReport/
Max. process+thread count 918 (vs. ulimit of 12500)
modules C: spark spark/hbase-spark U: spark
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/3/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-0 ⚠️ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
+1 💚 mvninstall 1m 28s master passed
-1 ❌ compile 0m 24s spark in master failed.
+1 💚 scaladoc 0m 17s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 3s Maven dependency ordering for patch
+1 💚 mvninstall 0m 44s the patch passed
-1 ❌ compile 0m 24s spark in the patch failed.
-1 ❌ scalac 0m 24s spark in the patch failed.
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 scaladoc 0m 17s the patch passed
_ Other Tests _
-1 ❌ unit 0m 23s spark in the patch failed.
+1 💚 unit 3m 59s hbase-spark in the patch passed.
10m 40s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #75
Optional Tests dupname markdownlint scalac scaladoc unit compile
uname Linux a6c9b17e1f3d 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / b9706c8
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/artifact/yetus-precommit-check/output/branch-compile-spark.txt
compile https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/artifact/yetus-precommit-check/output/patch-compile-spark.txt
scalac https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/artifact/yetus-precommit-check/output/patch-compile-spark.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/artifact/yetus-precommit-check/output/patch-unit-spark.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/testReport/
Max. process+thread count 915 (vs. ulimit of 12500)
modules C: spark spark/hbase-spark U: spark
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/4/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-0 ⚠️ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for branch
+1 💚 mvninstall 1m 36s master passed
+1 💚 compile 1m 44s master passed
+1 💚 scaladoc 0m 55s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 3s Maven dependency ordering for patch
+1 💚 mvninstall 0m 54s the patch passed
+1 💚 compile 1m 43s the patch passed
+1 💚 scalac 1m 43s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 scaladoc 0m 58s the patch passed
_ Other Tests _
+1 💚 unit 7m 23s spark in the patch passed.
+1 💚 unit 7m 15s hbase-spark in the patch passed.
25m 23s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/5/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #75
Optional Tests dupname markdownlint scalac scaladoc unit compile
uname Linux 915a2c6a9153 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / b8cb7fe
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/5/testReport/
Max. process+thread count 935 (vs. ulimit of 12500)
modules C: spark spark/hbase-spark U: spark
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-75/5/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@LucaCanali
Contributor Author

> Spark2 still works?

Yes, Spark 2 still works.

BTW, it is worth mentioning that the Spark connector will need a separate release for each supported Scala version (notably Scala 2.11 and 2.12), and that Spark 3.0 only supports Scala 2.12.
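
The Scala constraint mentioned above can be checked before a build is started. The following is a minimal sketch, not part of the patch: `scala_ok` is a hypothetical helper, and it encodes only the one rule stated in this comment (Spark 3.0 requires Scala 2.12). Other Spark lines are accepted without checking.

```shell
#!/bin/sh
# Sketch: reject a Spark/Scala pairing that this thread rules out.
# scala_ok is a hypothetical helper name, not something the project provides.
scala_ok() {
  # $1 = Spark version (e.g. 3.0.1), $2 = Scala binary version (e.g. 2.12)
  case "$1" in
    3.0|3.0.*)
      # Spark 3.0 only supports Scala 2.12.
      [ "$2" = "2.12" ]
      ;;
    *)
      # No constraint is encoded for other Spark lines in this sketch.
      return 0
      ;;
  esac
}

if scala_ok 3.0.1 2.11; then
  echo "combination accepted"
else
  echo "combination rejected"
fi
```

A build wrapper could call `scala_ok "$SPARK_VERSION" "$SCALA_BINARY_VERSION"` (both variable names are assumptions) and stop early on a non-zero exit status.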

@symat symat left a comment

LGTM, thanks for the patch @LucaCanali !

I found this patch after hitting compilation errors while trying to build hbase-connectors with Spark 3.1 and Scala 2.12. With this patch applied, the following command now works for me:

mvn clean install -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.0 -Dspark.version=3.1.0 -Dscala.version=2.12.10 -Dscala.binary.version=2.12

I also verified the patch with Spark 2, using:

mvn clean install -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.0 -Dspark.version=2.4.5 -Dscala.version=2.11.12 -Dscala.binary.version=2.11

Compilation and unit tests passed for both commands above. CI also seems to be green now, as far as I can tell.
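
The two commands above differ only in the Spark and Scala properties, so both builds can be driven from one small script. This is a rough sketch under a few assumptions: `mvn` is on the PATH, the script runs from the hbase-connectors checkout, and the names `build_cmd`, `run_builds` and `DRY_RUN` are illustrative rather than anything the project provides. The property names and versions are the ones quoted above. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: build hbase-connectors against Spark 3 and then Spark 2.
# build_cmd, run_builds and DRY_RUN are hypothetical names for illustration.

build_cmd() {
  # $1 = Spark version, $2 = full Scala version, $3 = Scala binary version
  echo "mvn clean install -Dhadoop.profile=3.0 -Dhadoop-three.version=3.2.0" \
       "-Dspark.version=$1 -Dscala.version=$2 -Dscala.binary.version=$3"
}

run_builds() {
  for combo in "3.1.0 2.12.10 2.12" "2.4.5 2.11.12 2.11"; do
    # Split the combo string into positional parameters.
    set -- $combo
    cmd=$(build_cmd "$1" "$2" "$3")
    echo "$cmd"
    # Execute only when DRY_RUN is explicitly set to 0; stop on the first failure.
    if [ "${DRY_RUN:-1}" = "0" ]; then
      $cmd || return 1
    fi
  done
}

run_builds
```

Running it with `DRY_RUN=0` would execute both builds in sequence, which is roughly the manual check described in this comment.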

@saintstack what do you think?
@meszibalu can you take a look?

(I haven't actually tested the hbase connector with spark 3.1 just yet, but that will be my next step. Still, this patch seems to me a valuable contribution in its current form.)

@symat

symat commented Feb 1, 2021

I just tested the hbase-connectors built with this patch on a cluster. I was able to use both Spark2 and Spark3 with the following setup:

  • I added the scala-library-2.11 jar file and the hbase-spark and hbase-spark-protocol jar files (built with -Dspark.version=2.4.5 -Dscala.version=2.11.12) to the classpath of the region servers
  • I was able to start and use a Spark 2 job that writes to and reads from HBase, using the hbase-spark and hbase-spark-protocol files built with -Dspark.version=2.4.5 -Dscala.version=2.11.12
  • I was able to start and use a Spark 3 job that writes to and reads from HBase, using the hbase-spark and hbase-spark-protocol files built with -Dspark.version=3.1.0 -Dscala.version=2.12.10
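
The first step in the list above, putting the jars on the region servers' classpath, might look roughly like the following. This is only a sketch: `join_classpath` is a hypothetical helper, the `/opt/hbase-extra` directory and the exact jar file names are placeholders, and it assumes the jars are picked up through `HBASE_CLASSPATH` (usually set in `hbase-env.sh`), which the comment does not confirm.

```shell
#!/bin/sh
# Sketch: assemble a classpath fragment for the region servers.
# join_classpath and every path below are placeholders; adjust to your install.
join_classpath() {
  result=""
  for jar in "$@"; do
    # Append with a ':' separator, omitting it before the first entry.
    result="${result:+$result:}$jar"
  done
  echo "$result"
}

extra=$(join_classpath \
  /opt/hbase-extra/scala-library-2.11.12.jar \
  /opt/hbase-extra/hbase-spark.jar \
  /opt/hbase-extra/hbase-spark-protocol.jar)

# Assumption: the HBase start scripts read HBASE_CLASSPATH from the environment.
export HBASE_CLASSPATH="${HBASE_CLASSPATH:+$HBASE_CLASSPATH:}$extra"
echo "$HBASE_CLASSPATH"
```

The region servers would need a restart for a classpath change like this to take effect.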

@joshelser
Member

Seems fine to me, admittedly I'm not a spark/scala expert in any sense :)

Glancing at the Yetus personality, it doesn't look like we have anything to test both Spark2 and Spark3 during precommit. Should we also add that?

@wchevreuil wchevreuil left a comment

LGTM

@saintstack
Contributor

LGTM. Leaving open while Josh's question is outstanding.

@joshelser
Member

> Glancing at the Yetus personality, it doesn't look like we have anything to test both Spark2 and Spark3 during precommit. Should we also add that?

> Leaving open while Josh's question is outstanding

I'm OK to track this as a follow-on. I trust that Maté and the author have done adequate testing already.

@saintstack saintstack changed the title from "[HBASE-25326] Allow running and building hbase-connectors with Apache Spark 3.0" to "HBASE-25326 Allow running and building hbase-connectors with Apache Spark 3.0" Feb 1, 2021
@saintstack saintstack merged commit 4c46a24 into apache:master Feb 1, 2021
@saintstack
Contributor

Merged it then, @joshelser. We can do the build infra for Spark 2 vs Spark 3 in a follow-on, as you suggest.
