SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging #237
Changes from 3 commits
```diff
@@ -30,21 +30,7 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
 # Build up classpath
 CLASSPATH="$SPARK_CLASSPATH:$FWDIR/conf"
 
-# Support for interacting with Hive. Since hive pulls in a lot of dependencies that might break
-# existing Spark applications, it is not included in the standard spark assembly. Instead, we only
-# include it in the classpath if the user has explicitly requested it by running "sbt hive/assembly"
-# Hopefully we will find a way to avoid uber-jars entirely and deploy only the needed packages in
-# the future.
-if [ -f "$FWDIR"/sql/hive/target/scala-$SCALA_VERSION/spark-hive-assembly-*.jar ]; then
-
-  # Datanucleus jars do not work if only included in the uberjar as plugin.xml metadata is lost.
-  DATANUCLEUSJARS=$(JARS=("$FWDIR/lib_managed/jars"/datanucleus-*.jar); IFS=:; echo "${JARS[*]}")
-  CLASSPATH=$CLASSPATH:$DATANUCLEUSJARS
-
-  ASSEMBLY_DIR="$FWDIR/sql/hive/target/scala-$SCALA_VERSION/"
-else
-  ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION/"
-fi
+ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
 
 # First check if we have a dependencies jar. If so, include binary classes with the deps jar
 if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
```
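The removed `if [ -f ... spark-hive-assembly-*.jar ]` check relies on a glob inside `[ -f ]`, which only behaves when the pattern matches at most one file. A minimal standalone sketch of the pitfall (the temporary directory and jar names are illustrative, not from the PR):

```shell
#!/usr/bin/env bash
# Standalone sketch (not PR code): a glob inside `[ -f ... ]` is fragile.
tmp=$(mktemp -d)

check() { [ -f "$tmp"/spark-hive-assembly-*.jar ] 2>/dev/null && echo yes || echo no; }

none=$(check)   # glob unmatched -> stays a literal string -> "no" (works by accident)
touch "$tmp"/spark-hive-assembly-0.9.0.jar
one=$(check)    # exactly one match -> "yes"
touch "$tmp"/spark-hive-assembly-1.0.0.jar
two=$(check)    # two matches -> `[` gets too many arguments, errors -> "no"

echo "none=$none one=$one two=$two"
rm -rf "$tmp"
```

So the check silently stops detecting the Hive assembly as soon as two versions of the jar accumulate in the target directory, which is one more reason the PR drops it in favor of an explicit `SPARK_HIVE`-style build-time decision.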
```diff
@@ -59,7 +45,7 @@ if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
   CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
 
-  DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark*-assembly*hadoop*-deps.jar`
+  DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar`
   CLASSPATH="$CLASSPATH:$DEPS_ASSEMBLY_JAR"
 else
   # Else use spark-assembly jar from either RELEASE or assembly directory
```
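The one-character tightening from `spark*-assembly` to `spark-assembly` narrows which jars the `ls` glob can match. A standalone sketch of the difference (the second jar name is hypothetical, chosen to show what the looser pattern would also pick up):

```shell
#!/usr/bin/env bash
# Standalone sketch (not PR code): old vs new deps-jar glob patterns.
tmp=$(mktemp -d)
touch "$tmp"/spark-assembly-0.9.0-hadoop1.0.4-deps.jar \
      "$tmp"/spark-hive-assembly-0.9.0-hadoop1.0.4-deps.jar   # hypothetical sibling

old=$(ls "$tmp"/spark*-assembly*hadoop*-deps.jar | wc -l)   # `*` also eats "-hive"
new=$(ls "$tmp"/spark-assembly*hadoop*-deps.jar  | wc -l)   # anchored to the main jar

echo "old pattern matches: $old, new pattern matches: $new"
rm -rf "$tmp"
```

With only a single assembly jar on disk the two patterns behave identically, which is why the change is safe once the separate hive assembly no longer exists.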
```diff
@@ -71,6 +57,23 @@ else
   CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
 fi
 
+# When Hive support is needed, Datanucleus jars must be included on the classpath.
+# Datanucleus jars do not work if only included in the uberjar as plugin.xml metadata is lost.
+# Both sbt and maven will populate "lib_managed/jars/" with the datanucleus jars when Spark is
+# built with Hive, so first check if the datanucleus jars exist, and then ensure the current Spark
+# assembly is built for Hive, before actually populating the CLASSPATH with the jars.
+# Note that this check order is faster (by up to half a second) in the case where Hive is not used.
+num_datanucleus_jars=$(ls "$FWDIR"/lib_managed/jars/ | grep "datanucleus-.*\\.jar" | wc -l)
+if [ $num_datanucleus_jars -gt 0 ]; then
+  AN_ASSEMBLY_JAR=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
+  num_hive_files=$(jar tvf "$AN_ASSEMBLY_JAR" org/apache/hadoop/hive/ql/exec 2>/dev/null | wc -l)
+  if [ $num_hive_files -gt 0 ]; then
+    echo "Spark assembly has been built with Hive, including Datanucleus jars on classpath" 1>&2
+    DATANUCLEUSJARS=$(echo "$FWDIR/lib_managed/jars"/datanucleus-*.jar | tr " " :)
+    CLASSPATH=$CLASSPATH:$DATANUCLEUSJARS
+  fi
+fi
 # Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1
 if [[ $SPARK_TESTING == 1 ]]; then
   CLASSPATH="$CLASSPATH:$FWDIR/core/target/scala-$SCALA_VERSION/test-classes"
```

Inline review comments on the added block:

- Contributor: extra space "the uberjar"
- Contributor: also an extra space after
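The PR also swaps the removed array/`IFS` join for an `echo ... | tr " " :` join when building `DATANUCLEUSJARS`. Both idioms turn a glob's matches into a colon-separated classpath fragment, as this standalone sketch shows (fake jar names; both idioms assume there are no spaces inside the jar paths themselves):

```shell
#!/usr/bin/env bash
# Standalone sketch (not PR code): two ways to join glob matches with colons.
tmp=$(mktemp -d)
touch "$tmp"/datanucleus-core-3.2.2.jar "$tmp"/datanucleus-rdbms-3.2.1.jar

# Old idiom: expand the glob into an array, then join it with IFS=':'.
OLD_JOIN=$(JARS=("$tmp"/datanucleus-*.jar); IFS=:; echo "${JARS[*]}")

# New idiom from the PR: let echo expand the glob, then turn spaces into colons.
NEW_JOIN=$(echo "$tmp"/datanucleus-*.jar | tr " " :)

echo "$NEW_JOIN"
rm -rf "$tmp"
```

The `tr` form is shorter and works in shells without arrays, at the cost of breaking if a jar path ever contains a space; the array form has the same space limitation only at the glob-splitting step.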
```diff
@@ -154,5 +154,3 @@ if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
 fi
 
 exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
```
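The `exec "$RUNNER" -cp "$CLASSPATH" ...` line replaces the launcher shell with the JVM instead of forking a child, so no extra shell process lingers between the user and the JVM. A standalone sketch of the behavior, using `sh` in place of the JVM:

```shell
#!/usr/bin/env bash
# Standalone sketch (not PR code): `exec` replaces the current process,
# so the PID is unchanged across the exec.
pids=$(bash -c 'echo $$; exec sh -c "echo \$\$"')
launcher_pid=$(echo "$pids" | sed -n 1p)
execed_pid=$(echo "$pids" | sed -n 2p)
echo "before exec: $launcher_pid, after exec: $execed_pid"
```

Because the launcher is replaced rather than kept as a parent, signals sent to the launcher's PID reach the JVM directly.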
```diff
@@ -373,7 +373,6 @@
       <groupId>org.apache.derby</groupId>
       <artifactId>derby</artifactId>
       <version>10.4.2.0</version>
-      <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>net.liftweb</groupId>
```

Contributor (Author): Hive requires derby in the compile scope via a transitive dependency on hive-metastore, and this setting was overriding that. This does not seem to be pulled in from non-hive assembly jars.
```diff
@@ -576,6 +575,12 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <!-- Matches the version of jackson-core-asl pulled in by avro -->
+      <groupId>org.codehaus.jackson</groupId>
+      <artifactId>jackson-mapper-asl</artifactId>
+      <version>1.8.8</version>
+    </dependency>
   </dependencies>
 </dependencyManagement>
```
Review discussion (on the deps-assembly check):

Comment: Are you sure this is valid? I seem to remember the maven and sbt builds name the assembly jar differently(?). If this is the right way to do it, would it make sense to make the check in the `if` statement consistent with this?

Comment: maven doesn't build assemble-deps, as far as I know.

Comment: That is correct. Maven doesn't build assemble-deps right now, so it's fine to keep this sbt-specific. It raises the question of if/how we should add assemble-deps to the maven build, but that's a whole different issue.

Comment: It may actually be not too hard to add assemble-deps to Maven if we have a build that inherits from "assembly" and simply excludes Spark's groupId. Though, packaging the Maven assembly is roughly 5x faster than SBT.