Closed
Changes from 12 commits
Commits
34 commits
4e96c01
Add YARN/Stable compiled classes to the CLASSPATH.
berngp Apr 15, 2014
1342886
The `spark-class` shell now ignores non jar files in the assembly dir…
berngp Apr 15, 2014
ddf2547
The `spark-shell` option `--log-conf` also enables the SPARK_PRINT_LA…
berngp Apr 15, 2014
2204539
Root is now Spark and qualify the assembly if it was built with YARN.
berngp Apr 15, 2014
889bf4e
Upgrade the Maven Build to YARN 2.3.0.
berngp Apr 16, 2014
460510a
merge https://github.com/berngp/spark/commits/feature/small-shell-cha…
witgo Apr 29, 2014
f1c7535
Improved build configuration Ⅱ
witgo Apr 29, 2014
8540e83
review commit
witgo Apr 30, 2014
c4c6e45
review commit
witgo Apr 30, 2014
9f08e80
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 1, 2014
e1a7e00
improve travis tests coverage
witgo May 1, 2014
effe79c
missing ","
witgo May 1, 2014
9ea1af9
add the dependency of commons-lang
witgo May 1, 2014
0ed124d
SPARK-1693: Most of the tests throw a java.lang.SecurityException whe…
witgo May 1, 2014
03b136f
revert .travis.yml
witgo May 1, 2014
d3488c6
Add the missing yarn dependencies
witgo May 1, 2014
779ae5d
Fix SPARK-1693: Dependent on multiple versions of servlet-api jars le…
witgo May 1, 2014
27bd426
review commit
witgo May 1, 2014
54a86b0
review commit
witgo May 2, 2014
882e35d
review commit
witgo May 2, 2014
31451df
Compile hive optional
witgo May 3, 2014
5fb961f
revert exclusion org.eclipse.jetty.orbit:javax.servlet
witgo May 3, 2014
ea53549
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 3, 2014
a5ff7d1
revert exclusion org.eclipse.jetty.orbit:javax.servlet
witgo May 3, 2014
17f6e7d
merge master
witgo May 4, 2014
3218d3b
merge master
witgo May 5, 2014
e788690
merge master
witgo May 7, 2014
8b0c63f
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 12, 2014
427d499
merge master
witgo May 12, 2014
f1eb268
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 12, 2014
4cc0c90
revert profile hive
witgo May 12, 2014
4277fed
review commit
witgo May 12, 2014
31c6409
review commit
witgo May 12, 2014
7d8cabf
Merge branch 'master' of https://github.com/apache/spark into improve…
witgo May 14, 2014
1 change: 1 addition & 0 deletions .gitignore
@@ -19,6 +19,7 @@ conf/spark-env.sh
conf/streaming-env.sh
conf/log4j.properties
conf/spark-defaults.conf
conf/*.xml
docs/_site
docs/api
target/
5 changes: 5 additions & 0 deletions .jvmopts
@@ -0,0 +1,5 @@
-Xmx3g
-Xss2M
-XX:+CMSClassUnloadingEnabled
-XX:MaxPermSize=512M
-XX:ReservedCodeCacheSize=512m
28 changes: 21 additions & 7 deletions .travis.yml
@@ -15,18 +15,32 @@

language: scala
scala:
- "2.10.3"
- "2.10.4"
jdk:
- oraclejdk8
- oraclejdk7
env:
matrix:
- TEST="scalastyle assembly/assembly"
- TEST="catalyst/test sql/test streaming/test mllib/test graphx/test bagel/test"
- TEST=hive/test
- openjdk6
cache:
directories:
- $HOME/.m2
- $HOME/.ivy2
- $HOME/.sbt
env:
matrix:
- SPARK_HADOOP_VERSION=1.2.1
- SPARK_HADOOP_VERSION=2.4.0
before_script:
- sudo apt-get install libgfortran3
script:
- "sbt ++$TRAVIS_SCALA_VERSION $TEST"
# TODO: Cannot get hive/test to pass
>
if [[ $SPARK_HADOOP_VERSION = '2.4.0' ]]; then
mvn clean package -DskipTests -Pyarn -Dhadoop.version=$SPARK_HADOOP_VERSION -Dyarn.version=$SPARK_HADOOP_VERSION
num_jars=$(ls "$FWDIR"/assembly/target/scala-2.10/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
if [ "$num_jars" -eq "0" ]; then
exit -1
fi
mvn test -Pyarn -Dhadoop.version=$SPARK_HADOOP_VERSION -Dyarn.version=$SPARK_HADOOP_VERSION -am -pl mllib -pl bagel -pl sql/catalyst -pl yarn
else
SPARK_HADOOP_VERSION=$SPARK_HADOOP_VERSION sbt ++$TRAVIS_SCALA_VERSION assembly/assembly sql/test graphx/test streaming/test
fi
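The script above branches on `SPARK_HADOOP_VERSION` and aborts with `exit -1` when no assembly jar is found. As a minimal sketch of the same two decisions, the helpers below are hypothetical (`build_tool_for` and `has_assembly_jar` are not part of this PR) and take the repository root as an argument, since `FWDIR` is not set anywhere in the script shown. Exit statuses are limited to 0 through 255, so a plain non-zero return is used in place of `-1`.

```shell
# Hypothetical helpers for illustration; names are not from the PR.

# Pick the build tool the way the script above does:
# Maven for Hadoop 2.4.0, sbt for everything else.
build_tool_for() {
  if [ "$1" = "2.4.0" ]; then
    echo "mvn"
  else
    echo "sbt"
  fi
}

# Return 0 when at least one assembly jar exists under the given root.
has_assembly_jar() {
  root="$1"
  for jar in "$root"/assembly/target/scala-2.10/spark-assembly*hadoop*.jar; do
    # The glob stays unexpanded when nothing matches, so test each candidate.
    if [ -f "$jar" ]; then
      return 0
    fi
  done
  return 1
}
```

A caller could then write `has_assembly_jar "$PWD" || exit 1` after the packaging step, assuming the build runs from the repository root.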
1 change: 1 addition & 0 deletions bin/compute-classpath.sh
@@ -44,6 +44,7 @@ if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"

DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar`
CLASSPATH="$CLASSPATH:$DEPS_ASSEMBLY_JAR"
4 changes: 2 additions & 2 deletions bin/spark-class
@@ -110,8 +110,8 @@ export JAVA_OPTS

if [ ! -f "$FWDIR/RELEASE" ]; then
# Exit if the user hasn't compiled Spark
num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep -E "spark-assembly.*hadoop.*.jar$" | wc -l)
jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep -E "spark-assembly.*hadoop.*.jar$")
if [ "$num_jars" -eq "0" ]; then
echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
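The `bin/spark-class` change above anchors the pattern with `$` so that non-jar files in the assembly directory are no longer counted. The sketch below illustrates that idea in isolation; `count_assembly_jars` is a hypothetical helper, not code from the PR, and it additionally escapes the dot before `jar`, which the pattern in the diff leaves unescaped.

```shell
# Hypothetical helper (not in bin/spark-class): count assembly jars in a directory.
# The pattern is anchored at both ends and the dot before "jar" is escaped, so
# names such as "spark-assembly-...-hadoop2.2.0.jar.md5" do not match.
count_assembly_jars() {
  dir="$1"
  # grep -c prints the number of matching lines; "|| true" keeps the exit
  # status at zero when the count is 0.
  ls -1 "$dir" 2>/dev/null | grep -c -E '^spark-assembly.*hadoop.*\.jar$' || true
}

# Usage sketch with a throwaway directory holding one jar and one checksum file.
demo_dir=$(mktemp -d)
touch "$demo_dir/spark-assembly-1.0.0-SNAPSHOT-hadoop2.2.0.jar"
touch "$demo_dir/spark-assembly-1.0.0-SNAPSHOT-hadoop2.2.0.jar.md5"
count_assembly_jars "$demo_dir"   # prints 1
rm -rf "$demo_dir"
```

Only the `.jar` file is counted; the `.md5` file would have matched the unanchored pattern used before this change.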
29 changes: 0 additions & 29 deletions core/pom.xml
@@ -254,35 +254,6 @@
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<executions>
<execution>
<phase>test</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<exportAntProperties>true</exportAntProperties>
<target>
<property name="spark.classpath" refid="maven.test.classpath" />
<property environment="env" />
<fail message="Please set the SCALA_HOME (or SCALA_LIBRARY_PATH if scala is on the path) environment variables and retry.">
<condition>
<not>
<or>
<isset property="env.SCALA_HOME" />
<isset property="env.SCALA_LIBRARY_PATH" />
</or>
</not>
</condition>
</fail>
</target>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
9 changes: 6 additions & 3 deletions docs/building-with-maven.md
@@ -45,17 +45,20 @@ For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions wit
For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you can enable the "yarn-alpha" or "yarn" profile and set the "hadoop.version", "yarn.version" property. Note that Hadoop 0.23.X requires a special `-Phadoop-0.23` profile:

# Apache Hadoop 2.0.5-alpha
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -Dyarn.version=2.0.5-alpha -DskipTests clean package
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests clean package

# Cloudera CDH 4.2.0 with MapReduce v2
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -Dyarn.version=2.0.0-cdh4.2.0 -DskipTests clean package
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package

# Apache Hadoop 2.2.X (e.g. 2.2.0 as below) and newer
$ mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
$ mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package

# Apache Hadoop 0.23.x
$ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -Dyarn.version=0.23.7 -DskipTests clean package

# Different versions of HDFS and YARN.
$ mvn -Pyarn-alpha -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package

## Spark Tests in Maven ##

Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin). Some of them require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. You can then run the tests with `mvn -Dhadoop.version=... test`.
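The documentation hunk above drops `-Dyarn.version` from most commands, which relies on `yarn.version` defaulting to `${hadoop.version}` in `pom.xml` as changed in this PR. The sketch below shows that defaulting rule as a small wrapper; `maven_build_command` and its argument order are hypothetical and only print the command rather than run it.

```shell
# Hypothetical wrapper; the function name and argument order are illustrative.
# Prints the Maven command for a profile and Hadoop version. The YARN version
# is passed only when it differs from the Hadoop version, mirroring the
# <yarn.version>${hadoop.version}</yarn.version> default in this diff.
maven_build_command() {
  profile="$1"
  hadoop_version="$2"
  yarn_version="${3:-$hadoop_version}"
  cmd="mvn -P$profile -Dhadoop.version=$hadoop_version"
  if [ "$yarn_version" != "$hadoop_version" ]; then
    cmd="$cmd -Dyarn.version=$yarn_version"
  fi
  echo "$cmd -DskipTests clean package"
}

# Same HDFS and YARN version: no -Dyarn.version needed.
maven_build_command yarn 2.2.0
# Different HDFS and YARN versions: -Dyarn.version is added.
maven_build_command yarn-alpha 2.3.0 0.23.7
```

The two calls reproduce the "Apache Hadoop 2.2.X" and "Different versions of HDFS and YARN" commands from the updated documentation.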
139 changes: 72 additions & 67 deletions pom.xml
@@ -16,7 +16,8 @@
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache</groupId>
@@ -119,7 +120,7 @@
<log4j.version>1.2.17</log4j.version>
<hadoop.version>1.0.4</hadoop.version>
<protobuf.version>2.4.1</protobuf.version>
<yarn.version>0.23.7</yarn.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.94.6</hbase.version>
<hive.version>0.12.0</hive.version>
<parquet.version>1.3.2</parquet.version>
@@ -135,7 +136,8 @@

<repositories>
<repository>
<id>maven-repo</id> <!-- This should be at top, it makes maven try the central repo first and then others and hence faster dep resolution -->
<id>maven-repo</id>
<!-- This should be at top, it makes maven try the central repo first and then others and hence faster dep resolution -->
<name>Maven Repository</name>
<!-- HTTPS is unavailable for Maven Central -->
<url>http://repo.maven.apache.org/maven2</url>
@@ -558,64 +560,7 @@
<artifactId>jets3t</artifactId>
<version>0.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<!-- Matches the version of jackson-core-asl pulled in by avro -->
<groupId>org.codehaus.jackson</groupId>
@@ -737,6 +682,11 @@
<filereports>${project.build.directory}/SparkTestSuite.txt</filereports>
<argLine>-Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=512m</argLine>
<stderr/>
<environmentVariables>
<SPARK_HOME>${basedir}/..</SPARK_HOME>
Contributor

What is basedir here? Some of the sub projects are based in two-level directories, so I don't think the path from the sub-project pom to home is always the same.
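To make the concern above concrete: in Maven, `${basedir}` is the directory of the module's own `pom.xml`, so `${basedir}/..` from a two-level module such as `yarn/stable` resolves to `yarn/`, not the repository root. One possible alternative, sketched here under assumptions, is to walk upward until a root-only marker is found. `find_spark_home` is a hypothetical helper, and using `project/SparkBuild.scala` as the marker is an assumption for illustration.

```shell
# Hypothetical helper: walk up from a module directory until a marker file
# that exists only at the repository root is found. The marker
# (project/SparkBuild.scala) is an assumption chosen for this sketch.
find_spark_home() {
  dir="$1"
  # Make the path absolute so that dirname eventually reaches "/".
  case "$dir" in
    /*) ;;
    *) dir="$PWD/$dir" ;;
  esac
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/project/SparkBuild.scala" ]; then
      echo "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1
}
```

Called with `core` or with `yarn/stable`, the helper yields the same root, whereas a fixed `..` is only right for one-level modules.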

<SPARK_TESTING>1</SPARK_TESTING>
<SPARK_CLASSPATH>${spark.classpath}</SPARK_CLASSPATH>
Contributor

Setting this breaks some of the tests I ran. Did you run the tests after re-factoring this build?

Contributor

I'm not sure we need to set this for the tests... what is the intention?

</environmentVariables>
</configuration>
<executions>
<execution>
@@ -850,12 +800,6 @@
<modules>
<module>yarn</module>
</modules>
<dependencies>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
</dependency>
</dependencies>
</profile>

<!-- Ganglia integration is not included by default due to LGPL-licensed code -->
@@ -895,13 +839,74 @@
<id>yarn</id>
<properties>
<hadoop.major.version>2</hadoop.major.version>
<hadoop.version>2.2.0</hadoop.version>
<hadoop.version>2.3.0</hadoop.version>
Contributor

I don't think we want to change this so late in the game. Could you revert this back to 2.2?

<protobuf.version>2.5.0</protobuf.version>
</properties>
<modules>
<module>yarn</module>
</modules>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>${yarn.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</dependencyManagement>
</profile>

<!-- Build without Hadoop dependencies that are included in some runtime environments. -->
2 changes: 1 addition & 1 deletion project/SparkBuild.scala
@@ -55,7 +55,7 @@ object SparkBuild extends Build {
val SCALAC_JVM_VERSION = "jvm-1.6"
val JAVAC_JVM_VERSION = "1.6"

lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
lazy val root = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
Contributor

Just wondering - what is the benefit of this change?

Contributor Author

Just to increase readability.


lazy val core = Project("core", file("core"), settings = coreSettings)

29 changes: 0 additions & 29 deletions repl/pom.xml
@@ -92,35 +92,6 @@
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<executions>
<execution>
<phase>test</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<exportAntProperties>true</exportAntProperties>
<target>
<property name="spark.classpath" refid="maven.test.classpath" />
<property environment="env" />
<fail message="Please set the SCALA_HOME (or SCALA_LIBRARY_PATH if scala is on the path) environment variables and retry.">
<condition>
<not>
<or>
<isset property="env.SCALA_HOME" />
<isset property="env.SCALA_LIBRARY_PATH" />
</or>
</not>
</condition>
</fail>
</target>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
1 change: 0 additions & 1 deletion yarn/alpha/pom.xml
@@ -21,7 +21,6 @@
<groupId>org.apache.spark</groupId>
<artifactId>yarn-parent_2.10</artifactId>
<version>1.0.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

<groupId>org.apache.spark</groupId>