
Commit 76d4027

Merge pull request #8 from apache/master
update
2 parents: 03b62b0 + d1966f3, commit 76d4027

676 files changed: 21390 additions, 7600 deletions


.gitignore

Lines changed: 9 additions & 6 deletions
@@ -1,9 +1,12 @@
 *~
+*.#*
+*#*#
 *.swp
 *.ipr
 *.iml
 *.iws
 .idea/
+.idea_modules/
 sbt/*.jar
 .settings
 .cache
@@ -15,11 +18,12 @@ out/
 third_party/libmesos.so
 third_party/libmesos.dylib
 conf/java-opts
-conf/spark-env.sh
-conf/streaming-env.sh
-conf/log4j.properties
-conf/spark-defaults.conf
-conf/hive-site.xml
+conf/*.sh
+conf/*.cmd
+conf/*.properties
+conf/*.conf
+conf/*.xml
+conf/slaves
 docs/_site
 docs/api
 target/
@@ -50,7 +54,6 @@ unit-tests.log
 /lib/
 rat-results.txt
 scalastyle.txt
-conf/*.conf
 scalastyle-output.xml
 
 # For Hive

.rat-excludes

Lines changed: 3 additions & 0 deletions
@@ -19,7 +19,9 @@ log4j.properties
 log4j.properties.template
 metrics.properties.template
 slaves
+slaves.template
 spark-env.sh
+spark-env.cmd
 spark-env.sh.template
 log4j-defaults.properties
 bootstrap-tooltip.js
@@ -58,3 +60,4 @@ dist/*
 .*iws
 logs
 .*scalastyle-output.xml
+.*dependency-reduced-pom.xml

CONTRIBUTING.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+## Contributing to Spark
+
+Contributions via GitHub pull requests are gladly accepted from their original
+author. Along with any pull requests, please state that the contribution is
+your original work and that you license the work to the project under the
+project's open source license. Whether or not you state this explicitly, by
+submitting any copyrighted material via pull request, email, or other means
+you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
+
+Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
+for more information.

README.md

Lines changed: 16 additions & 62 deletions
@@ -13,16 +13,19 @@ and Spark Streaming for stream processing.
 ## Online Documentation
 
 You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.apache.org/documentation.html>.
+guide, on the [project web page](http://spark.apache.org/documentation.html).
 This README file only contains basic setup instructions.
 
 ## Building Spark
 
-Spark is built on Scala 2.10. To build Spark and its example programs, run:
+Spark is built using [Apache Maven](http://maven.apache.org/).
+To build Spark and its example programs, run:
 
-    ./sbt/sbt assembly
+    mvn -DskipTests clean package
 
 (You do not need to do this if you downloaded a pre-built package.)
+More detailed documentation is available from the project site, at
+["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
 
 ## Interactive Scala Shell
 
@@ -71,73 +74,24 @@ can be run using:
 
     ./dev/run-tests
 
+Please see the guidance on how to
+[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).
+
 ## A Note About Hadoop Versions
 
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting `-Dhadoop.version` when building Spark.
-
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
-versions without YARN, use:
-
-    # Apache Hadoop 1.2.1
-    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
-
-For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `-Pyarn`:
-
-    # Apache Hadoop 2.0.5-alpha
-    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
-
-    # Apache Hadoop 2.2.X and newer
-    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
-
-When developing a Spark application, specify the Hadoop version by adding the
-"hadoop-client" artifact to your project's dependencies. For example, if you're
-using Hadoop 1.2.1 and build your application using SBT, add this entry to
-`libraryDependencies`:
-
-    "org.apache.hadoop" % "hadoop-client" % "1.2.1"
 
-If your project is built with Maven, add this to your POM file's `<dependencies>` section:
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
-      <version>1.2.1</version>
-    </dependency>
-
-
-## A Note About Thrift JDBC server and CLI for Spark SQL
-
-Spark SQL supports Thrift JDBC server and CLI.
-See sql-programming-guide.md for more information about using the JDBC server and CLI.
-You can use those features by setting `-Phive` when building Spark as follows.
-
-    $ sbt/sbt -Phive assembly
+Please refer to the build documentation at
+["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
+for detailed guidance on building for a particular distribution of Hadoop, including
+building for particular Hive and Hive Thriftserver distributions. See also
+["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
+for guidance on building a Spark application that works with a particular
+distribution.
 
 ## Configuration
 
 Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
-
-
-## Contributing to Spark
-
-Contributions via GitHub pull requests are gladly accepted from their original
-author. Along with any pull requests, please state that the contribution is
-your original work and that you license the work to the project under the
-project's open source license. Whether or not you state this explicitly, by
-submitting any copyrighted material via pull request, email, or other means
-you agree to license the material under the project's open source license and
-warrant that you have the legal authority to do so.
-
-Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
-for more information.
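
With the sbt-based Hadoop and Hive recipes removed from the README, the equivalent builds are now driven through the Maven profiles described on the linked "Building Spark" page. A rough sketch of what such invocations look like (the profile names and Hadoop version below are illustrative, not taken from this commit; check the linked documentation for the combination that matches your cluster):

    # assumed Maven equivalents of the removed sbt examples
    mvn -Pyarn -Dhadoop.version=2.4.0 -DskipTests clean package
    mvn -Phive -Phive-thriftserver -DskipTests clean package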

assembly/pom.xml

Lines changed: 13 additions & 1 deletion
@@ -141,7 +141,9 @@
       <include>com.google.common.**</include>
     </includes>
     <excludes>
-      <exclude>com.google.common.base.Optional**</exclude>
+      <exclude>com/google/common/base/Absent*</exclude>
+      <exclude>com/google/common/base/Optional*</exclude>
+      <exclude>com/google/common/base/Present*</exclude>
     </excludes>
   </relocation>
 </relocations>
@@ -347,5 +349,15 @@
       </plugins>
     </build>
   </profile>
+  <profile>
+    <id>kinesis-asl</id>
+    <dependencies>
+      <dependency>
+        <groupId>org.apache.httpcomponents</groupId>
+        <artifactId>httpclient</artifactId>
+        <version>${commons.httpclient.version}</version>
+      </dependency>
+    </dependencies>
+  </profile>
 </profiles>
</project>
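
The new kinesis-asl profile pins the httpclient dependency whenever that profile is enabled; the version comes from the commons.httpclient.version property, which is assumed to be defined in a parent POM. A minimal sketch of enabling the profile from the command line:

    # illustrative invocation; the kinesis-asl profile is opt-in
    mvn -Pkinesis-asl -DskipTests clean package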

bagel/src/test/resources/log4j.properties

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ log4j.appender.file=org.apache.log4j.FileAppender
 log4j.appender.file.append=false
 log4j.appender.file.file=target/unit-tests.log
 log4j.appender.file.layout=org.apache.log4j.PatternLayout
-log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %p %c{1}: %m%n
+log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
 
 # Ignore messages below warning level from Jetty, because it's a bit verbose
 log4j.logger.org.eclipse.jetty=WARN
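
The only functional change is the added %t conversion character, which inserts the logging thread's name into each entry. Under the new pattern, a line in target/unit-tests.log would look roughly like this (thread, logger, and message are made-up values):

    14/10/08 14:32:07.123 ScalaTest-main INFO BagelSuite: running test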

bin/compute-classpath.cmd

Lines changed: 7 additions & 1 deletion
@@ -36,7 +36,13 @@ rem Load environment variables from conf\spark-env.cmd, if it exists
 if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"
 
 rem Build up classpath
-set CLASSPATH=%SPARK_CLASSPATH%;%SPARK_SUBMIT_CLASSPATH%;%FWDIR%conf
+set CLASSPATH=%SPARK_CLASSPATH%;%SPARK_SUBMIT_CLASSPATH%
+
+if not "x%SPARK_CONF_DIR%"=="x" (
+  set CLASSPATH=%CLASSPATH%;%SPARK_CONF_DIR%
+) else (
+  set CLASSPATH=%CLASSPATH%;%FWDIR%conf
+)
 
 if exist "%FWDIR%RELEASE" (
   for %%d in ("%FWDIR%lib\spark-assembly*.jar") do (
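
Both compute-classpath.cmd above and compute-classpath.sh below now prefer an explicitly configured conf directory: when SPARK_CONF_DIR is set it is appended to the classpath, otherwise the default conf directory under the Spark home is used. A minimal usage sketch, assuming a custom configuration directory (the path is illustrative):

    export SPARK_CONF_DIR=/etc/spark/conf
    ./bin/spark-shell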

bin/compute-classpath.sh

Lines changed: 7 additions & 1 deletion
@@ -27,8 +27,14 @@ FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
 
 . "$FWDIR"/bin/load-spark-env.sh
 
+CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH"
+
 # Build up classpath
-CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:$FWDIR/conf"
+if [ -n "$SPARK_CONF_DIR" ]; then
+  CLASSPATH="$CLASSPATH:$SPARK_CONF_DIR"
+else
+  CLASSPATH="$CLASSPATH:$FWDIR/conf"
+fi
 
 ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
 
bin/pyspark

Lines changed: 40 additions & 15 deletions
@@ -50,9 +50,44 @@ fi
 
 . "$FWDIR"/bin/load-spark-env.sh
 
-# Figure out which Python executable to use
+# In Spark <= 1.1, setting IPYTHON=1 would cause the driver to be launched using the `ipython`
+# executable, while the worker would still be launched using PYSPARK_PYTHON.
+#
+# In Spark 1.2, we removed the documentation of the IPYTHON and IPYTHON_OPTS variables and added
+# PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS to allow IPython to be used for the driver.
+# Now, users can simply set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set
+# PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver
+# (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython
+# and executor Python executables.
+#
+# For backwards-compatibility, we retain the old IPYTHON and IPYTHON_OPTS variables.
+
+# Determine the Python executable to use if PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON isn't set:
+if hash python2.7 2>/dev/null; then
+  # Attempt to use Python 2.7, if installed:
+  DEFAULT_PYTHON="python2.7"
+else
+  DEFAULT_PYTHON="python"
+fi
+
+# Determine the Python executable to use for the driver:
+if [[ -n "$IPYTHON_OPTS" || "$IPYTHON" == "1" ]]; then
+  # If IPython options are specified, assume user wants to run IPython
+  # (for backwards-compatibility)
+  PYSPARK_DRIVER_PYTHON_OPTS="$PYSPARK_DRIVER_PYTHON_OPTS $IPYTHON_OPTS"
+  PYSPARK_DRIVER_PYTHON="ipython"
+elif [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
+  PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
+fi
+
+# Determine the Python executable to use for the executors:
 if [[ -z "$PYSPARK_PYTHON" ]]; then
-  PYSPARK_PYTHON="python"
+  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" ]]; then
+    echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2
+    exit 1
+  else
+    PYSPARK_PYTHON="$DEFAULT_PYTHON"
+  fi
 fi
 export PYSPARK_PYTHON
 
@@ -64,11 +99,6 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
 export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
 
-# If IPython options are specified, assume user wants to run IPython
-if [[ -n "$IPYTHON_OPTS" ]]; then
-  IPYTHON=1
-fi
-
 # Build up arguments list manually to preserve quotes and backslashes.
 # We export Spark submit arguments as an environment variable because shell.py must run as a
 # PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
@@ -88,9 +118,9 @@ if [[ -n "$SPARK_TESTING" ]]; then
   unset YARN_CONF_DIR
   unset HADOOP_CONF_DIR
   if [[ -n "$PYSPARK_DOC_TEST" ]]; then
-    exec "$PYSPARK_PYTHON" -m doctest $1
+    exec "$PYSPARK_DRIVER_PYTHON" -m doctest $1
   else
-    exec "$PYSPARK_PYTHON" $1
+    exec "$PYSPARK_DRIVER_PYTHON" $1
  fi
   exit
 fi
@@ -106,10 +136,5 @@ if [[ "$1" =~ \.py$ ]]; then
 else
   # PySpark shell requires special handling downstream
   export PYSPARK_SHELL=1
-  # Only use ipython if no command line arguments were provided [SPARK-1134]
-  if [[ "$IPYTHON" = "1" ]]; then
-    exec ${PYSPARK_PYTHON:-ipython} $IPYTHON_OPTS
-  else
-    exec "$PYSPARK_PYTHON"
-  fi
+  exec "$PYSPARK_DRIVER_PYTHON" $PYSPARK_DRIVER_PYTHON_OPTS
 fi
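
The comment block added at the top of this diff spells out the new driver-side variables; a couple of usage sketches based on that description (they assume IPython is installed):

    # run the PySpark shell with IPython as the driver front end
    PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark

    # run the driver as an IPython notebook server
    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ./bin/pyspark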

bin/pyspark2.cmd

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*
 )
 if [%FOUND_JAR%] == [0] (
   echo Failed to find Spark assembly JAR.
-  echo You need to build Spark with sbt\sbt assembly before running this program.
+  echo You need to build Spark before running this program.
   goto exit
 )
 :skip_build_test
