Commit 0931295

Merge remote-tracking branch 'upstream/master' into scriptTransform

2 parents 2357d90 + cca79fa

29 files changed: 349 additions, 341 deletions


dev/deps/spark-deps-hadoop-2.2

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@ breeze_2.11-0.11.2.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.5.0.jar
-chill_2.11-0.5.0.jar
+chill-java-0.7.4.jar
+chill_2.11-0.7.4.jar
 commons-beanutils-1.7.0.jar
 commons-beanutils-core-1.8.0.jar
 commons-cli-1.2.jar

dev/deps/spark-deps-hadoop-2.3

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ breeze_2.11-0.11.2.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.5.0.jar
-chill_2.11-0.5.0.jar
+chill-java-0.7.4.jar
+chill_2.11-0.7.4.jar
 commons-beanutils-1.7.0.jar
 commons-beanutils-core-1.8.0.jar
 commons-cli-1.2.jar

dev/deps/spark-deps-hadoop-2.4

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ breeze_2.11-0.11.2.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.5.0.jar
-chill_2.11-0.5.0.jar
+chill-java-0.7.4.jar
+chill_2.11-0.7.4.jar
 commons-beanutils-1.7.0.jar
 commons-beanutils-core-1.8.0.jar
 commons-cli-1.2.jar

dev/deps/spark-deps-hadoop-2.6

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@ breeze_2.11-0.11.2.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.5.0.jar
-chill_2.11-0.5.0.jar
+chill-java-0.7.4.jar
+chill_2.11-0.7.4.jar
 commons-beanutils-1.7.0.jar
 commons-beanutils-core-1.8.0.jar
 commons-cli-1.2.jar

dev/deps/spark-deps-hadoop-2.7

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@ breeze_2.11-0.11.2.jar
 calcite-avatica-1.2.0-incubating.jar
 calcite-core-1.2.0-incubating.jar
 calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.5.0.jar
-chill_2.11-0.5.0.jar
+chill-java-0.7.4.jar
+chill_2.11-0.7.4.jar
 commons-beanutils-1.7.0.jar
 commons-beanutils-core-1.8.0.jar
 commons-cli-1.2.jar

docs/configuration.md

Lines changed: 0 additions & 24 deletions
@@ -929,30 +929,6 @@ Apart from these, the following properties are also available, and may be useful
     mapping has high overhead for blocks close to or below the page size of the operating system.
   </td>
 </tr>
-<tr>
-  <td><code>spark.externalBlockStore.blockManager</code></td>
-  <td>org.apache.spark.storage.TachyonBlockManager</td>
-  <td>
-    Implementation of external block manager (file system) that store RDDs. The file system's URL is set by
-    <code>spark.externalBlockStore.url</code>.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.externalBlockStore.baseDir</code></td>
-  <td>System.getProperty("java.io.tmpdir")</td>
-  <td>
-    Directories of the external block store that store RDDs. The file system's URL is set by
-    <code>spark.externalBlockStore.url</code> It can also be a comma-separated list of multiple
-    directories on Tachyon file system.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.externalBlockStore.url</code></td>
-  <td>tachyon://localhost:19998 for Tachyon</td>
-  <td>
-    The URL of the underlying external blocker file system in the external block store.
-  </td>
-</tr>
 </table>

 #### Networking

docs/job-scheduling.md

Lines changed: 1 addition & 2 deletions
@@ -54,8 +54,7 @@ an application to gain back cores on one node when it has work to do. To use thi

 Note that none of the modes currently provide memory sharing across applications. If you would like to share
 data this way, we recommend running a single server application that can serve multiple requests by querying
-the same RDDs. In future releases, in-memory storage systems such as [Tachyon](http://tachyon-project.org) will
-provide another approach to share RDDs.
+the same RDDs.

 ## Dynamic Resource Allocation
docs/programming-guide.md

Lines changed: 2 additions & 20 deletions
@@ -1177,7 +1177,7 @@ that originally created it.

 In addition, each persisted RDD can be stored using a different *storage level*, allowing you, for example,
 to persist the dataset on disk, persist it in memory but as serialized Java objects (to save space),
-replicate it across nodes, or store it off-heap in [Tachyon](http://tachyon-project.org/).
+replicate it across nodes.
 These levels are set by passing a
 `StorageLevel` object ([Scala](api/scala/index.html#org.apache.spark.storage.StorageLevel),
 [Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
@@ -1218,24 +1218,11 @@ storage levels is:
 <td> MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc. </td>
 <td> Same as the levels above, but replicate each partition on two cluster nodes. </td>
 </tr>
-<tr>
-  <td> OFF_HEAP (experimental) </td>
-  <td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
-    Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
-    to be smaller and to share a pool of memory, making it attractive in environments with
-    large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
-    the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
-    in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
-    from memory. If you plan to use Tachyon as the off heap store, Spark is compatible with Tachyon
-    out-of-the-box. Please refer to this <a href="http://tachyon-project.org/master/Running-Spark-on-Tachyon.html">page</a>
-    for the suggested version pairings.
-  </td>
-</tr>
 </table>

 **Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
 so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
-`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.*
+`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*

 Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.

@@ -1259,11 +1246,6 @@ requests from a web application). *All* the storage levels provide full fault to
 recomputing lost data, but the replicated ones let you continue running tasks on the RDD without
 waiting to recompute a lost partition.

-* In environments with high amounts of memory or multiple applications, the experimental `OFF_HEAP`
-  mode has several advantages:
-  * It allows multiple executors to share the same pool of memory in Tachyon.
-  * It significantly reduces garbage collection costs.
-  * Cached data is not lost if individual executors crash.

 ### Removing Data

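For context on the storage-level docs edited above: levels are chosen by passing a `StorageLevel` to `persist`. A minimal Scala sketch, assuming an existing SparkContext `sc` and a placeholder input path:

    import org.apache.spark.storage.StorageLevel

    // "data.txt" is a placeholder path; `sc` is assumed to already exist.
    val lines = sc.textFile("data.txt")

    // MEMORY_AND_DISK_2 is one of the levels the revised table keeps:
    // partitions held in memory, spilled to disk when needed, and
    // replicated on two cluster nodes.
    lines.persist(StorageLevel.MEMORY_AND_DISK_2)
    println(lines.count())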
make-distribution.sh

Lines changed: 1 addition & 49 deletions
@@ -32,11 +32,6 @@ set -x
 SPARK_HOME="$(cd "`dirname "$0"`"; pwd)"
 DISTDIR="$SPARK_HOME/dist"

-SPARK_TACHYON=false
-TACHYON_VERSION="0.8.2"
-TACHYON_TGZ="tachyon-${TACHYON_VERSION}-bin.tar.gz"
-TACHYON_URL="http://tachyon-project.org/downloads/files/${TACHYON_VERSION}/${TACHYON_TGZ}"
-
 MAKE_TGZ=false
 NAME=none
 MVN="$SPARK_HOME/build/mvn"
@@ -45,7 +40,7 @@ function exit_with_usage {
   echo "make-distribution.sh - tool for making binary distributions of Spark"
   echo ""
   echo "usage:"
-  cl_options="[--name] [--tgz] [--mvn <mvn-command>] [--with-tachyon]"
+  cl_options="[--name] [--tgz] [--mvn <mvn-command>]"
   echo "./make-distribution.sh $cl_options <maven build options>"
   echo "See Spark's \"Building Spark\" doc for correct Maven options."
   echo ""
@@ -69,9 +64,6 @@ while (( "$#" )); do
       echo "Error: '--with-hive' is no longer supported, use Maven options -Phive and -Phive-thriftserver"
       exit_with_usage
       ;;
-    --with-tachyon)
-      SPARK_TACHYON=true
-      ;;
     --tgz)
       MAKE_TGZ=true
       ;;
@@ -150,12 +142,6 @@ else
   echo "Making distribution for Spark $VERSION in $DISTDIR..."
 fi

-if [ "$SPARK_TACHYON" == "true" ]; then
-  echo "Tachyon Enabled"
-else
-  echo "Tachyon Disabled"
-fi
-
 # Build uber fat JAR
 cd "$SPARK_HOME"

@@ -219,40 +205,6 @@ if [ -d "$SPARK_HOME"/R/lib/SparkR ]; then
   cp "$SPARK_HOME/R/lib/sparkr.zip" "$DISTDIR"/R/lib
 fi

-# Download and copy in tachyon, if requested
-if [ "$SPARK_TACHYON" == "true" ]; then
-  TMPD=`mktemp -d 2>/dev/null || mktemp -d -t 'disttmp'`
-
-  pushd "$TMPD" > /dev/null
-  echo "Fetching tachyon tgz"
-
-  TACHYON_DL="${TACHYON_TGZ}.part"
-  if [ $(command -v curl) ]; then
-    curl --silent -k -L "${TACHYON_URL}" > "${TACHYON_DL}" && mv "${TACHYON_DL}" "${TACHYON_TGZ}"
-  elif [ $(command -v wget) ]; then
-    wget --quiet "${TACHYON_URL}" -O "${TACHYON_DL}" && mv "${TACHYON_DL}" "${TACHYON_TGZ}"
-  else
-    printf "You do not have curl or wget installed. please install Tachyon manually.\n"
-    exit -1
-  fi
-
-  tar xzf "${TACHYON_TGZ}"
-  cp "tachyon-${TACHYON_VERSION}/assembly/target/tachyon-assemblies-${TACHYON_VERSION}-jar-with-dependencies.jar" "$DISTDIR/lib"
-  mkdir -p "$DISTDIR/tachyon/src/main/java/tachyon/web"
-  cp -r "tachyon-${TACHYON_VERSION}"/{bin,conf,libexec} "$DISTDIR/tachyon"
-  cp -r "tachyon-${TACHYON_VERSION}"/servers/src/main/java/tachyon/web "$DISTDIR/tachyon/src/main/java/tachyon/web"
-
-  if [[ `uname -a` == Darwin* ]]; then
-    # need to run sed differently on osx
-    nl=$'\n'; sed -i "" -e "s|export TACHYON_JAR=\$TACHYON_HOME/target/\(.*\)|# This is set for spark's make-distribution\\$nl export TACHYON_JAR=\$TACHYON_HOME/../lib/\1|" "$DISTDIR/tachyon/libexec/tachyon-config.sh"
-  else
-    sed -i "s|export TACHYON_JAR=\$TACHYON_HOME/target/\(.*\)|# This is set for spark's make-distribution\n export TACHYON_JAR=\$TACHYON_HOME/../lib/\1|" "$DISTDIR/tachyon/libexec/tachyon-config.sh"
-  fi
-
-  popd > /dev/null
-  rm -rf "$TMPD"
-fi
-
 if [ "$MAKE_TGZ" == "true" ]; then
   TARDIR_NAME=spark-$VERSION-bin-$NAME
   TARDIR="$SPARK_HOME/$TARDIR_NAME"

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@
     <jblas.version>1.2.4</jblas.version>
     <jetty.version>8.1.14.v20131031</jetty.version>
     <orbit.version>3.0.0.v201112011016</orbit.version>
-    <chill.version>0.5.0</chill.version>
+    <chill.version>0.7.4</chill.version>
     <ivy.version>2.4.0</ivy.version>
     <oro.version>2.0.8</oro.version>
     <codahale.metrics.version>3.1.2</codahale.metrics.version>
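The chill bump above is user-visible through Spark's Kryo serializer, which builds on Twitter's chill library. A minimal sketch of opting into Kryo serialization; the record class and app name are illustrative, not part of this commit:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative record type; registering classes lets Kryo write
    // compact class identifiers instead of full class names.
    case class MyRecord(id: Long, name: String)

    val conf = new SparkConf()
      .setAppName("kryo-example") // placeholder name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))

    val sc = new SparkContext(conf)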
