Commit c14f373

Merge pull request alteryx#241 from pwendell/master
Update broken links and add HDP 2.0 version string

I ran a link checker on the UI and found several broken links.

(cherry picked from commit 1f4a4bc)
Signed-off-by: Patrick Wendell <[email protected]>
1 parent 473cba2 commit c14f373

File tree: 6 files changed (+11 −10 lines)


docs/bagel-programming-guide.md

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ _Example_
 
 ## Operations
 
-Here are the actions and types in the Bagel API. See [Bagel.scala](https://github.com/apache/incubator-spark/blob/master/bagel/src/main/scala/spark/bagel/Bagel.scala) for details.
+Here are the actions and types in the Bagel API. See [Bagel.scala](https://github.com/apache/incubator-spark/blob/master/bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala) for details.
 
 ### Actions

docs/hadoop-third-party-distributions.md

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,7 @@ with these distributions:
 # Compile-time Hadoop Version
 
 When compiling Spark, you'll need to
-[set the SPARK_HADOOP_VERSION flag](http://localhost:4000/index.html#a-note-about-hadoop-versions):
+[set the SPARK_HADOOP_VERSION flag](index.html#a-note-about-hadoop-versions):
 
     SPARK_HADOOP_VERSION=1.0.4 sbt/sbt assembly
 
@@ -40,6 +40,7 @@ the _exact_ Hadoop version you are running to avoid any compatibility errors.
 <tr><td>HDP 1.2</td><td>1.1.2</td></tr>
 <tr><td>HDP 1.1</td><td>1.0.3</td></tr>
 <tr><td>HDP 1.0</td><td>1.0.3</td></tr>
+<tr><td>HDP 2.0</td><td>2.2.0</td></tr>
 </table>
 </td>
 </tr>

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ By default, Spark links to Hadoop 1.0.4. You can change this by setting the
 
    SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
 
-In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
+In addition, if you wish to run Spark on [YARN](running-on-yarn.html), set
 `SPARK_YARN` to `true`:
 
    SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

docs/job-scheduling.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ The fair scheduler also supports grouping jobs into _pools_, and setting differe
 (e.g. weight) for each pool. This can be useful to create a "high-priority" pool for more important jobs,
 for example, or to group the jobs of each user together and give _users_ equal shares regardless of how
 many concurrent jobs they have instead of giving _jobs_ equal shares. This approach is modeled after the
-[Hadoop Fair Scheduler](http://hadoop.apache.org/docs/stable/fair_scheduler.html).
+[Hadoop Fair Scheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
 
 Without any intervention, newly submitted jobs go into a _default pool_, but jobs' pools can be set by
 adding the `spark.scheduler.pool` "local property" to the SparkContext in the thread that's submitting them.
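
For context on this hunk, the "local property" referred to above is set on the SparkContext from the thread that submits the jobs; a minimal sketch (the master, app name, and pool name are illustrative, not part of this commit):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "PoolExample")
    // Jobs submitted from this thread now go to the "production" pool.
    sc.setLocalProperty("spark.scheduler.pool", "production")
    // Setting the property back to null returns the thread to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", null)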

docs/running-on-yarn.md

Lines changed: 2 additions & 2 deletions
@@ -116,12 +116,12 @@ For example:
 
 Hadoop 2.2.x users must build Spark and publish it locally. The SBT build process handles Hadoop 2.2.x as a special case. This version of Hadoop has new YARN API changes and depends on a Protobuf version (2.5) that is not compatible with the Akka version (2.0.5) that Spark uses. Therefore, if the Hadoop version (e.g. set through ```SPARK_HADOOP_VERSION```) starts with 2.2.0 or higher, the build process will depend on Akka artifacts distributed by the Spark project that are compatible with Protobuf 2.5. Furthermore, the build process then uses the directory ```new-yarn``` (instead of ```yarn```), which supports the new YARN API. The build process should seamlessly work out of the box.
 
-See [Building Spark with Maven](building-with-maven.md) for instructions on how to build Spark using the Maven process.
+See [Building Spark with Maven](building-with-maven.html) for instructions on how to build Spark using the Maven process.
 
 # Important Notes
 
 - We do not request container resources based on the number of cores, so the number of cores given via command line arguments cannot be guaranteed.
 - The local directories used by Spark will be the local directories configured for YARN (Hadoop YARN config yarn.nodemanager.local-dirs). If the user specifies spark.local.dir, it will be ignored.
 - The --files and --archives options support specifying file names with the # symbol, similar to Hadoop. For example, you can specify --files localtest.txt#appSees.txt; this uploads the file you have locally named localtest.txt into HDFS, where it is linked to by the name appSees.txt, and your application should use the name appSees.txt to reference it when running on YARN.
 - The --addJars option allows the SparkContext.addJar function to work if you are using it with local files. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.
-- YARN 2.2.x users cannot simply depend on the Spark packages without building Spark, as the published Spark artifacts are compiled to work with the pre 2.2 API. Those users must build Spark and publish it locally.
+- YARN 2.2.x users cannot simply depend on the Spark packages without building Spark, as the published Spark artifacts are compiled to work with the pre 2.2 API. Those users must build Spark and publish it locally.

docs/streaming-programming-guide.md

Lines changed: 4 additions & 4 deletions
@@ -214,7 +214,7 @@ ssc.stop()
 {% endhighlight %}
 
 # Example
-A simple example to start with is the [NetworkWordCount](https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/spark/streaming/examples/NetworkWordCount.scala). This example counts the words received from a network server every second. Given below are the relevant sections of the source code. You can find the full source code in `<Spark repo>/streaming/src/main/scala/spark/streaming/examples/NetworkWordCount.scala`.
+A simple example to start with is the [NetworkWordCount](https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala). This example counts the words received from a network server every second. Given below are the relevant sections of the source code. You can find the full source code in `<Spark repo>/streaming/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala`.
 
 {% highlight scala %}
 import org.apache.spark.streaming.{Seconds, StreamingContext}
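
The hunk above cuts off right after the first import; as a rough sketch of what the NetworkWordCount example does (the host, port, and batch interval below are illustrative, and the linked file is the authoritative source):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    // Count the words arriving on a TCP socket, in 1-second batches.
    val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()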
@@ -283,7 +283,7 @@ Time: 1357008430000 ms
 </td>
 </table>
 
-You can find more examples in `<Spark repo>/streaming/src/main/scala/spark/streaming/examples/`. They can be run in a similar manner using `./run-example org.apache.spark.streaming.examples....`. Executing one without any parameters prints the required parameter list. Further explanation of how to run them can be found in comments in the files.
+You can find more examples in `<Spark repo>/streaming/src/main/scala/org/apache/spark/streaming/examples/`. They can be run in a similar manner using `./run-example org.apache.spark.streaming.examples....`. Executing one without any parameters prints the required parameter list. Further explanation of how to run them can be found in comments in the files.
 
 # DStream Persistence
 Similar to RDDs, DStreams also allow developers to persist the stream's data in memory. That is, using the `persist()` method on a DStream automatically persists every RDD of that DStream in memory. This is useful if the data in the DStream will be computed multiple times (e.g., multiple operations on the same data). For window-based operations like `reduceByWindow` and `reduceByKeyAndWindow` and state-based operations like `updateStateByKey`, this is implicitly true. Hence, DStreams generated by window-based operations are automatically persisted in memory, without the developer calling `persist()`.
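
To make the persistence note above concrete, here is a continuation of the hypothetical sketch from the previous example (the window duration is illustrative):

    // Explicitly cache every RDD generated by a DStream that will be reused.
    words.persist()

    // Window-based streams are persisted implicitly, so no persist() call is needed here.
    val windowedCounts = wordCounts.reduceByKeyAndWindow(_ + _, Seconds(30))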
@@ -483,7 +483,7 @@ Similar to [Spark's Java API](java-programming-guide.html), we also provide a Ja
 1. Functions for transformations must be implemented as subclasses of [Function](api/core/index.html#org.apache.spark.api.java.function.Function) and [Function2](api/core/index.html#org.apache.spark.api.java.function.Function2)
 1. Unlike the Scala API, the Java API handles DStreams for key-value pairs using a separate [JavaPairDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaPairDStream) class (similar to [JavaRDD and JavaPairRDD](java-programming-guide.html#rdd-classes)). DStream functions like `map` and `filter` are implemented separately by JavaDStream and JavaPairDStream to return DStreams of the appropriate types.
 
-Spark's [Java Programming Guide](java-programming-guide.html) gives more ideas about using the Java API. To extend the ideas presented for the RDDs to DStreams, we present parts of the Java version of the same NetworkWordCount example presented above. The full source code is given at `<spark repo>/examples/src/main/java/spark/streaming/examples/JavaNetworkWordCount.java`
+Spark's [Java Programming Guide](java-programming-guide.html) gives more ideas about using the Java API. To extend the ideas presented for the RDDs to DStreams, we present parts of the Java version of the same NetworkWordCount example presented above. The full source code is given at `<spark repo>/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java`
 
 The streaming context and the socket stream from the input source are created using a `JavaStreamingContext`, which has the same parameters and provides the same input streams as its Scala counterpart.

@@ -527,5 +527,5 @@ JavaPairDStream<String, Integer> wordCounts = words.map(
 # Where to Go from Here
 
 * API docs - [Scala](api/streaming/index.html#org.apache.spark.streaming.package) and [Java](api/streaming/index.html#org.apache.spark.streaming.api.java.package)
-* More examples - [Scala](https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/spark/streaming/examples) and [Java](https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/spark/streaming/examples)
+* More examples - [Scala](https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples) and [Java](https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
 * [Paper describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)
