
[SPARK-28609][DOC] Fix broken styles/links and make up-to-date #25345

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-28609

Conversation

@dongjoon-hyun (Member) commented Aug 3, 2019

What changes were proposed in this pull request?

This PR aims to fix the broken styles/links and make the doc up-to-date for Apache Spark 2.4.4 and 3.0.0 release.

- `building-spark.md` (screenshot: "Screen Shot 2019-08-02 at 10 33 51 PM")
- `configuration.md` (screenshot: "Screen Shot 2019-08-02 at 10 34 52 PM")
- `sql-pyspark-pandas-with-arrow.md` (screenshot: "Screen Shot 2019-08-02 at 10 36 14 PM")
- `streaming-programming-guide.md` (screenshot: "Screen Shot 2019-08-02 at 10 37 11 PM")
- `structured-streaming-programming-guide.md` (1/2) (screenshot: "Screen Shot 2019-08-02 at 10 38 20 PM")
- `structured-streaming-programming-guide.md` (2/2) (screenshot: "Screen Shot 2019-08-02 at 10 40 05 PM")
- `submitting-applications.md` (screenshot: "Screen Shot 2019-08-02 at 10 41 13 PM")

How was this patch tested?

Manual. Build the doc.

```
SKIP_API=1 jekyll build
```

@dongjoon-hyun changed the title from "Init" to "[SPARK-28609][DOC] Fix broken styles/links and make up-to-date" on Aug 3, 2019
Useful for allowing Spark to resolve artifacts from behind a firewall e.g. via an in-house
artifact server like Artifactory. Details on the settings file format can be
found at http://ant.apache.org/ivy/history/latest-milestone/settings.html
found at <a href="http://ant.apache.org/ivy/history/latest-milestone/settings.html">Settings Files</a>
@dongjoon-hyun (Member, Author) commented Aug 3, 2019

A hyperlink is better than the plain text. Note that "Settings Files" is the title of that page.

Internally, by default, Structured Streaming queries are processed using a *micro-batch processing* engine, which processes data streams as a series of small batch jobs thereby achieving end-to-end latencies as low as 100 milliseconds and exactly-once fault-tolerance guarantees. However, since Spark 2.3, we have introduced a new low-latency processing mode called **Continuous Processing**, which can achieve end-to-end latencies as low as 1 millisecond with at-least-once guarantees. Without changing the Dataset/DataFrame operations in your queries, you will be able to choose the mode based on your application requirements.

In this guide, we are going to walk you through the programming model and the APIs. We are going to explain the concepts mostly using the default micro-batch processing model, and then [later](#continuous-processing-experimental) discuss Continuous Processing model. First, let's start with a simple example of a Structured Streaming query - a streaming word count.
In this guide, we are going to walk you through the programming model and the APIs. We are going to explain the concepts mostly using the default micro-batch processing model, and then [later](#continuous-processing) discuss Continuous Processing model. First, let's start with a simple example of a Structured Streaming query - a streaming word count.
@dongjoon-hyun (Member, Author) commented Aug 3, 2019

This fixes the broken link. Previously, it pointed to a nonexistent anchor (`#continuous-processing-experimental`) instead of that section.
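
To make the linked section concrete, here is a minimal Scala sketch of what "choosing the mode" looks like in practice: the DataFrame operations stay the same and only the trigger changes. The rate source, console sink, local master, and trigger intervals are illustrative assumptions, not part of this PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Local master only so the sketch is self-contained.
val spark = SparkSession.builder.master("local[2]").appName("TriggerModes").getOrCreate()

// The same streaming DataFrame is used for both modes.
val stream = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

// Default micro-batch mode: the stream is processed as a series of small batch jobs.
val microBatch = stream.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("1 second"))
  .start()

// Continuous Processing (Spark 2.3+): lower latency with at-least-once guarantees.
// In practice you would start only one of these two queries.
val continuous = stream.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()
```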

- Joins can be cascaded, that is, you can do `df1.join(df2, ...).join(df3, ...).join(df4, ....)`.

- As of Spark 2.3, you can use joins only when the query is in Append output mode. Other output modes are not yet supported.
- As of Spark 2.4, you can use joins only when the query is in Append output mode. Other output modes are not yet supported.
@dongjoon-hyun (Member, Author) commented:

For now, it's 2.4 because this patch will be backported to branch-2.4. We will update this later when we prepare 3.0.0.
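
To complement the join note above, here is a hedged Scala sketch of a cascaded stream-stream join written in Append output mode; the rate sources and the id column names are hypothetical and only show the shape of the query.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder.master("local[2]").appName("CascadedJoins").getOrCreate()

// Three illustrative streaming DataFrames (column names are made up).
val df1 = spark.readStream.format("rate").load().select(col("value").as("id1"))
val df2 = spark.readStream.format("rate").load().select(col("value").as("id2"))
val df3 = spark.readStream.format("rate").load().select(col("value").as("id3"))

// Joins can be cascaded: df1.join(df2, ...).join(df3, ...)
val joined = df1
  .join(df2, expr("id1 = id2"))
  .join(df3, expr("id2 = id3"))

// Stream-stream joins require Append output mode (as of 2.4).
val query = joined.writeStream
  .format("console")
  .outputMode("append")
  .start()
```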

For that situation you must specify the processing logic in an object.

1. The function takes a row as input.
- First, the function takes a row as input.
@dongjoon-hyun (Member, Author) commented:

The following is the current screenshot of our 2.4.3 page. The enumeration doesn't render well with the template. Also, the Python example has redundant leading spaces, inconsistent with the other Python examples.
(screenshot: "Screen Shot 2019-08-02 at 10 48 02 PM")
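
Since this hunk concerns the foreach sink's "object" form, a hedged Scala sketch of the ForeachWriter pattern it refers to may help; the rate source and the println body are placeholders, not code from the guide.

```scala
import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}

val spark = SparkSession.builder.master("local[2]").appName("ForeachSketch").getOrCreate()
val stream = spark.readStream.format("rate").load()

val query = stream.writeStream
  .foreach(new ForeachWriter[Row] {
    // Called once per partition and epoch; return true to process this partition.
    def open(partitionId: Long, epochId: Long): Boolean = true

    // The processing method takes a row as input, matching the doc text fixed here.
    def process(row: Row): Unit = println(row)

    // Called when processing ends; errorOrNull is null on success.
    def close(errorOrNull: Throwable): Unit = ()
  })
  .start()
```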

.format("rate")
.option("rowsPerSecond", "10")
.option("")

@dongjoon-hyun (Member, Author) commented:

This seems to be an incomplete leftover. We had better remove it because the other language tabs don't have it. After removing it, we will have Kafka examples consistently across all languages.
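
For context, a minimal Scala sketch of the Kafka-style example that the guide keeps across all language tabs; the bootstrap servers and topic name are hypothetical, and it assumes the spark-sql-kafka-0-10 package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[2]").appName("KafkaSourceSketch").getOrCreate()

// Subscribe to one topic; the server addresses and topic name are placeholders.
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092,host2:9092")
  .option("subscribe", "topic1")
  .load()

// Kafka keys and values arrive as binary; cast them to strings before use.
val kv = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```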

<tr><td> <code>k8s://HOST:PORT</code> </td><td> Connect to a <a href="running-on-kubernetes.html">Kubernetes</a> cluster in
<code>cluster</code> mode. Client mode is currently unsupported and will be supported in future releases.
The <code>HOST</code> and <code>PORT</code> refer to the [Kubernetes API Server](https://kubernetes.io/docs/reference/generated/kube-apiserver/).
The <code>HOST</code> and <code>PORT</code> refer to the <a href="https://kubernetes.io/docs/reference/generated/kube-apiserver/">Kubernetes API Server</a>.
@dongjoon-hyun (Member, Author) commented:

Inside the table, we need to use an `<a>` tag, like line 183.

@SparkQA commented Aug 3, 2019

Test build #108598 has finished for PR 25345 at commit 32b2265.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author) commented:

Thank you for the review and approval, @HyukjinKwon.
Merged to master/2.4.

dongjoon-hyun added a commit that referenced this pull request Aug 4, 2019
This PR aims to fix the broken styles/links and make the doc up-to-date for Apache Spark 2.4.4 and 3.0.0 release.

- `building-spark.md`
![Screen Shot 2019-08-02 at 10 33 51 PM](https://user-images.githubusercontent.com/9700541/62407962-a248ec80-b575-11e9-8a16-532e9bc421f8.png)

- `configuration.md`
![Screen Shot 2019-08-02 at 10 34 52 PM](https://user-images.githubusercontent.com/9700541/62407969-c7d5f600-b575-11e9-9b1a-a76c6cc095c5.png)

- `sql-pyspark-pandas-with-arrow.md`
![Screen Shot 2019-08-02 at 10 36 14 PM](https://user-images.githubusercontent.com/9700541/62407979-18e5ea00-b576-11e9-99af-7ad9264656ae.png)

- `streaming-programming-guide.md`
![Screen Shot 2019-08-02 at 10 37 11 PM](https://user-images.githubusercontent.com/9700541/62407981-213e2500-b576-11e9-8bc5-a925df7e98a7.png)

- `structured-streaming-programming-guide.md` (1/2)
![Screen Shot 2019-08-02 at 10 38 20 PM](https://user-images.githubusercontent.com/9700541/62408001-49c61f00-b576-11e9-9519-f699775ceecd.png)

- `structured-streaming-programming-guide.md` (2/2)
![Screen Shot 2019-08-02 at 10 40 05 PM](https://user-images.githubusercontent.com/9700541/62408017-7f6b0800-b576-11e9-9341-52664ba6b460.png)

- `submitting-applications.md`
![Screen Shot 2019-08-02 at 10 41 13 PM](https://user-images.githubusercontent.com/9700541/62408027-b2ad9700-b576-11e9-910e-8f22173e1251.png)

Manual. Build the doc.
```
SKIP_API=1 jekyll build
```

Closes #25345 from dongjoon-hyun/SPARK-28609.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 4856c0e)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun deleted the SPARK-28609 branch on August 4, 2019 at 17:25.
rluta pushed a commit to rluta/spark that referenced this pull request on Sep 17, 2019.
kai-chi pushed a commit to kai-chi/spark that referenced this pull request on Sep 26, 2019.