This repository was archived by the owner on Nov 15, 2024. It is now read-only.

Commit 544a18d

John O'Leary authored and MatthewRBruce committed
[SPARK-22107] Change as to alias in python quickstart
## What changes were proposed in this pull request?

Updated docs so that a line of python in the quick start guide executes. Closes apache#19283

## How was this patch tested?

Existing tests.

Author: John O'Leary <[email protected]>

Closes apache#19326 from jgoleary/issues/22107.

(cherry picked from commit 20adf9a)
Signed-off-by: hyukjinkwon <[email protected]>
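
The commit message does not say why the old line failed to execute. The likely reason is that `as` is a reserved word in Python, so `.as("word")` cannot even be parsed. This is a minimal sketch using only the standard library to check that. The `parses` helper and the `col` name are hypothetical, introduced here for illustration, and no Spark installation is needed because the expressions are compiled but never evaluated.

```python
import keyword


def parses(source):
    """Return True if `source` is a syntactically valid Python expression.

    The expression is only compiled, never evaluated, so names such as
    `col` do not need to exist.
    """
    try:
        compile(source, "<quick-start>", "eval")
        return True
    except SyntaxError:
        return False


# Hypothetical stand-ins for the method call before and after this commit.
old_call = 'col.as("word")'
new_call = 'col.alias("word")'

print(keyword.iskeyword("as"))     # `as` is reserved
print(keyword.iskeyword("alias"))  # `alias` is an ordinary identifier
print(parses(old_call))            # the old form is rejected by the parser
print(parses(new_call))            # the new form parses
```

Scala has no such restriction, which is presumably why the Scala version of the quick start can use `.as("word")` while the Python version needs `alias`.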
1 parent e4f1036 commit 544a18d

1 file changed: +1 -1 lines changed

docs/quick-start.md

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ This first maps a line to an integer value and aliases it as "numWords", creatin
 One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:
 
 {% highlight python %}
->>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).as("word")).groupBy("word").count()
+>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
 {% endhighlight %}
 
 Here, we use the `explode` function in `select`, to transfrom a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
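
For readers without a Spark shell at hand, the data flow in the changed line can be approximated in plain Python. This is only an analogy under stated assumptions, not Spark code: `word_counts` is a hypothetical helper, `re.split` stands in for `split(textFile.value, "\s+")`, feeding every token into one counter stands in for `explode`, and `collections.Counter` stands in for `groupBy("word").count()`.

```python
import re
from collections import Counter


def word_counts(lines):
    """Count words across an iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Split each line on runs of whitespace, as the "\s+" pattern does.
        words = re.split(r"\s+", line)
        # Adding every word to a single counter flattens the per-line lists
        # (the role of `explode`) and tallies them (the role of groupBy/count).
        counts.update(words)
    return counts


sample = ["a b a", "b c"]
for word, count in sorted(word_counts(sample).items()):
    print(word, count)
```

As in the regex-based split, a line with leading or trailing whitespace produces an empty-string token, which this sketch counts like any other word.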
