
Commit 84a076e

ashashwat authored and gatorsmile committed
[SPARK-23165][DOC] Spelling mistake fix in quick-start doc.
## What changes were proposed in this pull request?

Fix spelling in quick-start doc.

## How was this patch tested?

Doc only.

Author: Shashwat Anand <[email protected]>

Closes #20336 from ashashwat/SPARK-23165.
1 parent 396cdfb commit 84a076e

14 files changed (+37, -37 lines)

docs/cloud-integration.md

Lines changed: 2 additions & 2 deletions
@@ -180,10 +180,10 @@ under the path, not the number of *new* files, so it can become a slow operation
 The size of the window needs to be set to handle this.

 1. Files only appear in an object store once they are completely written; there
-is no need for a worklow of write-then-rename to ensure that files aren't picked up
+is no need for a workflow of write-then-rename to ensure that files aren't picked up
 while they are still being written. Applications can write straight to the monitored directory.

-1. Streams should only be checkpointed to an store implementing a fast and
+1. Streams should only be checkpointed to a store implementing a fast and
 atomic `rename()` operation Otherwise the checkpointing may be slow and potentially unreliable.

 ## Further Reading
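The hunk above sits in a passage about streaming from and checkpointing to object stores. As a loose illustration of the two points being edited, here is a minimal PySpark Structured Streaming sketch; the bucket paths, checkpoint location, and master are placeholders, and the fast, atomic `rename()` requirement is what makes the checkpoint location choice matter.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[2]")               # local run for illustration
         .appName("object-store-stream")
         .getOrCreate())

# Objects only become visible once fully written, so the stream can read
# straight from the monitored prefix; no write-then-rename workflow needed.
events = spark.readStream.text("s3a://my-bucket/incoming/")   # placeholder bucket

# Checkpoint to a store with a fast, atomic rename(); otherwise
# checkpointing may be slow and potentially unreliable.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://my-bucket/output/")            # placeholder bucket
         .option("checkpointLocation", "hdfs:///checkpoints/object-store-stream")
         .start())

query.awaitTermination()
```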

docs/configuration.md

Lines changed: 7 additions & 7 deletions
@@ -79,7 +79,7 @@ Then, you can supply configuration values at runtime:
 {% endhighlight %}

 The Spark shell and [`spark-submit`](submitting-applications.html)
-tool support two ways to load configurations dynamically. The first are command line options,
+tool support two ways to load configurations dynamically. The first is command line options,
 such as `--master`, as shown above. `spark-submit` can accept any Spark property using the `--conf`
 flag, but uses special flags for properties that play a part in launching the Spark application.
 Running `./bin/spark-submit --help` will show the entire list of these options.
@@ -413,7 +413,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>false</td>
 <td>
 Enable profiling in Python worker, the profile result will show up by <code>sc.show_profiles()</code>,
-or it will be displayed before the driver exiting. It also can be dumped into disk by
+or it will be displayed before the driver exits. It also can be dumped into disk by
 <code>sc.dump_profiles(path)</code>. If some of the profile results had been displayed manually,
 they will not be displayed automatically before driver exiting.

@@ -446,7 +446,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>true</td>
 <td>
 Reuse Python worker or not. If yes, it will use a fixed number of Python workers,
-does not need to fork() a Python process for every tasks. It will be very useful
+does not need to fork() a Python process for every task. It will be very useful
 if there is large broadcast, then the broadcast will not be needed to transferred
 from JVM to Python worker for every task.
 </td>
@@ -1294,7 +1294,7 @@ Apart from these, the following properties are also available, and may be useful
 <td><code>spark.files.openCostInBytes</code></td>
 <td>4194304 (4 MB)</td>
 <td>
-The estimated cost to open a file, measured by the number of bytes could be scanned in the same
+The estimated cost to open a file, measured by the number of bytes could be scanned at the same
 time. This is used when putting multiple files into a partition. It is better to over estimate,
 then the partitions with small files will be faster than partitions with bigger files.
 </td>
@@ -1855,8 +1855,8 @@ Apart from these, the following properties are also available, and may be useful
 <td><code>spark.user.groups.mapping</code></td>
 <td><code>org.apache.spark.security.ShellBasedGroupsMappingProvider</code></td>
 <td>
-The list of groups for a user are determined by a group mapping service defined by the trait
-org.apache.spark.security.GroupMappingServiceProvider which can configured by this property.
+The list of groups for a user is determined by a group mapping service defined by the trait
+org.apache.spark.security.GroupMappingServiceProvider which can be configured by this property.
 A default unix shell based implementation is provided <code>org.apache.spark.security.ShellBasedGroupsMappingProvider</code>
 which can be specified to resolve a list of groups for a user.
 <em>Note:</em> This implementation supports only a Unix/Linux based environment. Windows environment is
@@ -2465,7 +2465,7 @@ should be included on Spark's classpath:

 The location of these configuration files varies across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
-configurations on-the-fly, but offer a mechanisms to download copies of them.
+configurations on-the-fly, but offer a mechanism to download copies of them.

 To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
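The first hunk above continues a passage about supplying configuration values at runtime rather than baking them into the application. As a hedged sketch of that idea from Python, the snippet below sets one of the properties touched later in this diff (`spark.files.openCostInBytes`) programmatically; the master, app name, and value are illustrative only.

```python
from pyspark import SparkConf, SparkContext

# Supply configuration values at runtime from code instead of on the
# spark-submit command line (--conf key=value would be the CLI equivalent).
conf = (SparkConf()
        .setMaster("local[2]")                           # illustrative master
        .setAppName("runtime-config-demo")
        .set("spark.files.openCostInBytes", "4194304"))  # 4 MB, the documented default

sc = SparkContext(conf=conf)
print(sc.getConf().get("spark.files.openCostInBytes"))
sc.stop()
```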

docs/graphx-programming-guide.md

Lines changed: 2 additions & 2 deletions
@@ -708,7 +708,7 @@ messages remaining.
 > messaging function. These constraints allow additional optimization within GraphX.

 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
-of its implementation (note: to avoid stackOverflowError due to long lineage chains, pregel support periodcally
+of its implementation (note: to avoid stackOverflowError due to long lineage chains, pregel support periodically
 checkpoint graph and messages by setting "spark.graphx.pregel.checkpointInterval" to a positive number,
 say 10. And set checkpoint directory as well using SparkContext.setCheckpointDir(directory: String)):

@@ -928,7 +928,7 @@ switch to 2D-partitioning or other heuristics included in GraphX.
 <!-- Images are downsized intentionally to improve quality on retina displays -->
 </p>

-Once the edges have be partitioned the key challenge to efficient graph-parallel computation is
+Once the edges have been partitioned the key challenge to efficient graph-parallel computation is
 efficiently joining vertex attributes with the edges. Because real-world graphs typically have more
 edges than vertices, we move vertex attributes to the edges. Because not all partitions will
 contain edges adjacent to all vertices we internally maintain a routing table which identifies where
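The Pregel note edited above names two knobs: the `spark.graphx.pregel.checkpointInterval` property and `SparkContext.setCheckpointDir`. GraphX itself is driven from the Scala/Java API, but both knobs are ordinary Spark settings; a small sketch of setting them, where the directory path is a placeholder and the interval of 10 follows the text:

```python
from pyspark import SparkConf, SparkContext

# Periodic checkpointing keeps Pregel's lineage chains short and avoids
# StackOverflowError on long-running iterative jobs.
conf = (SparkConf()
        .setMaster("local[2]")                                  # illustrative master
        .setAppName("pregel-checkpoint-config")
        .set("spark.graphx.pregel.checkpointInterval", "10"))   # interval from the text

sc = SparkContext(conf=conf)
sc.setCheckpointDir("hdfs:///tmp/graphx-checkpoints")           # placeholder directory
```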

docs/monitoring.md

Lines changed: 4 additions & 4 deletions
@@ -118,7 +118,7 @@ The history server can be configured as follows:
 <td>
 The number of applications to retain UI data for in the cache. If this cap is exceeded, then
 the oldest applications will be removed from the cache. If an application is not in the cache,
-it will have to be loaded from disk if its accessed from the UI.
+it will have to be loaded from disk if it is accessed from the UI.
 </td>
 </tr>
 <tr>
@@ -407,7 +407,7 @@ can be identified by their `[attempt-id]`. In the API listed below, when running
 </tr>
 </table>

-The number of jobs and stages which can retrieved is constrained by the same retention
+The number of jobs and stages which can be retrieved is constrained by the same retention
 mechanism of the standalone Spark UI; `"spark.ui.retainedJobs"` defines the threshold
 value triggering garbage collection on jobs, and `spark.ui.retainedStages` that for stages.
 Note that the garbage collection takes place on playback: it is possible to retrieve
@@ -422,10 +422,10 @@ These endpoints have been strongly versioned to make it easier to develop applic
 * Individual fields will never be removed for any given endpoint
 * New endpoints may be added
 * New fields may be added to existing endpoints
-* New versions of the api may be added in the future at a separate endpoint (eg., `api/v2`). New versions are *not* required to be backwards compatible.
+* New versions of the api may be added in the future as a separate endpoint (eg., `api/v2`). New versions are *not* required to be backwards compatible.
 * Api versions may be dropped, but only after at least one minor release of co-existing with a new api version.

-Note that even when examining the UI of a running applications, the `applications/[app-id]` portion is
+Note that even when examining the UI of running applications, the `applications/[app-id]` portion is
 still required, though there is only one application available. Eg. to see the list of jobs for the
 running app, you would go to `http://localhost:4040/api/v1/applications/[app-id]/jobs`. This is to
 keep the paths consistent in both modes.
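The last hunk touches the description of the versioned monitoring REST API. A minimal standard-library sketch of querying it, assuming a running application UI on the default port 4040; the `jobId`/`status` field names reflect the v1 job listing and should be treated as an assumption if your Spark version differs.

```python
import json
from urllib.request import urlopen

BASE = "http://localhost:4040/api/v1"

# Even with a single running application, the applications/[app-id]
# portion of the path is still required.
apps = json.load(urlopen(f"{BASE}/applications"))
app_id = apps[0]["id"]

jobs = json.load(urlopen(f"{BASE}/applications/{app_id}/jobs"))
for job in jobs:
    print(job["jobId"], job["status"])
```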

docs/quick-start.md

Lines changed: 3 additions & 3 deletions
@@ -67,7 +67,7 @@ res3: Long = 15
 ./bin/pyspark


-Or if PySpark is installed with pip in your current enviroment:
+Or if PySpark is installed with pip in your current environment:

 pyspark

@@ -156,7 +156,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i
 >>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
 {% endhighlight %}

-Here, we use the `explode` function in `select`, to transfrom a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
+Here, we use the `explode` function in `select`, to transform a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:

 {% highlight python %}
 >>> wordCounts.collect()
@@ -422,7 +422,7 @@ $ YOUR_SPARK_HOME/bin/spark-submit \
 Lines with a: 46, Lines with b: 23
 {% endhighlight %}

-If you have PySpark pip installed into your enviroment (e.g., `pip install pyspark`), you can run your application with the regular Python interpreter or use the provided 'spark-submit' as you prefer.
+If you have PySpark pip installed into your environment (e.g., `pip install pyspark`), you can run your application with the regular Python interpreter or use the provided 'spark-submit' as you prefer.

 {% highlight bash %}
 # Use the Python interpreter to run your application
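For convenience, here is the word-count flow described in the middle hunk as a self-contained script rather than shell input; the input file path and the `local[*]` master are placeholders, while everything else follows the documented `explode`/`split`/`groupBy`/`count`/`collect` chain.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = (SparkSession.builder
         .master("local[*]")                 # placeholder master for a local run
         .appName("quickstart-wordcount")
         .getOrCreate())

textFile = spark.read.text("README.md")      # placeholder input file

# Explode each line into words, then count occurrences per word,
# yielding a two-column DataFrame: "word" and "count".
wordCounts = (textFile
              .select(explode(split(textFile.value, r"\s+")).alias("word"))
              .groupBy("word")
              .count())

print(wordCounts.collect()[:10])
spark.stop()
```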

docs/running-on-mesos.md

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ can find the results of the driver from the Mesos Web UI.
 To use cluster mode, you must start the `MesosClusterDispatcher` in your cluster via the `sbin/start-mesos-dispatcher.sh` script,
 passing in the Mesos master URL (e.g: mesos://host:5050). This starts the `MesosClusterDispatcher` as a daemon running on the host.

-By setting the Mesos proxy config property (requires mesos version >= 1.4), `--conf spark.mesos.proxy.baseURL=http://localhost:5050` when launching the dispacther, the mesos sandbox URI for each driver is added to the mesos dispatcher UI.
+By setting the Mesos proxy config property (requires mesos version >= 1.4), `--conf spark.mesos.proxy.baseURL=http://localhost:5050` when launching the dispatcher, the mesos sandbox URI for each driver is added to the mesos dispatcher UI.

 If you like to run the `MesosClusterDispatcher` with Marathon, you need to run the `MesosClusterDispatcher` in the foreground (i.e: `bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher`). Note that the `MesosClusterDispatcher` not yet supports multiple instances for HA.

docs/running-on-yarn.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -445,7 +445,7 @@ To use a custom metrics.properties for the application master and executors, upd
445445
<code>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</code> should be
446446
configured in yarn-site.xml.
447447
This feature can only be used with Hadoop 2.6.4+. The Spark log4j appender needs be changed to use
448-
FileAppender or another appender that can handle the files being removed while its running. Based
448+
FileAppender or another appender that can handle the files being removed while it is running. Based
449449
on the file name configured in the log4j configuration (like spark.log), the user should set the
450450
regex (spark*) to include all the log files that need to be aggregated.
451451
</td>

docs/security.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ component-specific configuration namespaces used to override the default setting
 </tr>
 </table>

-The full breakdown of available SSL options can be found on the [configuration page](configuration.html). 
+The full breakdown of available SSL options can be found on the [configuration page](configuration.html).
 SSL must be configured on each node and configured for each component involved in communication using the particular protocol.

 ### YARN mode

docs/sql-programming-guide.md

Lines changed: 4 additions & 4 deletions
@@ -1253,7 +1253,7 @@ provide a ClassTag.
 (Note that this is different than the Spark SQL JDBC server, which allows other applications to
 run queries using Spark SQL).

-To get started you will need to include the JDBC driver for you particular database on the
+To get started you will need to include the JDBC driver for your particular database on the
 spark classpath. For example, to connect to postgres from the Spark Shell you would run the
 following command:

@@ -1793,7 +1793,7 @@ options.
 - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.concatBinaryAsString` to `true`.
 - Since Spark 2.3, when all inputs are binary, SQL `elt()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.eltOutputAsString` to `true`.

-- Since Spark 2.3, by default arithmetic operations between decimals return a rounded value if an exact representation is not possible (instead of returning NULL). This is compliant to SQL ANSI 2011 specification and Hive's new behavior introduced in Hive 2.2 (HIVE-15331). This involves the following changes
+- Since Spark 2.3, by default arithmetic operations between decimals return a rounded value if an exact representation is not possible (instead of returning NULL). This is compliant with SQL ANSI 2011 specification and Hive's new behavior introduced in Hive 2.2 (HIVE-15331). This involves the following changes
 - The rules to determine the result type of an arithmetic operation have been updated. In particular, if the precision / scale needed are out of the range of available values, the scale is reduced up to 6, in order to prevent the truncation of the integer part of the decimals. All the arithmetic operations are affected by the change, ie. addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), remainder (`%`) and positive module (`pmod`).
 - Literal values used in SQL operations are converted to DECIMAL with the exact precision and scale needed by them.
 - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced. It defaults to `true`, which means the new behavior described here; if set to `false`, Spark uses previous rules, ie. it doesn't adjust the needed scale to represent the values and it returns NULL if an exact representation of the value is not possible.
@@ -1821,7 +1821,7 @@ options.
 transformations (e.g., `map`, `filter`, and `groupByKey`) and untyped transformations (e.g.,
 `select` and `groupBy`) are available on the Dataset class. Since compile-time type-safety in
 Python and R is not a language feature, the concept of Dataset does not apply to these languages’
-APIs. Instead, `DataFrame` remains the primary programing abstraction, which is analogous to the
+APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the
 single-node data frame notion in these languages.

 - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`
@@ -1997,7 +1997,7 @@ Java and Python users will need to update their code.

 Prior to Spark 1.3 there were separate Java compatible classes (`JavaSQLContext` and `JavaSchemaRDD`)
 that mirrored the Scala API. In Spark 1.3 the Java API and Scala API have been unified. Users
-of either language should use `SQLContext` and `DataFrame`. In general theses classes try to
+of either language should use `SQLContext` and `DataFrame`. In general these classes try to
 use types that are usable from both languages (i.e. `Array` instead of language specific collections).
 In some cases where no common type exists (e.g., for passing in closures or Maps) function overloading
 is used instead.
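The first hunk above concerns putting a JDBC driver on the Spark classpath before reading from an external database. A hedged PySpark sketch of that pattern, with the jar path, connection URL, table, and credentials all standing in for real values:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")                                      # placeholder master
         .appName("jdbc-read-demo")
         .config("spark.jars", "/path/to/postgresql-driver.jar")  # driver jar on the classpath
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")        # placeholder URL
      .option("dbtable", "public.accounts")                       # placeholder table
      .option("user", "spark")
      .option("password", "secret")
      .load())

df.printSchema()
```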

docs/storage-openstack-swift.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ Create <code>core-site.xml</code> and place it inside Spark's <code>conf</code>
 The main category of parameters that should be configured are the authentication parameters
 required by Keystone.

-The following table contains a list of Keystone mandatory parameters. <code>PROVIDER</code> can be 
+The following table contains a list of Keystone mandatory parameters. <code>PROVIDER</code> can be
 any (alphanumeric) name.

 <table class="table">
