[SPARK-23165][DOC] Spelling mistake fix in quick-start doc.
## What changes were proposed in this pull request?
Fix spelling in quick-start doc.
## How was this patch tested?
Doc only.
Author: Shashwat Anand <[email protected]>
Closes #20336 from ashashwat/SPARK-23165.
**docs/monitoring.md** (4 additions, 4 deletions)
@@ -118,7 +118,7 @@ The history server can be configured as follows:
 <td>
 The number of applications to retain UI data for in the cache. If this cap is exceeded, then
 the oldest applications will be removed from the cache. If an application is not in the cache,
-it will have to be loaded from disk if its accessed from the UI.
+it will have to be loaded from disk if it is accessed from the UI.
 </td>
 </tr>
 <tr>
@@ -407,7 +407,7 @@ can be identified by their `[attempt-id]`. In the API listed below, when running
 </tr>
 </table>

-The number of jobs and stages which can retrieved is constrained by the same retention
+The number of jobs and stages which can be retrieved is constrained by the same retention
 mechanism of the standalone Spark UI; `"spark.ui.retainedJobs"` defines the threshold
 value triggering garbage collection on jobs, and `spark.ui.retainedStages` that for stages.
 Note that the garbage collection takes place on playback: it is possible to retrieve
@@ -422,10 +422,10 @@ These endpoints have been strongly versioned to make it easier to develop applic
 * Individual fields will never be removed for any given endpoint
 * New endpoints may be added
 * New fields may be added to existing endpoints
-* New versions of the api may be added in the future at a separate endpoint (eg., `api/v2`). New versions are *not* required to be backwards compatible.
+* New versions of the api may be added in the future as a separate endpoint (eg., `api/v2`). New versions are *not* required to be backwards compatible.
 * Api versions may be dropped, but only after at least one minor release of co-existing with a new api version.

-Note that even when examining the UI of a running applications, the `applications/[app-id]` portion is
+Note that even when examining the UI of running applications, the `applications/[app-id]` portion is
 still required, though there is only one application available. Eg. to see the list of jobs for the
 running app, you would go to `http://localhost:4040/api/v1/applications/[app-id]/jobs`. This is to
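For context only (not part of this patch), here is a rough sketch of querying the jobs endpoint mentioned in the diff above from Python. The application ID is a placeholder, and the `jobId`/`status` field names are assumptions about the usual shape of the `/jobs` response.

```python
# Hypothetical sketch: query the monitoring REST API of a running app at the
# endpoint documented above. Requires an application to be up at localhost:4040.
import json
from urllib.request import urlopen

app_id = "app-20180123123456-0000"  # placeholder; use your app's real ID
url = f"http://localhost:4040/api/v1/applications/{app_id}/jobs"

with urlopen(url) as resp:
    jobs = json.load(resp)

for job in jobs:
    # Field names assumed from the typical /jobs payload.
    print(job.get("jobId"), job.get("status"))
```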
**docs/quick-start.md**

@@ -159 +159 @@
-Here, we use the `explode` function in `select`, to transfrom a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
+Here, we use the `explode` function in `select`, to transform a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
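As a side note (not from the patch), a minimal PySpark rendering of the explode/groupBy/count pattern that line describes might look like this; the input path is a placeholder.

```python
# Minimal sketch of the word-count pattern described above (input path is a placeholder).
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()

lines = spark.read.text("README.md")  # a DataFrame with a single "value" column
words = lines.select(explode(split(lines.value, r"\s+")).alias("word"))
word_counts = words.groupBy("word").count()  # columns: "word" and "count"

print(word_counts.collect())
spark.stop()
```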
@@ -425,4 +425,4 @@
-If you have PySpark pip installed into your enviroment (e.g., `pip install pyspark`), you can run your application with the regular Python interpreter or use the provided 'spark-submit' as you prefer.
+If you have PySpark pip installed into your environment (e.g., `pip install pyspark`), you can run your application with the regular Python interpreter or use the provided 'spark-submit' as you prefer.

 {% highlight bash %}
 # Use the Python interpreter to run your application
**docs/running-on-mesos.md** (1 addition, 1 deletion)
@@ -154,7 +154,7 @@ can find the results of the driver from the Mesos Web UI.
 To use cluster mode, you must start the `MesosClusterDispatcher` in your cluster via the `sbin/start-mesos-dispatcher.sh` script,
 passing in the Mesos master URL (e.g: mesos://host:5050). This starts the `MesosClusterDispatcher` as a daemon running on the host.

-By setting the Mesos proxy config property (requires mesos version >= 1.4), `--conf spark.mesos.proxy.baseURL=http://localhost:5050` when launching the dispacther, the mesos sandbox URI for each driver is added to the mesos dispatcher UI.
+By setting the Mesos proxy config property (requires mesos version >= 1.4), `--conf spark.mesos.proxy.baseURL=http://localhost:5050` when launching the dispatcher, the mesos sandbox URI for each driver is added to the mesos dispatcher UI.

 If you like to run the `MesosClusterDispatcher` with Marathon, you need to run the `MesosClusterDispatcher` in the foreground (i.e: `bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher`). Note that the `MesosClusterDispatcher` not yet supports multiple instances for HA.
**docs/sql-programming-guide.md** (4 additions, 4 deletions)
@@ -1253,7 +1253,7 @@ provide a ClassTag.
 (Note that this is different than the Spark SQL JDBC server, which allows other applications to
 run queries using Spark SQL).

-To get started you will need to include the JDBC driver for you particular database on the
+To get started you will need to include the JDBC driver for your particular database on the
 spark classpath. For example, to connect to postgres from the Spark Shell you would run the
 following command:

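Purely as an illustration (not part of the patch), reading a table over JDBC from PySpark once a driver jar is on the classpath might look roughly like the sketch below; the connection URL, table name, and credentials are placeholders.

```python
# Rough sketch with placeholder connection details. Assumes the session was
# started with the JDBC driver jar available, e.g.
#   ./bin/pyspark --driver-class-path postgresql-<version>.jar --jars postgresql-<version>.jar
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("JdbcReadSketch").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")  # placeholder URL
    .option("dbtable", "public.my_table")                 # placeholder table
    .option("user", "username")                           # placeholder credentials
    .option("password", "password")
    .load()
)
df.printSchema()
```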
@@ -1793,7 +1793,7 @@ options.
 - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.concatBinaryAsString` to `true`.
 - Since Spark 2.3, when all inputs are binary, SQL `elt()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.eltOutputAsString` to `true`.

-- Since Spark 2.3, by default arithmetic operations between decimals return a rounded value if an exact representation is not possible (instead of returning NULL). This is compliant to SQL ANSI 2011 specification and Hive's new behavior introduced in Hive 2.2 (HIVE-15331). This involves the following changes
+- Since Spark 2.3, by default arithmetic operations between decimals return a rounded value if an exact representation is not possible (instead of returning NULL). This is compliant with SQL ANSI 2011 specification and Hive's new behavior introduced in Hive 2.2 (HIVE-15331). This involves the following changes
 - The rules to determine the result type of an arithmetic operation have been updated. In particular, if the precision / scale needed are out of the range of available values, the scale is reduced up to 6, in order to prevent the truncation of the integer part of the decimals. All the arithmetic operations are affected by the change, ie. addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), remainder (`%`) and positive module (`pmod`).
 - Literal values used in SQL operations are converted to DECIMAL with the exact precision and scale needed by them.
 - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced. It defaults to `true`, which means the new behavior described here; if set to `false`, Spark uses previous rules, ie. it doesn't adjust the needed scale to represent the values and it returns NULL if an exact representation of the value is not possible.
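To make the decimal note in this hunk concrete, here is a hedged sketch (not from the patch) that toggles `spark.sql.decimalOperations.allowPrecisionLoss`; the literals are arbitrary and the exact result types depend on the Spark version.

```python
# Hedged illustration of the decimal behavior change described above.
# The values are arbitrary; only the config key is taken from the doc text.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DecimalSketch").getOrCreate()

query = ("SELECT CAST('123.456' AS DECIMAL(38,18)) * "
         "CAST('789.012' AS DECIMAL(38,18)) AS product")

# Default since Spark 2.3: round the result when an exact representation does not fit.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.sql(query).show(truncate=False)

# Previous rules: the scale is not adjusted, so the result may be NULL instead.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
spark.sql(query).show(truncate=False)
```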
@@ -1821,7 +1821,7 @@ options.
 transformations (e.g., `map`, `filter`, and `groupByKey`) and untyped transformations (e.g.,
 `select` and `groupBy`) are available on the Dataset class. Since compile-time type-safety in
 Python and R is not a language feature, the concept of Dataset does not apply to these languages’
-APIs. Instead, `DataFrame` remains the primary programing abstraction, which is analogous to the
+APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the
 single-node data frame notion in these languages.

 - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`
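As an aside (not part of the patch), the DataFrame-centric, untyped style this hunk refers to, including `union` as the replacement for the deprecated `unionAll`, looks roughly like this in PySpark; the data is made up.

```python
# Made-up data; illustrates untyped DataFrame transformations and union().
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DataFrameSketch").getOrCreate()

df1 = spark.createDataFrame([("alice", 3), ("bob", 5)], ["name", "score"])
df2 = spark.createDataFrame([("carol", 7)], ["name", "score"])

combined = df1.union(df2)  # preferred over the deprecated unionAll
top = combined.filter(combined.score > 3).groupBy("name").count()
top.show()
```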
@@ -1997,7 +1997,7 @@ Java and Python users will need to update their code.

 Prior to Spark 1.3 there were separate Java compatible classes (`JavaSQLContext` and `JavaSchemaRDD`)
 that mirrored the Scala API. In Spark 1.3 the Java API and Scala API have been unified. Users
-of either language should use `SQLContext` and `DataFrame`. In general theses classes try to
+of either language should use `SQLContext` and `DataFrame`. In general these classes try to
 use types that are usable from both languages (i.e. `Array` instead of language specific collections).
 In some cases where no common type exists (e.g., for passing in closures or Maps) function overloading