@@ -12,17 +12,71 @@ displays useful information about the application. This includes:

 * A list of scheduler stages and tasks
 * A summary of RDD sizes and memory usage
-* Information about the running executors
 * Environmental information.
+* Information about the running executors

 You can access this interface by simply opening `http://<driver-node>:4040` in a web browser.
-If multiple SparkContexts are running on the same host, they will bind to succesive ports
+If multiple SparkContexts are running on the same host, they will bind to successive ports
 beginning with 4040 (4041, 4042, etc).

-Spark's Standalone Mode cluster manager also has its own
-[web UI](spark-standalone.html#monitoring-and-logging).
+Note that this information is only available for the duration of the application by default.
+To view the web UI after the fact, set `spark.eventLog.enabled` to true before starting the
+application. This configures Spark to log Spark events that encode the information displayed
+in the UI to persisted storage.
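+
+As a rough sketch, event logging can be configured through a properties file such as
+`conf/spark-defaults.conf` (it can equally be set programmatically on `SparkConf`). The
+file name and the HDFS path below are assumptions for illustration; `spark.eventLog.dir`
+controls where the event logs are written:
+
+    # Enable event logging and write the logs to a shared, persisted location
+    spark.eventLog.enabled  true
+    spark.eventLog.dir      hdfs://namenode/shared/spark-logs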

-Note that in both of these UIs, the tables are sortable by clicking their headers,
+## Viewing After the Fact
+
+Spark's Standalone Mode cluster manager also has its own
+[web UI](spark-standalone.html#monitoring-and-logging). If an application has logged events over
+the course of its lifetime, then the Standalone master's web UI will automatically re-render the
+application's UI after the application has finished.
+
+If Spark is run on Mesos or YARN, it is still possible to reconstruct the UI of a finished
+application through Spark's history server, provided that the application's event logs exist.
+You can start the history server by executing:
+
+    ./sbin/start-history-server.sh <base-logging-directory>
+
+The base logging directory must be supplied, and should contain sub-directories that each
+represent an application's event logs. This creates a web interface at
+`http://<server-url>:18080` by default, but the port can be changed by supplying an extra
+parameter to the start script. The history server depends on the following environment variables:
+
+<table class="table">
+  <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr>
+  <tr>
+    <td><code>SPARK_DAEMON_MEMORY</code></td>
+    <td>Memory to allocate to the history server (default: 512m).</td>
+  </tr>
+  <tr>
+    <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
+    <td>JVM options for the history server (default: none).</td>
+  </tr>
+</table>
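+
+For example, a minimal sketch of starting the server with a larger heap (the memory size
+and HDFS log directory here are hypothetical examples):
+
+    # Allocate 1g to the history server daemon and point it at the shared log directory
+    SPARK_DAEMON_MEMORY=1g ./sbin/start-history-server.sh hdfs://namenode/shared/spark-logs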
+
+Further, the history server can be configured as follows:
+
+<table class="table">
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr>
+    <td>spark.history.updateInterval</td>
+    <td>10</td>
+    <td>
+      The period, in seconds, at which information displayed by this history server is updated.
+      Each update checks for any changes made to the event logs in persisted storage.
+    </td>
+  </tr>
+  <tr>
+    <td>spark.history.retainedApplications</td>
+    <td>250</td>
+    <td>
+      The number of application UIs to retain. If this cap is exceeded, then the least recently
+      updated applications will be removed.
+    </td>
+  </tr>
+</table>
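+
+These are ordinary Spark properties, so one way to supply them is as Java system properties
+through `SPARK_DAEMON_JAVA_OPTS` (a sketch, assuming the server reads `spark.*` system
+properties; the values and path are hypothetical examples):
+
+    # Poll the event logs every 30 seconds and keep up to 500 application UIs
+    SPARK_DAEMON_JAVA_OPTS="-Dspark.history.updateInterval=30 -Dspark.history.retainedApplications=500" \
+      ./sbin/start-history-server.sh hdfs://namenode/shared/spark-logs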
+
+Note that in all of these UIs, the tables are sortable by clicking their headers,
 making it easy to identify slow tasks, data skew, etc.

 # Metrics