
Commit 2282300

Add documentation for the HistoryServer

1 parent 567474a

File tree: 2 files changed (+62 lines, -6 lines)


core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala

Lines changed: 3 additions & 1 deletion
```diff
@@ -76,6 +76,8 @@ class HistoryServer(
    *
    * If a log check is invoked manually in the middle of a period, this thread re-adjusts the
    * time at which it performs the next log check to maintain the same period as before.
+   *
+   * TODO: Add a mechanism to update manually.
    */
   private val logCheckingThread = new Thread {
     override def run() {
```
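The scaladoc above describes a loop that re-aligns its schedule whenever a check happens mid-period. Below is a minimal, self-contained sketch of that pattern; the names (`updateIntervalMs`, `lastLogCheckTimeMs`, `checkForLogs`) are illustrative stand-ins, not the actual HistoryServer members:

```scala
// Sketch of the re-adjusting periodic check described in the scaladoc above.
// Illustrative only; not the actual HistoryServer implementation.
object PeriodicLogChecker {
  val updateIntervalMs: Long = 10 * 1000L       // assumed default, cf. spark.history.updateInterval
  @volatile var lastLogCheckTimeMs: Long = -1L  // updated by every check, manual or scheduled

  def checkForLogs(): Unit = {
    lastLogCheckTimeMs = System.currentTimeMillis()
    // ... scan the event log directory for new or updated applications ...
  }

  val logCheckingThread = new Thread {
    override def run(): Unit = {
      while (!Thread.currentThread().isInterrupted) {
        val now = System.currentTimeMillis()
        if (now - lastLogCheckTimeMs > updateIntervalMs) {
          checkForLogs()
        }
        // Sleep until one full interval after the most recent check, so a
        // manual check in mid-period shifts the schedule rather than
        // shortening the next period.
        Thread.sleep(math.max(0L, lastLogCheckTimeMs + updateIntervalMs - now))
      }
    }
  }
}
```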
```diff
@@ -292,7 +294,7 @@ object HistoryServer {
   val UPDATE_INTERVAL_MS = conf.getInt("spark.history.updateInterval", 10) * 1000
 
   // How many applications to retain
-  val RETAINED_APPLICATIONS = conf.getInt("spark.deploy.retainedApplications", 250)
+  val RETAINED_APPLICATIONS = conf.getInt("spark.history.retainedApplications", 250)
 
   val STATIC_RESOURCE_DIR = SparkUI.STATIC_RESOURCE_DIR
 
```
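The renamed `spark.history.retainedApplications` property caps how many application UIs the server keeps; the docs below note that the least recently updated applications are evicted once the cap is exceeded. A hedged sketch of one way such a policy can be expressed, using an insertion-ordered map (names and types here are illustrative, not Spark's actual data structures):

```scala
import scala.collection.mutable

// Illustrative sketch of least-recently-updated eviction for retained app UIs.
object RetentionSketch {
  case class AppUI(appId: String)  // stand-in for whatever the server retains

  val retainedApplications = 250   // cf. spark.history.retainedApplications
  // LinkedHashMap iterates in insertion order, so after re-insertion the
  // head is always the least recently updated application.
  val appUIs = mutable.LinkedHashMap[String, AppUI]()

  def updateApp(appId: String, ui: AppUI): Unit = {
    appUIs.remove(appId)   // re-insert so this app becomes most recently updated
    appUIs(appId) = ui
    if (appUIs.size > retainedApplications) {
      appUIs.remove(appUIs.head._1)  // evict the least recently updated app
    }
  }
}
```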

docs/monitoring.md

Lines changed: 59 additions & 5 deletions
```diff
@@ -12,17 +12,71 @@ displays useful information about the application. This includes:
 
 * A list of scheduler stages and tasks
 * A summary of RDD sizes and memory usage
-* Information about the running executors
 * Environmental information.
+* Information about the running executors
 
 You can access this interface by simply opening `http://<driver-node>:4040` in a web browser.
-If multiple SparkContexts are running on the same host, they will bind to succesive ports
+If multiple SparkContexts are running on the same host, they will bind to successive ports
 beginning with 4040 (4041, 4042, etc).
 
-Spark's Standalone Mode cluster manager also has its own
-[web UI](spark-standalone.html#monitoring-and-logging).
+Note that this information is only available for the duration of the application by default.
+To view the web UI after the fact, set `spark.eventLog.enabled` to true before starting the
+application. This configures Spark to log Spark events that encode the information displayed
+in the UI to persisted storage.
 
-Note that in both of these UIs, the tables are sortable by clicking their headers,
+## Viewing After the Fact
+
+Spark's Standalone Mode cluster manager also has its own
+[web UI](spark-standalone.html#monitoring-and-logging). If an application has logged events over
+the course of its lifetime, then the Standalone master's web UI will automatically re-render the
+application's UI after the application has finished.
+
+If Spark is run on Mesos or YARN, it is still possible to reconstruct the UI of a finished
+application through Spark's history server, provided that the application's event logs exist.
+You can start the history server by executing:
+
+    ./sbin/start-history-server.sh <base-logging-directory>
+
+The base logging directory must be supplied, and should contain sub-directories that each
+represent an application's event logs. This creates a web interface at
+`http://<server-url>:18080` by default, but the port can be changed by supplying an extra
+parameter to the start script. The history server depends on the following environment variables:
+
+<table class="table">
+  <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr>
+  <tr>
+    <td><code>SPARK_DAEMON_MEMORY</code></td>
+    <td>Memory to allocate to the history server (default: 512m).</td>
+  </tr>
+  <tr>
+    <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
+    <td>JVM options for the history server (default: none).</td>
+  </tr>
+</table>
+
+Further, the history server can be configured as follows:
+
+<table class="table">
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr>
+    <td>spark.history.updateInterval</td>
+    <td>10</td>
+    <td>
+      The period, in seconds, at which information displayed by this history server is updated.
+      Each update checks for any changes made to the event logs in persisted storage.
+    </td>
+  </tr>
+  <tr>
+    <td>spark.history.retainedApplications</td>
+    <td>250</td>
+    <td>
+      The number of application UIs to retain. If this cap is exceeded, then the least recently
+      updated applications will be removed.
+    </td>
+  </tr>
+</table>
+
+Note that in all of these UIs, the tables are sortable by clicking their headers,
 making it easy to identify slow tasks, data skew, etc.
 
 # Metrics
```
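As the new docs stress, a finished application's UI can only be reconstructed if event logging was enabled before the application started. A minimal sketch of turning it on programmatically through `SparkConf`; the app name and log directory are assumptions for illustration, not values from this commit:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enable event logging so the history server (or the Standalone
// master) can re-render this application's UI after it finishes.
val conf = new SparkConf()
  .setAppName("EventLoggedApp")                   // hypothetical app name
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/tmp/spark-events") // assumed log location
val sc = new SparkContext(conf)
```

These properties can also be supplied through Spark's external configuration mechanisms rather than hard-coded in the application.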
