Commit 8cae922 (parent 190eb90)

Refactor doc, add state diagram for cluster, fix typo, style and grammar

File tree: 8 files changed, +199 −240 lines

docs/architecture.md (55 additions & 38 deletions)
# Design & Architecture

**Spark-Kubernetes-Operator** (Operator) acts as a control plane to manage the complete
deployment lifecycle of Spark applications and clusters. The Operator can be installed on
Kubernetes cluster(s) using Helm. In most production environments it is typically deployed in a
designated namespace and controls Spark workloads in one or more managed namespaces.
Spark Operator enables users to describe Spark applications or clusters as
[Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).

The Operator continuously tracks events related to the Spark custom resources in its
reconciliation loops:

For SparkApplications:

* User submits a SparkApplication custom resource (CR) using kubectl / API
* Operator launches the driver and observes its status
* Operator observes driver-spawned resources (e.g. executors) and records their status until the app terminates
* Operator releases all Spark-app-owned resources back to the cluster
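As a concrete illustration of the first step, a SparkApplication CR might look roughly like the sketch below. The `apiVersion` value and every field under `spec` are illustrative assumptions, not the operator's actual CRD schema; consult the project's CRD reference for the real field names.

```yaml
# Illustrative sketch only -- apiVersion and all spec fields are assumptions,
# not the actual SparkApplication CRD schema.
apiVersion: spark.apache.org/v1alpha1
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-apps   # a namespace managed by the Operator
spec:
  mainClass: org.apache.spark.examples.SparkPi   # hypothetical field
```

Applied with `kubectl apply -f spark-pi.yaml`, after which the reconciliation loop above drives the cluster toward this desired state.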

For SparkClusters:

* User submits a SparkCluster custom resource (CR) using kubectl / API
* Operator launches the master and worker(s) based on the CR spec and observes their status
* Operator releases all Spark-cluster-owned resources back to the cluster upon failure
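Similarly, a SparkCluster CR could be sketched as below; again, the `apiVersion` and the `spec` field are assumptions rather than the real schema.

```yaml
# Illustrative sketch only -- field names are assumptions, not the actual
# SparkCluster CRD schema.
apiVersion: spark.apache.org/v1alpha1
kind: SparkCluster
metadata:
  name: spark-standalone
spec:
  workerInstances: 3   # hypothetical field: desired number of worker pods
```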

The Operator is built with the [Java Operator SDK](https://javaoperatorsdk.io/), which it uses for
launching Spark deployments and submitting jobs under the hood. It also uses the
[fabric8](https://fabric8.io/) client to interact with the Kubernetes API Server.

## Application State Transition

[<img src="resources/application_state_machine.png">](resources/application_state_machine.png)

* Spark applications are expected to run from submitted to succeeded before releasing resources.
* User may configure the app CR to time out if it cannot reach a healthy state within a given
  threshold. The timeout can be configured for different lifecycle stages, e.g. while the driver
  is starting and while executor pods are being requested. To update the default thresholds,
  configure `.spec.applicationTolerations.applicationTimeoutConfig` for the application.
* K8s resources created for an application are deleted as the final stage of the application
  lifecycle by default. This ensures that resource quota is released for completed applications.
* It is also possible to retain the created k8s resources for debugging or auditing purposes. To
  do so, set `.spec.applicationTolerations.resourceRetainPolicy` to `OnFailure` to retain
  resources upon application failure, or to `Always` to retain resources regardless of the
  application's final state.
  - This controls the behavior of the k8s resources created by the Operator for the application,
    including the driver pod, config map, service, and PVC (if enabled). It does not apply to
    resources created by the driver (for example, executor pods). Users may configure SparkConf
    to include `spark.kubernetes.executor.deleteOnTermination` for executor retention; please
    refer to the [Spark docs](https://spark.apache.org/docs/latest/running-on-kubernetes.html)
    for details.
  - The created k8s resources have an `ownerReference` to their related `SparkApplication` custom
    resource, so that they can be garbage collected when the `SparkApplication` is deleted.
  - Please be advised that k8s resources are not retained if the application is configured to
    restart. This avoids unexpected growth in resource quota usage and resource conflicts among
    multiple attempts.
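Putting the two settings above together, the relevant CR fragment might look as follows. The paths `.spec.applicationTolerations.resourceRetainPolicy` (with values `OnFailure` / `Always`) and `.spec.applicationTolerations.applicationTimeoutConfig` come from this document; the timeout sub-field names shown are hypothetical placeholders.

```yaml
spec:
  applicationTolerations:
    # Retain created k8s resources when the application fails (or use Always)
    resourceRetainPolicy: OnFailure
    applicationTimeoutConfig:
      # Sub-field names below are illustrative assumptions
      driverStartTimeoutMillis: 300000
      executorStartTimeoutMillis: 300000
```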

## Cluster State Transition

[<img src="resources/cluster_state_machine.png">](resources/cluster_state_machine.png)

* Spark clusters are expected to keep running after they are submitted.
* Similar to Spark applications, k8s resources created for a cluster are deleted as the final
  stage of the cluster lifecycle by default.

docs/configuration.md (89 additions & 4 deletions)
Spark Operator supports different ways to configure the behavior:

To enable hot properties loading, update the **helm chart values file** with

```yaml
operatorConfiguration:
  spark-operator.properties: |+
    spark.operator.dynamic.config.enabled=true
    # ... all other config overrides ...
  dynamicConfig:
    create: true
```

## Metrics

Spark Operator, following [Apache Spark](https://spark.apache.org/docs/latest/monitoring.html#metrics),
has a configurable metrics system based on the
[Dropwizard Metrics Library](https://metrics.dropwizard.io/4.2.25/). Note that Spark Operator does
not have a Spark UI, so MetricsServlet and PrometheusServlet from the
`org.apache.spark.metrics.sink` package are not supported. If you are interested in exporting
metrics to Prometheus, please take a look at the section
[Forward Metrics to Prometheus](#forward-metrics-to-prometheus) below.

### JVM Metrics

Spark Operator collects JVM metrics via
[Codahale JVM Metrics](https://javadoc.io/doc/com.codahale.metrics/metrics-jvm/latest/index.html):

- BufferPoolMetricSet
- FileDescriptorRatioGauge
- GarbageCollectorMetricSet
- MemoryUsageGaugeSet
- ThreadStatesGaugeSet


### Kubernetes Client Metrics

| Metrics Name                                                    | Type      | Description                                                                                                        |
|-----------------------------------------------------------------|-----------|--------------------------------------------------------------------------------------------------------------------|
| kubernetes.client.http.request                                  | Meter     | Tracks the rate of HTTP requests sent to the Kubernetes API Server                                                  |
| kubernetes.client.http.response                                 | Meter     | Tracks the rate of HTTP responses received from the Kubernetes API Server                                           |
| kubernetes.client.http.response.failed                          | Meter     | Tracks the rate of HTTP requests that received no response from the Kubernetes API Server                           |
| kubernetes.client.http.response.latency.nanos                   | Histogram | Measures the statistical distribution of HTTP response latency from the Kubernetes API Server                       |
| kubernetes.client.http.response.`<ResponseCode>`                | Meter     | Tracks the rate of HTTP responses by response code from the Kubernetes API Server                                   |
| kubernetes.client.http.request.`<RequestMethod>`                | Meter     | Tracks the rate of HTTP requests by request method to the Kubernetes API Server                                     |
| kubernetes.client.http.response.1xx                             | Meter     | Tracks the rate of HTTP 1xx (informational) responses received from the Kubernetes API Server                       |
| kubernetes.client.http.response.2xx                             | Meter     | Tracks the rate of HTTP 2xx (success) responses received from the Kubernetes API Server                             |
| kubernetes.client.http.response.3xx                             | Meter     | Tracks the rate of HTTP 3xx (redirection) responses received from the Kubernetes API Server                         |
| kubernetes.client.http.response.4xx                             | Meter     | Tracks the rate of HTTP 4xx (client error) responses received from the Kubernetes API Server                        |
| kubernetes.client.http.response.5xx                             | Meter     | Tracks the rate of HTTP 5xx (server error) responses received from the Kubernetes API Server                        |
| kubernetes.client.`<ResourceName>`.`<Method>`                   | Meter     | Tracks the rate of HTTP requests for a combination of one Kubernetes resource and one HTTP method                   |
| kubernetes.client.`<NamespaceName>`.`<ResourceName>`.`<Method>` | Meter     | Tracks the rate of HTTP requests for a combination of one namespace-scoped Kubernetes resource and one HTTP method  |
### Forward Metrics to Prometheus

In this section, we will show you how to forward Spark Operator metrics
to [Prometheus](https://prometheus.io).

* Modify the metrics properties section in the file
  `build-tools/helm/spark-kubernetes-operator/values.yaml`:

```properties
metrics.properties: |+
  spark.metrics.conf.operator.sink.prometheus.class=org.apache.spark.kubernetes.operator.metrics.sink.PrometheusPullModelSink
```

* Install Spark Operator

```bash
helm install spark-kubernetes-operator -f build-tools/helm/spark-kubernetes-operator/values.yaml build-tools/helm/spark-kubernetes-operator/
```

* Install Prometheus via Helm Chart

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
```

* Find and Annotate Spark Operator Pods

```bash
kubectl get pods -l app.kubernetes.io/name=spark-kubernetes-operator
NAME                                         READY   STATUS    RESTARTS   AGE
spark-kubernetes-operator-598cb5d569-bvvd2   1/1     Running   0          24m

kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/scrape=true
kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/path=/prometheus
kubectl annotate pods spark-kubernetes-operator-598cb5d569-bvvd2 prometheus.io/port=19090
```
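The three `kubectl annotate` commands above correspond to the following pod metadata. If the Helm chart exposes pod annotation values (an assumption about the chart, not a documented feature here), setting them there avoids re-annotating pods after each restart.

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"       # opt the operator pod into scraping
    prometheus.io/path: "/prometheus"  # metrics endpoint path
    prometheus.io/port: "19090"        # metrics port
```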

* Check Metrics via Prometheus UI

```bash
kubectl get pods | grep "prometheus-server"
prometheus-server-654bc74fc9-8hgkb   2/2   Running   0   59m

kubectl port-forward --address 0.0.0.0 pod/prometheus-server-654bc74fc9-8hgkb 8080:9090
```

Open your browser at `localhost:8080`. Click on the Status → Targets tab; you should be able to
find the target as shown below.

[<img src="resources/prometheus.png">](resources/prometheus.png)

docs/metrics_logging.md (0 additions & 109 deletions)

This file was deleted.
