[website] update docsy + added version monitoring docs (#134)
CHANGES:
- updated `hugo` modules
- enabled dark mode support (from `docsy`)
- added page related to version monitoring
- generated relevant API Reference docs and Helm chart docs

---------

Co-authored-by: Pavan <[email protected]>
skrishnan-sap and Pavan-SAP authored Sep 20, 2024
1 parent 25b7cdd commit 3d1648d
Showing 17 changed files with 2,385 additions and 179 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/publish-website.yaml
@@ -40,7 +40,7 @@ jobs:
- name: Setup Hugo
uses: peaceiris/actions-hugo@v3
with:
hugo-version: "0.133.0"
hugo-version: "0.134.0"
extended: true
- name: Setup Node
uses: actions/setup-node@v4
1 change: 1 addition & 0 deletions website/assets/scss/_styles_project.scss
@@ -0,0 +1 @@
@import 'td/code-dark';
4 changes: 2 additions & 2 deletions website/content/en/_index.md
@@ -7,10 +7,10 @@ description: A Kubernetes operator for managing the lifecycle of multi-tenant SA
{{% blocks/cover title="Welcome to CAP Operator" image_anchor="top" height="full" color="primary" %}}
<div class="mx-auto">
<span class="font-weight-bold">A Kubernetes operator for managing the lifecycle of multi-tenant CAP applications</span><br><br><br>
<a class="btn btn-lg btn-primary me-3 mb-4" href="docs/">
<a class="btn btn-lg btn-outline-light me-3 mb-4 rounded-pill" href="docs/">
Learn more <i class="fas fa-arrow-alt-circle-right ms-2"></i>
</a>
<a class="btn btn-lg btn-secondary me-3 mb-4" href="https://github.com/sap/cap-operator">
<a class="btn btn-lg btn-outline-light me-3 mb-4 rounded-pill" href="https://github.com/sap/cap-operator">
Go to the source repository <i class="fab fa-github ms-2 "></i>
</a>
<br><br><br><p class="lead mt-5">
3 changes: 3 additions & 0 deletions website/content/en/docs/configuration/_index.md
@@ -18,3 +18,6 @@ Here's a list of environment variables used by CAP Operator.
- `DNS_MANAGER`: specifies the external DNS manager to be used. Possible values are:
- `gardener`: ["Gardener" external DNS manager](https://github.com/gardener/external-dns-management)
- `kubernetes`: [external DNS management from Kubernetes](https://github.com/kubernetes-sigs/external-dns)
- `PROMETHEUS_ADDRESS`: URL of the Prometheus server (or service) used for executing PromQL queries, e.g. `http://prometheus-operated.monitoring.svc.cluster.local:9090`. If no URL is supplied, the controller will not start the version monitoring function.
- `PROM_ACQUIRE_CLIENT_RETRY_DELAY`: Delay between retries when creating the Prometheus client or checking its connection fails.
- `METRICS_EVAL_INTERVAL`: Interval between successive iterations in which outdated versions are identified and queued for evaluation.
2 changes: 1 addition & 1 deletion website/content/en/docs/usage/resources/_index.md
@@ -1,7 +1,7 @@
---
title: "Resources"
linkTitle: "Resources"
weight: 50
weight: 60
type: "docs"
description: >
Detailed configuration of resources managed by CAP Operator
18 changes: 18 additions & 0 deletions website/content/en/docs/usage/resources/capapplicationversion.md
@@ -62,6 +62,11 @@ deploymentDefinition:
routerDestinationName: cap-server-url
- name: tech-port
port: 4005
monitoring:
scrapeConfig:
port: tech-port
deletionRules:
expression: scalar(sum(avg_over_time(current_sessions{job="cav-cap-app-v1-cap-backend-svc",namespace="cap-ns"}[2h]))) <= bool 5
```

The `type` of the deployment is important to indicate how the operator handles this workload (for example, injection of `destinations` to be used by the approuter). Valid values are:
@@ -85,6 +90,14 @@ The port configurations aren't mandatory and can be omitted. This would mean tha

> NOTE: If multiple ports are configured for a workload of type `Router`, the first available port will be used to target external traffic to the application domain.

#### Monitoring configuration

For each _workload of type deployment_ in a `CAPApplicationVersion`, it is possible to define:
1. Deletion rules: Criteria based on metrics which, when satisfied, signify that the workload can be removed.
2. Scrape configuration: Configuration which defines how metrics are scraped from the workload service.

Details of how to configure workload monitoring can be found [here](../version-monitoring.md#configure-capapplicationversion).

### Workloads with `jobDefinition`

```yaml
@@ -211,6 +224,11 @@ spec:
- name: tech-port
port: 4005
appProtocol: grpc
monitoring:
scrapeConfig:
port: tech-port
deletionRules:
expression: scalar(sum(avg_over_time(current_sessions{job="cav-cap-app-v1-cap-backend-svc",namespace="cap-ns"}[2h]))) <= bool 5
livenessProbe:
failureThreshold: 3
httpGet:
151 changes: 151 additions & 0 deletions website/content/en/docs/usage/version-monitoring.md
@@ -0,0 +1,151 @@
---
title: "Version Monitoring"
linkTitle: "Version Monitoring"
weight: 50
type: "docs"
description: >
How to monitor versions for automatic cleanup
---

In a continuous delivery environment where newer application versions may be deployed frequently, monitoring and cleaning up older, unused versions becomes important to conserve cluster resources (compute, memory, storage etc.) and operate a clutter-free system. The CAP Operator now allows application developers and operations teams to define how an application version can be monitored for usage.

## Integration with Prometheus

[Prometheus](https://prometheus.io/) is the industry standard for monitoring application metrics and provides a wide variety of tools for managing and reporting metrics data. The CAP Operator (controller) can be connected to a [Prometheus](https://prometheus.io/) server by setting the `PROMETHEUS_ADDRESS` environment variable on the controller (see [Configuration](../configuration/_index.md)). The controller is then able to query application related metrics based on the workload specification of `CAPApplicationVersions`. If no Prometheus address is supplied, the version monitoring function of the controller is not started.

## Configure `CAPApplication`

To avoid incompatible changes, version cleanup monitoring must be enabled for a `CAPApplication` using the annotation `sme.sap.com/enable-cleanup-monitoring`. The annotation can have the following values, which affect the version cleanup behavior:

|Value|Behavior|
|--|--|
|`dry-run`|When a `CAPApplicationVersion` is evaluated to be eligible for cleanup, an event of type `ReadyForDeletion` is emitted without performing the actual deletion of the version.|
|`true`|When a `CAPApplicationVersion` is evaluated to be eligible for cleanup, the version is deleted and an event of type `ReadyForDeletion` is emitted.|

## Configure `CAPApplicationVersion`

For each _workload of type deployment_ in a `CAPApplicationVersion`, it is possible to define:
1. Deletion rules: Criteria based on metrics which, when satisfied, signify that the workload can be removed.
2. Scrape configuration: Configuration which defines how metrics are scraped from the workload service.

### Deletion Rules (Variant 1) based on Metric Type

The following example shows how a workload, named `backend`, is configured with deletion rules based on multiple metrics.
```yaml
apiVersion: sme.sap.com/v1alpha1
kind: CAPApplicationVersion
metadata:
  namespace: demo
  name: cav-demo-app-1
spec:
  workloads:
    - name: backend
      deploymentDefinition:
        monitoring:
          deletionRules:
            metrics:
              - calculationPeriod: 90m
                name: current_sessions
                thresholdValue: "0"
                type: Gauge
              - calculationPeriod: 2h
                name: total_http_requests
                thresholdValue: "0.00005"
                type: Counter
```
This informs the CAP Operator that workload `backend` is supplying two metrics which can be monitored for usage.

- Metric `current_sessions` is of type `Gauge`, which indicates that it is an absolute value at any point in time. When evaluating this metric, the CAP Operator queries Prometheus with a PromQL expression which calculates the average value of this metric over the specified calculation period. The average values from all time series are then added together to get the evaluated value. The evaluated value is then compared against the specified threshold value to determine usage (or eligibility for cleanup).

|Evaluation steps for metric type `Gauge`|
|-|
|Execute PromQL expression `sum(avg_over_time(current_sessions{job="cav-demo-app-1-backend-svc",namespace="demo"}[90m]))` to get the evaluated value|
|Check whether evaluated value <= 0 (the specified `thresholdValue`)|

- Similarly, metric `total_http_requests` is of type `Counter`, which indicates that it is a cumulative value which can only increase. When evaluating this metric, the CAP Operator queries Prometheus with a PromQL expression which calculates the rate (of increase) of this metric over the specified calculation period. The rates of increase from all time series are then added together to get the evaluated value. The evaluated value is then compared against the specified threshold value to determine usage (or eligibility for cleanup).

|Evaluation steps for metric type `Counter`|
|-|
|Execute PromQL expression `sum(rate(total_http_requests{job="cav-demo-app-1-backend-svc",namespace="demo"}[2h]))` to get the evaluated value|
|Check whether evaluated value <= 0.00005 (the specified `thresholdValue`)|

{{% alert title="Prometheus Metrics Data" color="light" %}}
- Prometheus stores metric data as multiple time series by label set. The number of time series created from a single metric depends on the possible combination of labels. The label `job` represents the source of the metric and (within Kubernetes) is the service representing the workload.
- CAP Operator does not support Prometheus metric types other than `Gauge` and `Counter`. Learn more about metric types [here](https://prometheus.io/docs/concepts/metric_types/).
{{% /alert %}}

All specified metrics of a workload must satisfy the evaluation criteria for the workload to be eligible for cleanup.
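
The evaluation steps above can be sketched in a few lines of Python. This is an illustration only, not the CAP Operator's actual implementation: the names `MetricRule`, `build_query` and `eligible_for_cleanup` are hypothetical, and the `job` label value is passed in explicitly because the `<version>-<workload>-svc` pattern is only inferred from the examples on this page.

```python
from dataclasses import dataclass


@dataclass
class MetricRule:
    """Hypothetical mirror of one entry under deletionRules.metrics."""
    name: str
    type: str                # "Gauge" or "Counter"
    calculation_period: str  # e.g. "90m"
    threshold_value: str     # kept as a string, as in the YAML


def build_query(rule: MetricRule, job: str, namespace: str) -> str:
    """Build the PromQL expression described in the evaluation steps."""
    # Gauges are averaged over the period; counters use the rate of increase.
    functions = {"Gauge": "avg_over_time", "Counter": "rate"}
    if rule.type not in functions:
        raise ValueError(f"unsupported metric type: {rule.type}")
    selector = f'{rule.name}{{job="{job}",namespace="{namespace}"}}'
    return f"sum({functions[rule.type]}({selector}[{rule.calculation_period}]))"


def eligible_for_cleanup(rules, evaluated_values) -> bool:
    """Every metric must be at or below its threshold."""
    return all(
        evaluated_values[rule.name] <= float(rule.threshold_value)
        for rule in rules
    )


rules = [
    MetricRule("current_sessions", "Gauge", "90m", "0"),
    MetricRule("total_http_requests", "Counter", "2h", "0.00005"),
]
for rule in rules:
    print(build_query(rule, job="cav-demo-app-1-backend-svc", namespace="demo"))
```

The `evaluated_values` argument stands in for the numbers returned by Prometheus; a single metric above its threshold is enough to keep the workload.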

### Deletion Rules (Variant 2) as PromQL expression

Another way to specify the deletion criteria for a workload is by providing a PromQL expression which results in a boolean scalar.
```yaml
apiVersion: sme.sap.com/v1alpha1
kind: CAPApplicationVersion
metadata:
  namespace: demo
  name: cav-demo-app-1
spec:
  workloads:
    - name: backend
      deploymentDefinition:
        monitoring:
          deletionRules:
            expression: scalar(sum(avg_over_time(current_sessions{job="cav-demo-app-1-backend-svc",namespace="demo"}[2h]))) <= bool 5
```

The supplied PromQL expression is executed as a Prometheus query by the CAP Operator. The expected result is a scalar boolean (`0` or `1`). Users may use [comparison binary operators](https://prometheus.io/docs/prometheus/latest/querying/operators/#comparison-binary-operators) with the `bool` modifier to achieve the expected result. If the evaluation result is true (`1`), the workload is eligible for removal.

This variant can be useful when:
- the predefined evaluation based on metric types is not enough for determining usage of a workload.
- custom metrics scraping configurations are employed where the `job` label in the collected time series data does not match the name of the (Kubernetes) Service created for the workload.
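
To illustrate how the scalar boolean result could be interpreted, here is a minimal Python sketch which reads the JSON body returned by a Prometheus instant query (`/api/v1/query`). The function name `expression_result_is_true` is hypothetical, and the sketch does not reflect how the CAP Operator itself processes the response; it only relies on the documented response shape for scalar results, `[<unix_time>, "<value>"]`.

```python
def expression_result_is_true(response: dict) -> bool:
    """Interpret the body of a Prometheus instant query as a boolean.

    `response` is the decoded JSON returned by /api/v1/query.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    data = response["data"]
    if data["resultType"] != "scalar":
        # A deletion rule expression is expected to produce a scalar.
        raise ValueError(f"expected a scalar, got {data['resultType']}")
    _timestamp, value = data["result"]  # the value is transported as a string
    return float(value) == 1.0


sample = {
    "status": "success",
    "data": {"resultType": "scalar", "result": [1726800000.0, "1"]},
}
print(expression_result_is_true(sample))
```

An expression which yields an instant vector instead of a scalar is rejected here, which is why the sample rule wraps the aggregation in `scalar(...)` before applying `<= bool 5`.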

### Scrape Configuration

[Prometheus Operator](https://prometheus-operator.dev/docs/getting-started/introduction/) is a popular Kubernetes operator for managing Prometheus and related monitoring components. A common way to set up scrape targets for a Prometheus instance is to create a [`ServiceMonitor`](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.ServiceMonitor) resource, which specifies the `Services` (and ports) that should be scraped to collect application metrics.

{{% alert title="Prerequisite" color="info" %}}
The `scrapeConfig` feature of a workload is usable only when the [`ServiceMonitor`](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.ServiceMonitor) Custom Resource is available on the Kubernetes cluster.
{{% /alert %}}

The CAP Operator provides an easy way to create `ServiceMonitors` which target the `Services` created for version workloads. The following sample shows how to configure this.
```yaml
kind: CAPApplicationVersion
metadata:
  namespace: demo
  name: cav-demo-app-1
spec:
  workloads:
    - name: backend
      deploymentDefinition:
        ports:
          - appProtocol: http
            name: metrics-port
            networkPolicy: Cluster
            port: 9000
        monitoring:
          deletionRules:
            expression: scalar(sum(avg_over_time(current_sessions{job="cav-demo-app-1-backend-svc",namespace="demo"}[2h]))) <= bool 5
          scrapeConfig:
            interval: 15s
            path: /metrics
            port: metrics-port
```

With this configuration the CAP Operator will create a `ServiceMonitor` which targets the workload `Service`. The `scrapeConfig.port` should match the name of one of the ports specified on the workload.
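
The port-name requirement can be checked before applying a manifest. The following Python sketch is a hypothetical pre-flight check, not something the CAP Operator is known to run; `scrape_port_is_valid` is an illustrative name, and the input is assumed to be the `deploymentDefinition` of one workload loaded as a plain dictionary.

```python
def scrape_port_is_valid(deployment_definition: dict) -> bool:
    """Check that scrapeConfig.port names one of the workload ports."""
    monitoring = deployment_definition.get("monitoring") or {}
    scrape_config = monitoring.get("scrapeConfig")
    if not scrape_config:
        # Without a scrapeConfig there is nothing to validate.
        return True
    port_names = {port["name"] for port in deployment_definition.get("ports", [])}
    return scrape_config.get("port") in port_names


deployment_definition = {
    "ports": [{"name": "metrics-port", "port": 9000, "appProtocol": "http"}],
    "monitoring": {
        "scrapeConfig": {"interval": "15s", "path": "/metrics", "port": "metrics-port"}
    },
}
print(scrape_port_is_valid(deployment_definition))
```

A check like this catches simple naming slips, such as a doubled hyphen in the port name, before the manifest reaches the cluster.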

{{% alert title="Use Case" color="secondary" %}}
The workload `scrapeConfig` aims to support a minimal configuration, creating a `ServiceMonitor` which supports the most common use case (i.e. scraping the workload service via a defined workload port). To use complex configurations in `ServiceMonitors`, they should be created separately. If the `scrapeConfig` of a version workload is empty, the CAP Operator will not attempt to create the related `ServiceMonitor`.
{{% /alert %}}

## Evaluating `CAPApplicationVersions` for cleanup

At intervals specified by the controller environment variable `METRICS_EVAL_INTERVAL`, the CAP Operator selects versions which are candidates for evaluation.
- Only versions of `CAPApplications` on which the annotation `sme.sap.com/enable-cleanup-monitoring` is set are considered.
- Versions (`spec.version`) higher than the highest version with `Ready` status are not considered for evaluation. If there is no version with status `Ready`, no versions are considered.
- All versions linked to a `CAPTenant` are excluded from evaluation. This includes versions referenced by the following fields of a `CAPTenant`:
  - `status.currentCAPApplicationVersionInstance` - current version of the tenant.
  - `spec.version` - the version to which a tenant is upgrading.

Workloads from the identified versions are then evaluated based on the defined `deletionRules`. Workloads without `deletionRules` are automatically eligible for cleanup. All workloads (with type deployment) of a version must satisfy the evaluation criteria for the version to be deleted.
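
The candidate selection described above can be sketched as follows. This is a simplified model under stated assumptions, not the controller's code: the `Version` and `Tenant` classes and the `select_candidates` function are hypothetical, `spec.version` is assumed to consist of dot-separated integers, and the rules are applied literally as written on this page.

```python
from dataclasses import dataclass


@dataclass
class Version:
    name: str     # metadata.name of the CAPApplicationVersion
    version: str  # spec.version, assumed to be dot-separated integers
    ready: bool   # whether the version has status Ready


@dataclass
class Tenant:
    current_version_instance: str  # status.currentCAPApplicationVersionInstance
    target_version: str            # spec.version of the CAPTenant


def parse(version: str) -> tuple:
    """Turn "1.2.0" into (1, 2, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def select_candidates(versions, tenants, annotation):
    """Return the versions which would be evaluated for cleanup."""
    if annotation not in ("true", "dry-run"):
        return []  # cleanup monitoring is not enabled for the application
    ready = [parse(v.version) for v in versions if v.ready]
    if not ready:
        return []  # without a Ready version, nothing is considered
    highest_ready = max(ready)
    used_names = {t.current_version_instance for t in tenants}
    used_versions = {t.target_version for t in tenants}
    return [
        v for v in versions
        if parse(v.version) <= highest_ready  # skip versions above highest Ready
        and v.name not in used_names          # skip current tenant versions
        and v.version not in used_versions    # skip tenant upgrade targets
    ]
```

For example, with versions `1.0.0` (Ready), `2.0.0` (Ready, used by a tenant) and `3.0.0` (not Ready), only `1.0.0` would be returned as a candidate.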

2 changes: 1 addition & 1 deletion website/go.mod
@@ -1,5 +1,5 @@
module github.com/sap/cap-operator/website

go 1.23.0
go 1.23.1

require github.com/google/docsy v0.10.0 // indirect
19 changes: 13 additions & 6 deletions website/hugo.yaml
@@ -6,6 +6,11 @@ enableRobotsTXT: true
# Will give values to .Lastmod etc.
enableGitInfo: true

pygmentsCodeFences: true
pygmentsUseClasses: false
pygmentsUseClassic: false
pygmentsStyle: tango

# Language settings
contentDir: "content/en"
defaultContentLanguage: "en"
@@ -41,7 +46,7 @@ permalinks:
imaging:
resampleFilter: "CatmullRom"
quality: 90
anchor: "Smart"
anchor: Smart

# Language configuration
languages:
@@ -56,14 +61,13 @@ languages:

markup:
goldmark:
parser:
attribute:
block: true
renderer:
unsafe: true
highlight:
# See a complete list of available styles at https://xyproto.github.io/splash/docs/all.html
style: "nord"
# Uncomment if you want your chosen highlight style used for code blocks without a specified language
guessSyntax: true
codeFences: true
noClasses: false # Required for dark-mode

# Everything below this are Site Params

@@ -112,8 +116,11 @@ params:
# Enable Lunr.js offline search
offlineSearch: true

prism_syntax_highlighting: false

# User interface configuration
ui:
showLightDarkModeMenu: true
# Set to true to disable breadcrumb navigation.
breadcrumb_disable: false
# Set to false to disable the About link in the site footer