22 changes: 22 additions & 0 deletions docs/dev/monitoring-setup.md
@@ -1,5 +1,25 @@
# UDS Core Metrics Scraping Setup

<details open>
<summary>Istio Ambient Mode</summary>

## Unified Traffic Handling:
Ambient mode routes all workload traffic through ztunnel, a per-node proxy that tunnels traffic over HBONE on a common port (15008), eliminating per-workload sidecars.

## Simplified TLS Management:
Individual certificate mounts are no longer needed. ztunnel handles mTLS centrally, so metrics endpoints can be scraped over plain HTTP (operator mutations remove the TLS configuration from monitors).

## Limited L7 Enforcement:
Because ztunnel does not process L7 attributes, HTTP header checks and JWT validation must be handled by a waypoint proxy (or bypassed). This differs from sidecar mode, where the sidecar enforces these rules.
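
To make the L4/L7 split concrete, here is a minimal sketch that flags AuthorizationPolicy rules using attributes ztunnel cannot evaluate on its own. The field names follow the Istio `AuthorizationPolicy` API, but the trimmed-down types and the `ruleNeedsWaypoint` helper are hypothetical, and the L7 field lists are a simplified reading of the Istio docs rather than an exhaustive one.

```typescript
// Trimmed-down subset of the Istio AuthorizationPolicy rule shape (illustrative only).
interface Operation {
  ports?: string[];
  notPorts?: string[];
  hosts?: string[];
  notHosts?: string[];
  methods?: string[];
  notMethods?: string[];
  paths?: string[];
  notPaths?: string[];
}

interface Source {
  principals?: string[];
  namespaces?: string[];
  ipBlocks?: string[];
  requestPrincipals?: string[];
  notRequestPrincipals?: string[];
}

interface Condition {
  key: string;
  values?: string[];
  notValues?: string[];
}

interface Rule {
  from?: { source?: Source }[];
  to?: { operation?: Operation }[];
  when?: Condition[];
}

// Operation fields that require parsing HTTP, which ztunnel does not do.
const L7_OPERATION_FIELDS: (keyof Operation)[] = [
  "hosts", "notHosts", "methods", "notMethods", "paths", "notPaths",
];

// Source fields derived from a request JWT rather than the mTLS peer identity.
const L7_SOURCE_FIELDS: (keyof Source)[] = ["requestPrincipals", "notRequestPrincipals"];

// Returns true if the rule uses any attribute that would need a waypoint in ambient mode.
function ruleNeedsWaypoint(rule: Rule): boolean {
  const operationL7 = (rule.to ?? []).some(entry =>
    L7_OPERATION_FIELDS.some(field => (entry.operation?.[field]?.length ?? 0) > 0),
  );
  const sourceL7 = (rule.from ?? []).some(entry =>
    L7_SOURCE_FIELDS.some(field => (entry.source?.[field]?.length ?? 0) > 0),
  );
  // Header and JWT-claim conditions are L7 as well.
  const conditionL7 = (rule.when ?? []).some(
    c => c.key.startsWith("request.headers[") || c.key.startsWith("request.auth."),
  );
  return operationL7 || sourceL7 || conditionL7;
}

// A port-only rule (like the 15020 metrics policy) stays within L4.
console.log(ruleNeedsWaypoint({ to: [{ operation: { ports: ["15020"] } }] })); // false
// A path-based exclusion is L7.
console.log(ruleNeedsWaypoint({ to: [{ operation: { notPaths: ["/stats/prometheus"] } }] })); // true
```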

## Operator Mutation:
The UDS Operator (via Pepr) mutates ServiceMonitor and PodMonitor resources to remove their TLS configuration and reset the scrapeClass when the target workload runs in ambient mode.
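
A minimal sketch of that mutation, assuming a trimmed-down ServiceMonitor type (PodMonitors use `podMetricsEndpoints` instead of `endpoints`). The `mutateForAmbient` helper and `targetIsAmbient` flag are hypothetical, the `exempt` class name is borrowed from the sidecar-mode opt-out, and forcing `scheme: "http"` is an assumption based on scraping over plain HTTP. The real Pepr mutation may differ.

```typescript
// Trimmed-down ServiceMonitor shape: only the fields this sketch touches.
interface TLSConfig {
  caFile?: string;
  certFile?: string;
  keyFile?: string;
  insecureSkipVerify?: boolean;
}

interface Endpoint {
  port?: string;
  path?: string;
  scheme?: string;
  tlsConfig?: TLSConfig;
}

interface ServiceMonitor {
  metadata: { name: string; namespace: string };
  spec: { scrapeClass?: string; endpoints: Endpoint[] };
}

// Returns a mutated copy for ambient targets; sidecar targets are returned unchanged.
// `plainScrapeClass` is assumed to name a scrape class that carries no Istio certs.
function mutateForAmbient(
  monitor: ServiceMonitor,
  targetIsAmbient: boolean,
  plainScrapeClass = "exempt",
): ServiceMonitor {
  if (!targetIsAmbient) return monitor;
  const endpoints = monitor.spec.endpoints.map(ep => {
    // ztunnel provides mTLS transparently, so drop the TLS settings and scrape over HTTP.
    const copy: Endpoint = { ...ep, scheme: "http" };
    delete copy.tlsConfig;
    return copy;
  });
  return { ...monitor, spec: { ...monitor.spec, scrapeClass: plainScrapeClass, endpoints } };
}
```

The function returns a copy rather than editing in place, so the original object can be compared against the result when logging what changed.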

</details>

<details>
<summary>Istio Sidecar Mode [LEGACY]</summary>

UDS Core uses Pepr to set up Prometheus scraping of metrics endpoints with the configuration required in a STRICT mTLS (Istio) environment. This is handled via a default scrapeClass in Prometheus that adds the Istio certificates. When a monitor needs to be exempt from that tlsConfig, a mutation switches it to a plain scrape class without the Istio certificates.
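
The selection logic can be sketched as follows. This is an assumption-laden illustration: `chooseScrapeClass` and `namespaceHasSidecars` are hypothetical, while the `exempt` scrape class and the `uds/skip-mutate` annotation come from the UDS monitoring docs.

```typescript
// Hypothetical view of the monitor fields relevant to scrape class selection.
interface MonitorMeta {
  annotations?: Record<string, string>;
  scrapeClass?: string;
}

const EXEMPT_SCRAPE_CLASS = "exempt";

// Returns the scrapeClass to set, or undefined to keep the default (Istio cert) class.
function chooseScrapeClass(monitor: MonitorMeta, namespaceHasSidecars: boolean): string | undefined {
  // An explicitly configured class is respected as-is.
  if (monitor.scrapeClass) return monitor.scrapeClass;
  // Opt-out annotation: any value (even empty) selects the exempt class.
  if (monitor.annotations && "uds/skip-mutate" in monitor.annotations) return EXEMPT_SCRAPE_CLASS;
  // Targets without sidecars serve plain HTTP, so Istio TLS settings would break the scrape.
  if (!namespaceHasSidecars) return EXEMPT_SCRAPE_CLASS;
  return undefined;
}
```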

> [!NOTE]
@@ -30,3 +50,5 @@ An alternative spec option would use the service name instead of selectors/port
### Generation of service + monitor

Another alternative approach would be to use a pod selector and port only. We would then generate both a service and servicemonitor, giving us full control of the port names and selectors. This seems like a viable path, but it does add an extra resource for us to generate and manage. There could be unknown side effects of generating services that could clash with other services (particularly with Istio endpoints). This would otherwise be a relatively straightforward approach and is worth evaluating again if we want to simplify the spec later on.
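
A rough sketch of what that generation could look like, assuming a hypothetical `MonitorSpec` input and a hypothetical `uds/monitor` label that ties the generated Service to its ServiceMonitor. Because the generator picks the port name itself, the monitor can always reference it.

```typescript
// Hypothetical input: the pod selector + port approach described above.
interface MonitorSpec {
  name: string;
  namespace: string;
  selector: Record<string, string>;
  port: number;
  path?: string;
}

function generateServiceAndMonitor(spec: MonitorSpec) {
  // We control the port name, so the ServiceMonitor can reference it reliably.
  const portName = "metrics";
  const service = {
    apiVersion: "v1",
    kind: "Service",
    metadata: {
      name: `${spec.name}-metrics`,
      namespace: spec.namespace,
      labels: { "uds/monitor": spec.name }, // hypothetical linking label
    },
    spec: {
      selector: spec.selector,
      ports: [{ name: portName, port: spec.port, targetPort: spec.port }],
    },
  };
  const serviceMonitor = {
    apiVersion: "monitoring.coreos.com/v1",
    kind: "ServiceMonitor",
    metadata: { name: spec.name, namespace: spec.namespace },
    spec: {
      // Select only the generated Service, not any pre-existing ones.
      selector: { matchLabels: { "uds/monitor": spec.name } },
      endpoints: [{ port: portName, path: spec.path ?? "/metrics" }],
    },
  };
  return { service, serviceMonitor };
}
```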

</details>
1 change: 0 additions & 1 deletion docs/reference/configuration/UDS operator/package.md
@@ -33,7 +33,6 @@ The UDS Operator seamlessly enables the following enhancements and protections f
:::caution
Warning: **Istio Ambient Mode** Package support is in Alpha and may not be stable. Different workloads may experience issues when migrating away from sidecars, so testing in a development/staging environment is encouraged. In addition, there are some known limitations with ambient support at this time:
- `Package` CRs with AuthService SSO clients (`enableAuthserviceSelector`) are not supported in ambient mode. This is a limitation we plan to remove with operator support/configuration of [waypoint proxies](https://istio.io/latest/docs/ambient/usage/waypoint/); track progress on [this issue in GitHub](https://github.com/defenseunicorns/uds-core/issues/1200).
- Metrics of applications in ambient mode will _NOT_ be scraped successfully by Prometheus when STRICT mTLS is being enforced (there may be workarounds to scrape in PERMISSIVE mode but this is not advised). This will be resolved once Prometheus is migrated to ambient mode.
:::

### Example UDS Package CR
4 changes: 0 additions & 4 deletions docs/reference/configuration/uds-monitoring-metrics.md
@@ -46,10 +46,6 @@ spec:
type: "Bearer"
```

Due to UDS Core using STRICT Istio mTLS across the cluster, Prometheus is also configured by default to properly scrape metrics with STRICT mTLS. This is done primarily by leveraging a default [`scrapeClass`](https://github.com/prometheus-operator/prometheus-operator/blob/v0.75.1/Documentation/api.md#monitoring.coreos.com/v1.ScrapeClass) which provides the correct TLS configuration and certificates to make mTLS connections. The default configuration works in most scenarios since the operator will attempt to auto-detect needs based on the istio-injection status in each namespace. If this configuration does not work (the main place this may be an issue is metrics being exposed on a PERMISSIVE mTLS port) there are two options for manually opting out of the Istio TLS configuration:
1. Individual monitors can explicitly set the `exempt` scrape class to opt out of the Istio certificate configuration.
1. If setting a `scrapeClass` is not an option due to lack of configuration in a helm chart, or for other reasons, monitors can set the `uds/skip-mutate` annotation (with any value) to have Pepr mutate the `exempt` scrape class onto the monitor.

## Adding Dashboards

Grafana within UDS Core is configured with [a sidecar](https://github.com/grafana/helm-charts/blob/6eecb003569dc41a494d21893b8ecb3e8a9741a0/charts/grafana/values.yaml#L926-L928) that will watch for new dashboards added via configmaps or secrets and load them into Grafana dynamically. In order to have your dashboard added, the configmap or secret must be labelled with `grafana_dashboard: "1"`, which is used by the sidecar to watch and collect new dashboards.
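
For illustration, a minimal sketch that wraps a dashboard JSON model in a ConfigMap carrying that label. The `ConfigMap` type and `dashboardConfigMap` helper are hypothetical (not from a Kubernetes client library); applying the resulting manifest is left to whatever tooling you already use.

```typescript
// Minimal local ConfigMap type, just enough to build the manifest.
interface ConfigMap {
  apiVersion: "v1";
  kind: "ConfigMap";
  metadata: { name: string; namespace: string; labels: Record<string, string> };
  data: Record<string, string>;
}

// Wraps a dashboard JSON model in a ConfigMap labelled for the Grafana sidecar.
function dashboardConfigMap(name: string, namespace: string, dashboard: object): ConfigMap {
  return {
    apiVersion: "v1",
    kind: "ConfigMap",
    metadata: {
      name,
      namespace,
      // The label the sidecar watches for.
      labels: { grafana_dashboard: "1" },
    },
    // Each data key holds one dashboard JSON document.
    data: { [`${name}.json`]: JSON.stringify(dashboard) },
  };
}

const cm = dashboardConfigMap("my-dash", "my-app", { title: "My Dashboard" });
console.log(cm.metadata.labels.grafana_dashboard); // 1
```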
@@ -47,6 +47,14 @@ function authserviceAuthorizationPolicy(
notValues: ["*"],
},
],
to: [
{
operation: {
notPorts: ["15020"],
notPaths: ["/stats/prometheus"],
},
},
],
},
],
selector: {
@@ -81,6 +89,14 @@ function jwtAuthZAuthorizationPolicy(
},
},
],
to: [
{
operation: {
notPorts: ["15020"],
notPaths: ["/stats/prometheus"],
},
},
],
},
],
},
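
Istio ANDs the fields inside a single `operation` and treats `notX` fields as negative matches, so the `to` clauses above exclude a request whose port is 15020 *or* whose path is `/stats/prometheus`. The following simplified matcher illustrates that evaluation; the helpers are hypothetical and only cover ports and paths, with exact, prefix (`abc*`) and suffix (`*abc`) matching.

```typescript
// Simplified subset of an Istio AuthorizationPolicy operation (ports and paths only).
interface Operation {
  ports?: string[];
  notPorts?: string[];
  paths?: string[];
  notPaths?: string[];
}

// The request attributes this sketch looks at.
interface RequestAttrs {
  port: string;
  path: string;
}

// Exact, prefix ("abc*"), or suffix ("*abc") string match.
function matchesPattern(pattern: string, value: string): boolean {
  if (pattern.endsWith("*")) return value.startsWith(pattern.slice(0, -1));
  if (pattern.startsWith("*")) return value.endsWith(pattern.slice(1));
  return pattern === value;
}

// A positive list (if set) must contain a match; a negative list (if set) must not.
function fieldMatches(value: string, include?: string[], exclude?: string[]): boolean {
  const included = !include || include.length === 0 || include.some(p => matchesPattern(p, value));
  const excluded = !!exclude && exclude.some(p => matchesPattern(p, value));
  return included && !excluded;
}

// Fields inside one operation are ANDed together.
function operationMatches(op: Operation, req: RequestAttrs): boolean {
  return fieldMatches(req.port, op.ports, op.notPorts) && fieldMatches(req.path, op.paths, op.notPaths);
}

// Entries in a rule's `to` list are ORed: any matching operation is enough.
function toClauseMatches(to: { operation: Operation }[], req: RequestAttrs): boolean {
  return to.some(entry => operationMatches(entry.operation, req));
}

// The `to` clause added to both policies above.
const metricsExclusion = [{ operation: { notPorts: ["15020"], notPaths: ["/stats/prometheus"] } }];

console.log(toClauseMatches(metricsExclusion, { port: "8080", path: "/api" })); // true
console.log(toClauseMatches(metricsExclusion, { port: "15020", path: "/stats/prometheus" })); // false
console.log(toClauseMatches(metricsExclusion, { port: "8080", path: "/stats/prometheus" })); // false
```

The last case shows the consequence of AND semantics: a request on an application port is also taken out of the rule's scope when its path is `/stats/prometheus`.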
105 changes: 91 additions & 14 deletions src/pepr/operator/controllers/network/authorizationPolicies.spec.ts
@@ -5,6 +5,7 @@

import { Direction, Gateway, RemoteGenerated, UDSPackage } from "../../crd";
import { Action, AuthorizationPolicy } from "../../crd/generated/istio/authorizationpolicy-v1beta1";
import { IstioState } from "../istio/namespace";
import { generateAuthorizationPolicies } from "./authorizationPolicies";

jest.mock("../../../logger", () => ({
@@ -58,7 +59,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "test-ns");
const policies = await generateAuthorizationPolicies(pkg, "test-ns", IstioState.Ambient);
expect(policies.length).toBe(1);
const policy = policies[0];
expect(policy.metadata?.name).toBe(
@@ -85,7 +86,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "test-ns");
const policies = await generateAuthorizationPolicies(pkg, "test-ns", IstioState.Ambient);
expect(policies.length).toBe(1);
const policy = policies[0];
expect(policy.metadata?.name).toBe("protect-kubeapi-test-ingress-kubeapi-test-kubeapi");
@@ -110,7 +111,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "test-ns");
const policies = await generateAuthorizationPolicies(pkg, "test-ns", IstioState.Ambient);
expect(policies.length).toBe(1);
const policy = policies[0];
expect(policy.metadata?.name).toBe("protect-kubenodes-test-ingress-kubenodes-test-kubenodes");
@@ -138,6 +139,7 @@ describe("authorization policy generation", () => {
const policies: AuthorizationPolicy[] = await generateAuthorizationPolicies(
pkg,
"curl-ns-remote-cidr",
IstioState.Ambient,
);
expect(policies).toHaveLength(1);
const policy = policies[0];
@@ -201,7 +203,11 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "authservice-test-app");
const policies = await generateAuthorizationPolicies(
pkg,
"authservice-test-app",
IstioState.Ambient,
);
// We expect exactly two policies: one for the expose rule and one for the allow rule.
expect(policies.length).toBe(2);

@@ -256,7 +262,11 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "test-tenant-app");
const policies = await generateAuthorizationPolicies(
pkg,
"test-tenant-app",
IstioState.Ambient,
);
expect(policies.length).toBe(2);
const names = policies.map(p => p.metadata?.name);
expect(new Set(names).size).toBe(2);
@@ -280,7 +290,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "loki");
const policies = await generateAuthorizationPolicies(pkg, "loki", IstioState.Ambient);
// With one allow rule (Ingress/IntraNamespace), expect one policy
expect(policies.length).toBe(1);
const policy = policies[0];
@@ -322,7 +332,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "neuvector");
const policies = await generateAuthorizationPolicies(pkg, "neuvector", IstioState.Ambient);
// With the current per-rule design we expect three policies
expect(policies.length).toBe(3);

@@ -394,7 +404,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "vector");
const policies = await generateAuthorizationPolicies(pkg, "vector", IstioState.Ambient);
expect(policies.length).toBe(1);
const policy = policies[0];
expect(policy.metadata?.name).toBe("protect-vector-ingress-prometheus-metrics");
@@ -429,7 +439,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "velero");
const policies = await generateAuthorizationPolicies(pkg, "velero", IstioState.Ambient);
// Expect one policy
expect(policies.length).toBe(1);
const policy = policies[0];
@@ -447,7 +457,7 @@ describe("authorization policy generation", () => {
);
});

test("should generate correct AuthorizationPolicies for Authservice", async () => {
test("should generate correct AuthorizationPolicies for Ambient Authservice", async () => {
const pkg: UDSPackage = {
metadata: { name: "authservice", namespace: "authservice", generation: 1 },
spec: {
@@ -467,7 +477,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "authservice");
const policies = await generateAuthorizationPolicies(pkg, "authservice", IstioState.Ambient);
// Expect two policies
expect(policies.length).toBe(2);
const nsPolicy = policies.find(
@@ -493,6 +503,73 @@ describe("authorization policy generation", () => {
);
});

test("should generate correct AuthorizationPolicies for Sidecar Authservice", async () => {
const pkg: UDSPackage = {
metadata: { name: "authservice", namespace: "authservice", generation: 1 },
spec: {
network: {
allow: [
{ direction: Direction.Ingress, remoteGenerated: RemoteGenerated.IntraNamespace },
{ direction: Direction.Egress, remoteGenerated: RemoteGenerated.IntraNamespace },
{
direction: Direction.Ingress,
selector: { "app.kubernetes.io/name": "authservice" },
remoteNamespace: "",
port: 10003,
description: "Protected Apps",
},
],
},
},
};

const policies = await generateAuthorizationPolicies(pkg, "authservice", IstioState.Sidecar);
// Expect three policies
expect(policies.length).toBe(3);
const nsPolicy = policies.find(
p => p.metadata?.name === "protect-authservice-ingress-all-pods-intranamespace",
);
expect(nsPolicy).toBeDefined();
expect(nsPolicy!.metadata?.namespace).toBe("authservice");
expect(nsPolicy!.spec?.action).toBe(Action.Allow);
expect(nsPolicy!.spec?.rules).toEqual(
expect.arrayContaining([{ from: [{ source: { namespaces: ["authservice"] } }] }]),
);

const workloadPolicy = policies.find(
p => p.metadata?.name === "protect-authservice-ingress-protected-apps",
);
expect(workloadPolicy).toBeDefined();
expect(workloadPolicy!.spec?.selector?.matchLabels).toEqual({
"app.kubernetes.io/name": "authservice",
});
expect(workloadPolicy!.spec?.action).toBe(Action.Allow);
expect(workloadPolicy!.spec?.rules).toEqual(
expect.arrayContaining([{ to: [{ operation: { ports: ["10003"] } }] }]),
);

const metricScrapingPolicy = policies.find(
p => p.metadata?.name === "protect-authservice-ingress-15020-sidecar-metric-scraping",
);
expect(metricScrapingPolicy).toBeDefined();
expect(metricScrapingPolicy!.spec?.selector?.matchLabels).toEqual({});
expect(metricScrapingPolicy!.spec?.action).toBe(Action.Allow);
expect(metricScrapingPolicy!.spec?.rules).toEqual(
expect.arrayContaining([
{
from: [
{
source: {
principals: ["cluster.local/ns/monitoring/sa/kube-prometheus-stack-prometheus"],
},
},
],
to: [{ operation: { ports: ["15020"] } }],
},
]),
);
});

test("should generate correct AuthorizationPolicies for Prometheus-Stack", async () => {
const pkg: UDSPackage = {
metadata: { name: "prometheus-stack", namespace: "monitoring", generation: 1 },
@@ -519,7 +596,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "monitoring");
const policies = await generateAuthorizationPolicies(pkg, "monitoring", IstioState.Ambient);
// Expect three policies
expect(policies.length).toBe(3);
const nsPolicy = policies.find(
@@ -597,7 +674,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "grafana");
const policies = await generateAuthorizationPolicies(pkg, "grafana", IstioState.Ambient);
// Expect three policies: one from expose, one from allow, and one monitor policy
expect(policies.length).toBe(3);
const exposePolicy = policies.find(
@@ -763,7 +840,7 @@ describe("authorization policy generation", () => {
},
};

const policies = await generateAuthorizationPolicies(pkg, "keycloak");
const policies = await generateAuthorizationPolicies(pkg, "keycloak", IstioState.Ambient);
// We expect 6 policies
expect(policies.length).toBe(6);

22 changes: 22 additions & 0 deletions src/pepr/operator/controllers/network/authorizationPolicies.ts
@@ -12,6 +12,7 @@ import {
Rule,
Source,
} from "../../crd/generated/istio/authorizationpolicy-v1beta1";
import { IstioState } from "../istio/namespace";
import { getOwnerRef, purgeOrphans, sanitizeResourceName } from "../utils";
import { META_IP } from "./generators/cloudMetadata";
import { kubeAPI } from "./generators/kubeAPI";
@@ -206,6 +207,7 @@ function buildAuthPolicy(
export async function generateAuthorizationPolicies(
pkg: UDSPackage,
pkgNamespace: string,
istioMode: string,
): Promise<AuthorizationPolicy[]> {
const pkgName = pkg.metadata?.name ?? "unknown";
const generation = pkg.metadata?.generation?.toString() ?? "0";
@@ -261,6 +263,26 @@ export async function generateAuthorizationPolicies(
}
}

// With Prometheus in Ambient mode, scrapes of sidecar workloads arrive over mTLS and the
// destination sidecar requires an ALLOW policy before it will expose its metrics.
// Add an AuthorizationPolicy that allows the Prometheus principal to reach port 15020 on all workloads in the package's namespace.
if (istioMode === IstioState.Sidecar) {
const extraPolicyName = sanitizeResourceName(
`protect-${pkgName}-ingress-15020-sidecar-metric-scraping`,
);
const extraPolicy = buildAuthPolicy(
extraPolicyName,
pkg,
{}, // empty selector to apply to all workloads in the namespace
{ principals: [PROMETHEUS_PRINCIPAL] },
["15020"],
);
policies.push(extraPolicy);
log.trace(
`Generated sidecar metrics scraping authpol for port 15020: ${extraPolicy.metadata?.name}`,
);
}

// Apply policies concurrently.
for (const policy of policies) {
try {
2 changes: 1 addition & 1 deletion src/pepr/operator/reconcilers/package-reconciler.ts
@@ -77,7 +77,7 @@ export async function packageReconciler(pkg: UDSPackage) {
// Pass the effective Istio mode to the networkPolicies function
const netPol = await networkPolicies(pkg, namespace!, istioMode);

const authPol = await generateAuthorizationPolicies(pkg, namespace!);
const authPol = await generateAuthorizationPolicies(pkg, namespace!, istioMode);

let endpoints: string[] = [];
// Update the namespace to enable the expected Istio mode (sidecar or ambient)