diff --git a/docs/dev/monitoring-setup.md b/docs/dev/monitoring-setup.md index e4e79939b4..9de60dbbf1 100644 --- a/docs/dev/monitoring-setup.md +++ b/docs/dev/monitoring-setup.md @@ -1,5 +1,25 @@ # UDS Core Metrics Scraping Setup +
+## Istio Ambient Mode
+
+### Unified Traffic Handling
+Ambient mode routes all workload traffic through ztunnel, which uses a common HBONE-based port (15008), eliminating per-workload sidecars.
+
+### Simplified TLS Management
+Individual certificate mounts are no longer needed; TLS is handled centrally, and metrics endpoints can be scraped over HTTP (with operator mutations removing the TLS config).
+
+### Limited L7 Enforcement
+Because ztunnel does not process L7 attributes by default, HTTP headers and JWT validation must be handled via a waypoint (or bypassed). This differs from sidecar mode, where the sidecar enforces these rules.
+
+### Operator Mutation
+Pepr mutates ServiceMonitor/PodMonitor resources to remove TLS configuration and reset the scrapeClass when a target is meant to run in ambient mode.
+
+
+ +
+## Istio Sidecar Mode [LEGACY]
+
 UDS Core leverages Pepr to handle setup of Prometheus scraping metrics endpoints, with the particular configuration necessary to work in a STRICT mTLS (Istio) environment. We handle this via a default scrapeClass in prometheus to add the istio certs. When a monitor needs to be exempt from that tlsConfig a mutation is performed to leverage a plain scrape class without istio certs.
 
 > [!NOTE]
@@ -30,3 +50,5 @@ An alternative spec option would use the service name instead of selectors/port
 
 ### Generation of service + monitor
 Another alternative approach would be to use a pod selector and port only. We would then generate both a service and servicemonitor, giving us full control of the port names and selectors. This seems like a viable path, but does add an extra resource for us to generate and manage. There could be unknown side effects of generating services that could clash with other services (particularly with istio endpoints). This would otherwise be a relative straightforward approach and is worth evaluating again if we want to simplify the spec later on.
+
+
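The operator mutation described in the doc above (clearing the `scrapeClass` and scraping over plain HTTP in ambient mode) can be sketched roughly as follows. This is a minimal illustration only — `MonitorSpecSketch`, `EndpointSketch`, and `stripTlsForAmbient` are hypothetical names, not the real Pepr types or capability API:

```typescript
// Illustrative shapes; the real ServiceMonitor/PodMonitor CRD types are richer.
interface EndpointSketch {
  scheme?: "http" | "https";
  tlsConfig?: Record<string, string>;
}

interface MonitorSpecSketch {
  scrapeClass?: string;
  endpoints?: EndpointSketch[];
}

function stripTlsForAmbient(spec: MonitorSpecSketch): MonitorSpecSketch {
  // ztunnel terminates mTLS in ambient mode, so Prometheus scrapes plain HTTP:
  // drop the scrape class and any per-endpoint TLS configuration.
  delete spec.scrapeClass;
  for (const ep of spec.endpoints ?? []) {
    ep.scheme = "http";
    delete ep.tlsConfig;
  }
  return spec;
}

const mutated = stripTlsForAmbient({
  scrapeClass: "istio-certs",
  endpoints: [{ scheme: "https", tlsConfig: { caFile: "/etc/prom-certs/root-cert.pem" } }],
});
console.log(JSON.stringify(mutated)); // {"endpoints":[{"scheme":"http"}]}
```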
diff --git a/docs/reference/configuration/UDS operator/package.md b/docs/reference/configuration/UDS operator/package.md index 6be3b9d6d7..c407afa0ae 100644 --- a/docs/reference/configuration/UDS operator/package.md +++ b/docs/reference/configuration/UDS operator/package.md @@ -33,7 +33,6 @@ The UDS Operator seamlessly enables the following enhancements and protections f :::caution Warning: **Istio Ambient Mode** Package support is in Alpha and may not be stable. Different workloads may experience issues when migrating away from sidecars so testing in a development/staging environment is encouraged. In addition there are some known limitations with ambient support at this time: - `Package` CRs with AuthService SSO clients (`enableAuthserviceSelector`) are not supported in ambient mode. This is a limitation we plan to remove with operator support/configuration of [waypoint proxies](https://istio.io/latest/docs/ambient/usage/waypoint/), track progress on [this issue in GitHub](https://github.com/defenseunicorns/uds-core/issues/1200). -- Metrics of applications in ambient mode will _NOT_ be scraped successfully by Prometheus when STRICT mTLS is being enforced (there may be workarounds to scrape in PERMISSIVE mode but this is not advised). This will be resolved once Prometheus is migrated to ambient mode. ::: ### Example UDS Package CR diff --git a/docs/reference/configuration/uds-monitoring-metrics.md b/docs/reference/configuration/uds-monitoring-metrics.md index 33197049da..b987be7a60 100644 --- a/docs/reference/configuration/uds-monitoring-metrics.md +++ b/docs/reference/configuration/uds-monitoring-metrics.md @@ -46,10 +46,6 @@ spec: type: "Bearer" ``` -Due to UDS Core using STRICT Istio mTLS across the cluster, Prometheus is also configured by default to manage properly scraping metrics with STRICT mTLS. 
This is done primarily by leveraging a default [`scrapeClass`](https://github.com/prometheus-operator/prometheus-operator/blob/v0.75.1/Documentation/api.md#monitoring.coreos.com/v1.ScrapeClass) which provides the correct TLS configuration and certificates to make mTLS connections. The default configuration works in most scenarios since the operator will attempt to auto-detect needs based istio-injection status in each namespace. If this configuration does not work (the main place this may be an issue is metrics being exposed on a PERMISSIVE mTLS port) there are two options for manually opt-ing out of the Istio TLS configuration: -1. Individual monitors can explicitly set the `exempt` scrape class to opt out of the Istio certificate configuration. -1. If setting a `scrapeClass` is not an option due to lack of configuration in a helm chart, or for other reasons, monitors can set the `uds/skip-mutate` annotation (with any value) to have Pepr mutate the `exempt` scrape class onto the monitor. - ## Adding Dashboards Grafana within UDS Core is configured with [a sidecar](https://github.com/grafana/helm-charts/blob/6eecb003569dc41a494d21893b8ecb3e8a9741a0/charts/grafana/values.yaml#L926-L928) that will watch for new dashboards added via configmaps or secrets and load them into Grafana dynamically. In order to have your dashboard added the configmap or secret must be labelled with `grafana_dashboard: "1"`, which is used by the sidecar to watch and collect new dashboards. 
diff --git a/src/pepr/operator/controllers/keycloak/authservice/authorization-policy.ts b/src/pepr/operator/controllers/keycloak/authservice/authorization-policy.ts index 7d6e88f789..11b3426f53 100644 --- a/src/pepr/operator/controllers/keycloak/authservice/authorization-policy.ts +++ b/src/pepr/operator/controllers/keycloak/authservice/authorization-policy.ts @@ -47,6 +47,14 @@ function authserviceAuthorizationPolicy( notValues: ["*"], }, ], + to: [ + { + operation: { + notPorts: ["15020"], + notPaths: ["/stats/prometheus"], + }, + }, + ], }, ], selector: { @@ -81,6 +89,14 @@ function jwtAuthZAuthorizationPolicy( }, }, ], + to: [ + { + operation: { + notPorts: ["15020"], + notPaths: ["/stats/prometheus"], + }, + }, + ], }, ], }, diff --git a/src/pepr/operator/controllers/network/authorizationPolicies.spec.ts b/src/pepr/operator/controllers/network/authorizationPolicies.spec.ts index c22cc40fc6..775d15ae1d 100644 --- a/src/pepr/operator/controllers/network/authorizationPolicies.spec.ts +++ b/src/pepr/operator/controllers/network/authorizationPolicies.spec.ts @@ -5,6 +5,7 @@ import { Direction, Gateway, RemoteGenerated, UDSPackage } from "../../crd"; import { Action, AuthorizationPolicy } from "../../crd/generated/istio/authorizationpolicy-v1beta1"; +import { IstioState } from "../istio/namespace"; import { generateAuthorizationPolicies } from "./authorizationPolicies"; jest.mock("../../../logger", () => ({ @@ -58,7 +59,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "test-ns"); + const policies = await generateAuthorizationPolicies(pkg, "test-ns", IstioState.Ambient); expect(policies.length).toBe(1); const policy = policies[0]; expect(policy.metadata?.name).toBe( @@ -85,7 +86,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "test-ns"); + const policies = await generateAuthorizationPolicies(pkg, "test-ns", 
IstioState.Ambient); expect(policies.length).toBe(1); const policy = policies[0]; expect(policy.metadata?.name).toBe("protect-kubeapi-test-ingress-kubeapi-test-kubeapi"); @@ -110,7 +111,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "test-ns"); + const policies = await generateAuthorizationPolicies(pkg, "test-ns", IstioState.Ambient); expect(policies.length).toBe(1); const policy = policies[0]; expect(policy.metadata?.name).toBe("protect-kubenodes-test-ingress-kubenodes-test-kubenodes"); @@ -138,6 +139,7 @@ describe("authorization policy generation", () => { const policies: AuthorizationPolicy[] = await generateAuthorizationPolicies( pkg, "curl-ns-remote-cidr", + IstioState.Ambient, ); expect(policies).toHaveLength(1); const policy = policies[0]; @@ -201,7 +203,11 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "authservice-test-app"); + const policies = await generateAuthorizationPolicies( + pkg, + "authservice-test-app", + IstioState.Ambient, + ); // We expect exactly two policies: one for the expose rule and one for the allow rule. 
expect(policies.length).toBe(2); @@ -256,7 +262,11 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "test-tenant-app"); + const policies = await generateAuthorizationPolicies( + pkg, + "test-tenant-app", + IstioState.Ambient, + ); expect(policies.length).toBe(2); const names = policies.map(p => p.metadata?.name); expect(new Set(names).size).toBe(2); @@ -280,7 +290,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "loki"); + const policies = await generateAuthorizationPolicies(pkg, "loki", IstioState.Ambient); // With one allow rule (Ingress/IntraNamespace), expect one policy expect(policies.length).toBe(1); const policy = policies[0]; @@ -322,7 +332,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "neuvector"); + const policies = await generateAuthorizationPolicies(pkg, "neuvector", IstioState.Ambient); // With the current per-rule design we expect three policies expect(policies.length).toBe(3); @@ -394,7 +404,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "vector"); + const policies = await generateAuthorizationPolicies(pkg, "vector", IstioState.Ambient); expect(policies.length).toBe(1); const policy = policies[0]; expect(policy.metadata?.name).toBe("protect-vector-ingress-prometheus-metrics"); @@ -429,7 +439,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "velero"); + const policies = await generateAuthorizationPolicies(pkg, "velero", IstioState.Ambient); // Expect one policy expect(policies.length).toBe(1); const policy = policies[0]; @@ -447,7 +457,7 @@ describe("authorization policy generation", () => { ); }); - test("should generate correct AuthorizationPolicies for Authservice", async () => { + 
test("should generate correct AuthorizationPolicies for Ambient Authservice", async () => { const pkg: UDSPackage = { metadata: { name: "authservice", namespace: "authservice", generation: 1 }, spec: { @@ -467,7 +477,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "authservice"); + const policies = await generateAuthorizationPolicies(pkg, "authservice", IstioState.Ambient); // Expect two policies expect(policies.length).toBe(2); const nsPolicy = policies.find( @@ -493,6 +503,73 @@ describe("authorization policy generation", () => { ); }); + test("should generate correct AuthorizationPolicies for Sidecar Authservice", async () => { + const pkg: UDSPackage = { + metadata: { name: "authservice", namespace: "authservice", generation: 1 }, + spec: { + network: { + allow: [ + { direction: Direction.Ingress, remoteGenerated: RemoteGenerated.IntraNamespace }, + { direction: Direction.Egress, remoteGenerated: RemoteGenerated.IntraNamespace }, + { + direction: Direction.Ingress, + selector: { "app.kubernetes.io/name": "authservice" }, + remoteNamespace: "", + port: 10003, + description: "Protected Apps", + }, + ], + }, + }, + }; + + const policies = await generateAuthorizationPolicies(pkg, "authservice", IstioState.Sidecar); + // Expect three policies + expect(policies.length).toBe(3); + const nsPolicy = policies.find( + p => p.metadata?.name === "protect-authservice-ingress-all-pods-intranamespace", + ); + expect(nsPolicy).toBeDefined(); + expect(nsPolicy!.metadata?.namespace).toBe("authservice"); + expect(nsPolicy!.spec?.action).toBe(Action.Allow); + expect(nsPolicy!.spec?.rules).toEqual( + expect.arrayContaining([{ from: [{ source: { namespaces: ["authservice"] } }] }]), + ); + + const workloadPolicy = policies.find( + p => p.metadata?.name === "protect-authservice-ingress-protected-apps", + ); + expect(workloadPolicy).toBeDefined(); + expect(workloadPolicy!.spec?.selector?.matchLabels).toEqual({ + 
"app.kubernetes.io/name": "authservice", + }); + expect(workloadPolicy!.spec?.action).toBe(Action.Allow); + expect(workloadPolicy!.spec?.rules).toEqual( + expect.arrayContaining([{ to: [{ operation: { ports: ["10003"] } }] }]), + ); + + const metricScrapingPolicy = policies.find( + p => p.metadata?.name === "protect-authservice-ingress-15020-sidecar-metric-scraping", + ); + expect(metricScrapingPolicy).toBeDefined(); + expect(metricScrapingPolicy!.spec?.selector?.matchLabels).toEqual({}); + expect(metricScrapingPolicy!.spec?.action).toBe(Action.Allow); + expect(metricScrapingPolicy!.spec?.rules).toEqual( + expect.arrayContaining([ + { + from: [ + { + source: { + principals: ["cluster.local/ns/monitoring/sa/kube-prometheus-stack-prometheus"], + }, + }, + ], + to: [{ operation: { ports: ["15020"] } }], + }, + ]), + ); + }); + test("should generate correct AuthorizationPolicies for Prometheus-Stack", async () => { const pkg: UDSPackage = { metadata: { name: "prometheus-stack", namespace: "monitoring", generation: 1 }, @@ -519,7 +596,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "monitoring"); + const policies = await generateAuthorizationPolicies(pkg, "monitoring", IstioState.Ambient); // Expect three policies expect(policies.length).toBe(3); const nsPolicy = policies.find( @@ -597,7 +674,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "grafana"); + const policies = await generateAuthorizationPolicies(pkg, "grafana", IstioState.Ambient); // Expect three policies: one from expose, one from allow, and one monitor policy expect(policies.length).toBe(3); const exposePolicy = policies.find( @@ -763,7 +840,7 @@ describe("authorization policy generation", () => { }, }; - const policies = await generateAuthorizationPolicies(pkg, "keycloak"); + const policies = await generateAuthorizationPolicies(pkg, "keycloak", 
IstioState.Ambient);
 
     // We expect 6 policies
     expect(policies.length).toBe(6);
diff --git a/src/pepr/operator/controllers/network/authorizationPolicies.ts b/src/pepr/operator/controllers/network/authorizationPolicies.ts
index 3b9e3f5bed..9385f6881f 100644
--- a/src/pepr/operator/controllers/network/authorizationPolicies.ts
+++ b/src/pepr/operator/controllers/network/authorizationPolicies.ts
@@ -12,6 +12,7 @@ import {
   Rule,
   Source,
 } from "../../crd/generated/istio/authorizationpolicy-v1beta1";
+import { IstioState } from "../istio/namespace";
 import { getOwnerRef, purgeOrphans, sanitizeResourceName } from "../utils";
 import { META_IP } from "./generators/cloudMetadata";
 import { kubeAPI } from "./generators/kubeAPI";
@@ -206,6 +207,7 @@ function buildAuthPolicy(
 export async function generateAuthorizationPolicies(
   pkg: UDSPackage,
   pkgNamespace: string,
+  istioMode: string,
 ): Promise<AuthorizationPolicy[]> {
   const pkgName = pkg.metadata?.name ?? "unknown";
   const generation = pkg.metadata?.generation?.toString() ?? "0";
@@ -261,6 +263,26 @@ export async function generateAuthorizationPolicies(
     }
   }
 
+  // With Prometheus in Ambient mode, all traffic is sent over mTLS and the
+  // destination sidecar requires an ALLOW policy to expose sidecar metrics.
+  // Add an AuthorizationPolicy allowing Prometheus to reach port 15020 in the package's namespace.
+  if (istioMode === IstioState.Sidecar) {
+    const extraPolicyName = sanitizeResourceName(
+      `protect-${pkgName}-ingress-15020-sidecar-metric-scraping`,
+    );
+    const extraPolicy = buildAuthPolicy(
+      extraPolicyName,
+      pkg,
+      {}, // empty selector to apply to all workloads in the namespace
+      { principals: [PROMETHEUS_PRINCIPAL] },
+      ["15020"],
+    );
+    policies.push(extraPolicy);
+    log.trace(
+      `Generated sidecar metrics allow policy for port 15020: ${extraPolicy.metadata?.name}`,
+    );
+  }
+
   // Apply policies concurrently.
for (const policy of policies) { try { diff --git a/src/pepr/operator/reconcilers/package-reconciler.ts b/src/pepr/operator/reconcilers/package-reconciler.ts index 157f0b57ee..1ff3d008bb 100644 --- a/src/pepr/operator/reconcilers/package-reconciler.ts +++ b/src/pepr/operator/reconcilers/package-reconciler.ts @@ -77,7 +77,7 @@ export async function packageReconciler(pkg: UDSPackage) { // Pass the effective Istio mode to the networkPolicies function const netPol = await networkPolicies(pkg, namespace!, istioMode); - const authPol = await generateAuthorizationPolicies(pkg, namespace!); + const authPol = await generateAuthorizationPolicies(pkg, namespace!, istioMode); let endpoints: string[] = []; // Update the namespace to enable the expected Istio mode (sidecar or ambient) diff --git a/src/pepr/prometheus/index.ts b/src/pepr/prometheus/index.ts index fe31c78b31..1d607ca7e6 100644 --- a/src/pepr/prometheus/index.ts +++ b/src/pepr/prometheus/index.ts @@ -14,7 +14,7 @@ import { ServiceMonitorScheme, } from "../operator/crd"; import { FallbackScrapeProtocol } from "../operator/crd/generated/prometheus/servicemonitor-v1"; -// configure subproject logger + const log = setupLogger(Component.PROMETHEUS); export const prometheus = new Capability({ @@ -25,114 +25,118 @@ export const prometheus = new Capability({ const { When } = prometheus; /** - * Mutate a service monitor to exclude it from mTLS metrics with `exempt` scrapeClass + * Returns true if any namespace selected has "istio-injection" enabled. 
+ */
+async function isIstioInjected(
+  monitor: PrometheusServiceMonitor | PrometheusPodMonitor,
+): Promise<boolean> {
+  if (monitor.Raw.spec?.namespaceSelector?.any) return true;
+  const namespaces = monitor.Raw.spec?.namespaceSelector?.matchNames || [
+    monitor.Raw.metadata?.namespace ?? "default",
+  ];
+  for (const ns of namespaces) {
+    const namespace = await K8s(kind.Namespace).Get(ns);
+    if (namespace.metadata?.labels?.["istio-injection"] === "enabled") {
+      return true;
+    }
+  }
+  return false;
+}
+
+/**
+ * ServiceMonitor mutation logic:
+ * - If a custom scrapeClass is set (neither "istio-certs" nor "exempt"), update fallback only.
+ * - Else if skip conditions apply (skip annotations, not istio-injected, or scrapeClass is "exempt"),
+ *   simply remove scrapeClass.
+ * - Otherwise (assumed "istio-certs"), remove scrapeClass, delete any TLS config, and set endpoints to HTTP.
  */
 When(PrometheusServiceMonitor)
   .IsCreatedOrUpdated()
   .Mutate(async sm => {
-    if (sm.Raw.spec === undefined || sm.Raw.spec.scrapeClass !== undefined) {
-      // Support the legacy (Prometheus 2.x fallback) until upstream applications properly handle protocol
-      if (sm.Raw.spec && !sm.Raw.spec.fallbackScrapeProtocol) {
-        sm.Raw.spec.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004;
-      }
+    // Always set fallbackScrapeProtocol if missing.
+    if (!sm.Raw.spec!.fallbackScrapeProtocol) {
+      sm.Raw.spec!.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004;
+      log.info(`Set fallbackScrapeProtocol for ServiceMonitor ${sm.Raw.metadata?.name}`);
+    }
+
+    const sc = sm.Raw.spec!.scrapeClass;
+    if (sc !== undefined && sc !== "istio-certs" && sc !== "exempt") {
+      // Custom scrapeClass; do nothing else.
+ log.info( + `ServiceMonitor ${sm.Raw.metadata?.name} uses custom scrapeClass (${sc}); skipping endpoint mutation.`, + ); return; } - // Add an exempt scrape class if explicitly opted out via annotation OR targeting a non-istio-injected namespace + // Skip conditions: skip annotations, not istio-injected, or already "exempt". if ( sm.Raw.metadata?.annotations?.["uds/skip-mutate"] || sm.Raw.metadata?.annotations?.["uds/skip-sm-mutate"] || - !(await isIstioInjected(sm)) + !(await isIstioInjected(sm)) || + sc === "exempt" ) { log.info( - `Mutating scrapeClass to exempt ServiceMonitor ${sm.Raw.metadata?.name} from default scrapeClass mTLS config`, + `ServiceMonitor ${sm.Raw.metadata?.name} meets skip conditions; clearing scrapeClass.`, ); - sm.Raw.spec.scrapeClass = "exempt"; - // Support the legacy (Prometheus 2.x fallback) until upstream applications properly handle protocol - if (!sm.Raw.spec.fallbackScrapeProtocol) { - sm.Raw.spec.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004; - } - + delete sm.Raw.spec!.scrapeClass; return; - } else { - log.info(`Patching service monitor ${sm.Raw.metadata?.name} for mTLS metrics`); - // Note: this tlsConfig patch is deprecated in favor of a default scrape class for both service and pod monitors - const tlsConfig = { - caFile: "/etc/prom-certs/root-cert.pem", - certFile: "/etc/prom-certs/cert-chain.pem", - keyFile: "/etc/prom-certs/key.pem", - insecureSkipVerify: true, - }; - const endpoints: ServiceMonitorEndpoint[] = sm.Raw.spec.endpoints || []; - endpoints.forEach(endpoint => { - endpoint.scheme = ServiceMonitorScheme.HTTPS; - endpoint.tlsConfig = tlsConfig; + } + + // Default case: presumed "istio-certs" + log.info( + `Patching ServiceMonitor ${sm.Raw.metadata?.name}: clearing scrapeClass, setting endpoints to HTTP, and removing TLS config.`, + ); + delete sm.Raw.spec!.scrapeClass; + if (sm.Raw.spec?.endpoints && Array.isArray(sm.Raw.spec.endpoints)) { + sm.Raw.spec.endpoints.forEach((endpoint: 
ServiceMonitorEndpoint) => { + endpoint.scheme = ServiceMonitorScheme.HTTP; + if (endpoint.tlsConfig) { + delete endpoint.tlsConfig; + } }); - sm.Raw.spec.endpoints = endpoints; - // Support the legacy (Prometheus 2.x fallback) until upstream applications properly handle protocol - if (!sm.Raw.spec.fallbackScrapeProtocol) { - sm.Raw.spec.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004; - } } }); /** - * Mutate a pod monitor to exclude it from mTLS metrics with `exempt` scrapeClass + * PodMonitor mutation logic: + * - If a custom scrapeClass is set (not "istio-certs" or "exempt"), update fallback only. + * - Else if skip conditions apply (skip annotations, not istio-injected, or scrapeClass is "exempt"), + * remove scrapeClass. + * - Otherwise, remove scrapeClass, delete TLS config from podMetricsEndpoints, and set endpoints to HTTP. */ When(PrometheusPodMonitor) .IsCreatedOrUpdated() .Mutate(async pm => { - if (pm.Raw.spec === undefined || pm.Raw.spec.scrapeClass !== undefined) { - // Support the legacy (Prometheus 2.x fallback) until upstream applications properly handle protocol - if (pm.Raw.spec && !pm.Raw.spec.fallbackScrapeProtocol) { - pm.Raw.spec.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004; - } - return; + if (!pm.Raw.spec!.fallbackScrapeProtocol) { + pm.Raw.spec!.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004; + log.info(`Set fallbackScrapeProtocol for PodMonitor ${pm.Raw.metadata?.name}`); } - // Add an exempt scrape class if explicitly opted out via annotation OR targeting a non-istio-injected namespace - if (pm.Raw.metadata?.annotations?.["uds/skip-mutate"] || !(await isIstioInjected(pm))) { + const sc = pm.Raw.spec!.scrapeClass; + if (sc !== undefined && sc !== "istio-certs" && sc !== "exempt") { log.info( - `Mutating scrapeClass to exempt PodMonitor ${pm.Raw.metadata?.name} from default scrapeClass mTLS config`, + `PodMonitor ${pm.Raw.metadata?.name} uses custom scrapeClass (${sc}); skipping 
mutation.`, ); - pm.Raw.spec.scrapeClass = "exempt"; - // Support the legacy (Prometheus 2.x fallback) until upstream applications properly handle protocol - if (!pm.Raw.spec.fallbackScrapeProtocol) { - pm.Raw.spec.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004; - } - return; - } else { - log.info(`Patching pod monitor ${pm.Raw.metadata?.name} for mTLS metrics`); - const endpoints: PodMonitorEndpoint[] = pm.Raw.spec.podMetricsEndpoints || []; - endpoints.forEach(endpoint => { - endpoint.scheme = PodMonitorScheme.HTTPS; - }); - pm.Raw.spec.podMetricsEndpoints = endpoints; - // Support the legacy (Prometheus 2.x fallback) until upstream applications properly handle protocol - if (!pm.Raw.spec.fallbackScrapeProtocol) { - pm.Raw.spec.fallbackScrapeProtocol = FallbackScrapeProtocol.PrometheusText004; - } } - }); -// This assumes istio-injection === strict mTLS due to complexity around mTLS lookup -async function isIstioInjected(monitor: PrometheusServiceMonitor | PrometheusPodMonitor) { - // If monitor allows any namespace assume istio injection - if (monitor.Raw.spec?.namespaceSelector?.any) { - return true; - } - - const namespaces = monitor.Raw.spec?.namespaceSelector?.matchNames || [ - monitor.Raw.metadata?.namespace, - ] || ["default"]; - - for (const ns of namespaces) { - const namespace = await K8s(kind.Namespace).Get(ns); - if (namespace.metadata?.labels && namespace.metadata.labels["istio-injection"] === "enabled") { - return true; + if ( + pm.Raw.metadata?.annotations?.["uds/skip-mutate"] || + !(await isIstioInjected(pm)) || + sc === "exempt" + ) { + log.info(`PodMonitor ${pm.Raw.metadata?.name} meets skip conditions; clearing scrapeClass.`); + delete pm.Raw.spec!.scrapeClass; + return; } - } - return false; -} + log.info( + `Patching PodMonitor ${pm.Raw.metadata?.name}: clearing scrapeClass, setting endpoints to HTTP, and removing TLS config.`, + ); + delete pm.Raw.spec!.scrapeClass; + if (pm.Raw.spec?.podMetricsEndpoints && 
Array.isArray(pm.Raw.spec.podMetricsEndpoints)) { + pm.Raw.spec.podMetricsEndpoints.forEach((endpoint: PodMonitorEndpoint) => { + endpoint.scheme = PodMonitorScheme.HTTP; + }); + } + }); diff --git a/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml b/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml deleted file mode 100644 index e9ea8bb100..0000000000 --- a/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml +++ /dev/null @@ -1,26 +0,0 @@ -# Copyright 2024 Defense Unicorns -# SPDX-License-Identifier: AGPL-3.0-or-later OR LicenseRef-Defense-Unicorns-Commercial - -# This pod monitor is used instead of a service monitor to handle mTLS with self-monitoring -apiVersion: monitoring.coreos.com/v1 -kind: PodMonitor -metadata: - name: prometheus-pod-monitor - namespace: monitoring - annotations: - uds/skip-mutate: "true" -spec: - selector: - matchLabels: - app: prometheus - podMetricsEndpoints: - - port: http-web - - port: reloader-web - # Ensure we filter out the init containers - relabelings: - - sourceLabels: [__meta_kubernetes_pod_container_init] - regex: "true" - action: drop - namespaceSelector: - matchNames: - - monitoring diff --git a/src/prometheus-stack/chart/templates/uds-package.yaml b/src/prometheus-stack/chart/templates/uds-package.yaml index 6a3f086b59..e732e97316 100644 --- a/src/prometheus-stack/chart/templates/uds-package.yaml +++ b/src/prometheus-stack/chart/templates/uds-package.yaml @@ -8,6 +8,8 @@ metadata: namespace: {{ .Release.Namespace }} spec: network: + serviceMesh: + mode: ambient allow: # Permit intra-namespace communication - direction: Ingress diff --git a/src/prometheus-stack/common/zarf.yaml b/src/prometheus-stack/common/zarf.yaml index a91a1637d9..06185306a3 100644 --- a/src/prometheus-stack/common/zarf.yaml +++ b/src/prometheus-stack/common/zarf.yaml @@ -24,8 +24,8 @@ components: actions: onDeploy: after: - - description: Annotate all service and pod monitors to ensure they are mutated with the 
3.x fallbackScrapeProtocol + - description: Annotate all service and pod monitors to ensure they are mutated for ambient http metrics cmd: | - # This ensures that all monitors go through the latest Pepr mutation code to have fallbackScrapeProtocol added - ./zarf tools kubectl annotate servicemonitors -A --all uds.dev/prometheus-fallback=true - ./zarf tools kubectl annotate podmonitors -A --all uds.dev/prometheus-fallback=true + # This ensures that all monitors go through the latest Pepr mutation code to use the correct http metrics + ./zarf tools kubectl annotate servicemonitors -A --all uds.dev/prometheus-ambient=true + ./zarf tools kubectl annotate podmonitors -A --all uds.dev/prometheus-ambient=true diff --git a/src/prometheus-stack/values/values.yaml b/src/prometheus-stack/values/values.yaml index 216b6c78ce..9e4bfac989 100644 --- a/src/prometheus-stack/values/values.yaml +++ b/src/prometheus-stack/values/values.yaml @@ -23,27 +23,9 @@ nodeExporter: interval: "" prometheus: serviceMonitor: - selfMonitor: false + selfMonitor: true prometheusSpec: - enableFeatures: - - remote-write-receiver - additionalConfig: - scrapeClasses: - - name: istio-certs - default: true - tlsConfig: - caFile: /etc/prom-certs/root-cert.pem - certFile: /etc/prom-certs/cert-chain.pem - keyFile: /etc/prom-certs/key.pem - insecureSkipVerify: true - - name: exempt podMetadata: - annotations: - proxy.istio.io/config: | - proxyMetadata: - OUTPUT_CERTS: /etc/istio-output-certs - sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs", "mountPath": "/etc/istio-output-certs"}]' - traffic.sidecar.istio.io/includeOutboundIPRanges: "" labels: app: prometheus podMonitorSelectorNilUsesHelmValues: false @@ -66,13 +48,6 @@ prometheus: requests: storage: 50Gi storageClassName: null - volumeMounts: - - mountPath: /etc/prom-certs/ - name: istio-certs - volumes: - - emptyDir: - medium: Memory - name: istio-certs prometheus-node-exporter: containerSecurityContext: readOnlyRootFilesystem: true @@ -107,12 
+82,6 @@ prometheusOperator: requests: cpu: 100m memory: 512Mi - alertmanager: alertmanagerSpec: - scheme: "https" - tlsConfig: - caFile: /etc/prom-certs/root-cert.pem - certFile: /etc/prom-certs/cert-chain.pem - insecureSkipVerify: true - keyFile: /etc/prom-certs/key.pem + scheme: "http"
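To illustrate the extra AuthorizationPolicy this diff generates for packages still running in sidecar mode (ambient-mode Prometheus must be allowed through to the sidecar's merged-metrics port 15020), here is a rough sketch of the resulting resource. The object shape mirrors the Istio AuthorizationPolicy spec, but `sidecarMetricsPolicy` itself is an illustrative helper, not the real `buildAuthPolicy` implementation:

```typescript
// Prometheus scrapes via its ambient identity; sidecar-mode workloads need an
// explicit ALLOW on the sidecar merged-metrics port.
const PROMETHEUS_PRINCIPAL =
  "cluster.local/ns/monitoring/sa/kube-prometheus-stack-prometheus";

function sidecarMetricsPolicy(pkgName: string, namespace: string) {
  return {
    metadata: {
      name: `protect-${pkgName}-ingress-15020-sidecar-metric-scraping`,
      namespace,
    },
    spec: {
      action: "ALLOW",
      selector: { matchLabels: {} }, // empty selector: all workloads in the namespace
      rules: [
        {
          from: [{ source: { principals: [PROMETHEUS_PRINCIPAL] } }],
          to: [{ operation: { ports: ["15020"] } }], // sidecar merged-metrics port
        },
      ],
    },
  };
}

const policy = sidecarMetricsPolicy("authservice", "authservice");
console.log(policy.metadata.name); // protect-authservice-ingress-15020-sidecar-metric-scraping
```

This matches the expectations exercised by the new "Sidecar Authservice" test in `authorizationPolicies.spec.ts` above.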