-
Notifications
You must be signed in to change notification settings - Fork 729
gep-4768: add TelemetryPolicy API proposal #4872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -61,9 +61,392 @@ To mitigate the challenge of complex merging semantics, this GEP restricts confi | |||||||||
| - **Access Logs**: Filtering for smart logging (e.g., only log 5xx errors or high latency), multi-protocol support, and log format customization (including field selection). | ||||||||||
| - **Export Configuration**: Supporting TLS connections to telemetry collectors and the ability to inject custom headers (e.g., `Authorization`) into telemetry requests. | ||||||||||
|
|
||||||||||
| ## Request Flow | ||||||||||
| ### Request Flow | ||||||||||
|
|
||||||||||
| * A platform operator creates a `TelemetryPolicy` resource targeting a `Gateway`. | ||||||||||
| * The Gateway API implementation reconciles this resource and configures the underlying data plane. | ||||||||||
| * The data plane extracts the specified signals and exports them to the telemetry infrastructure. | ||||||||||
|
|
||||||||||
| ### The `TelemetryPolicy` Specification | ||||||||||
|
|
||||||||||
| We propose the `TelemetryPolicy` as a direct policy attachment in the `gateway.networking.k8s.io` API group. See [GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/#classes-of-policies) for more information on direct attachment. | ||||||||||
|
|
||||||||||
| The following is an example that demonstrates the structure of the `TelemetryPolicy`. | ||||||||||
|
|
||||||||||
| ```yaml | ||||||||||
| apiVersion: agentic.networking.x-k8s.io/v1alpha1 | ||||||||||
| kind: TelemetryPolicy | ||||||||||
| metadata: | ||||||||||
| name: standard-telemetry | ||||||||||
| namespace: prod-ns | ||||||||||
| spec: | ||||||||||
| # GEP-713 Attachment | ||||||||||
| targetRefs: | ||||||||||
| - group: gateway.networking.k8s.io | ||||||||||
| kind: Gateway | ||||||||||
| name: my-gateway | ||||||||||
|
|
||||||||||
| # 1. Tracing Configuration | ||||||||||
| tracing: | ||||||||||
| mode: "On" | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the mode here is kind of a weird value. Usually we avoid to have bools or "wana be bools" (On/Off, True/False) and instead make them more explicit on Behavior (Disabled, PartiallyEnabled, FullyEnabled) If we can only have On/Off, wouldn't it make more sense to say "if tracing is null, tracing should be disabled" instead?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (btw this applies to every "Mode" defined here) |
||||||||||
| provider: | ||||||||||
| endpoint: "otel-collector.monitoring.svc:4317" | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should probably use backendRef style API to align with other APIs
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 otherwise we have to discuss again how to specify TLS, ... (as also mentioned in https://github.com/kubernetes-sigs/gateway-api/pull/4872/changes/BASE..084523a5d6aa69293ca1b9577fabdab8487ba0da#r3260666205) |
||||||||||
| samplingRate: | ||||||||||
| numerator: 5 # Represents 5/100 (5%) because denominator defaults to 100 | ||||||||||
| parentBasedSampling: | ||||||||||
| mode: "On" | ||||||||||
| samplingRate: | ||||||||||
| numerator: 50 # Represents 50/100 (50%) | ||||||||||
| customAttributes: | ||||||||||
| - name: "env" | ||||||||||
| type: Literal | ||||||||||
| literalValue: "production" | ||||||||||
| - name: "mcp_task_name" | ||||||||||
| type: Metadata | ||||||||||
| metadataKey: "my.custom.filter.mcp_task_name" | ||||||||||
|
|
||||||||||
| # 2. Metrics Configuration | ||||||||||
| metrics: | ||||||||||
| mode: "On" | ||||||||||
| overrides: | ||||||||||
| - name: "example.com/http/request_count" | ||||||||||
| type: Counter | ||||||||||
| attributes: # Inject custom attributes/labels | ||||||||||
| - name: "x-model-id" | ||||||||||
| type: Header | ||||||||||
| headerName: "X-Model-Id" | ||||||||||
| - name: "mcp_task_name" | ||||||||||
| type: Metadata | ||||||||||
| metadataKey: "my.custom.filter.mcp_task_name" | ||||||||||
| - name: "environment" | ||||||||||
| type: Literal | ||||||||||
| literalValue: "production" | ||||||||||
|
|
||||||||||
| # 3. Access Logs Configuration | ||||||||||
| accessLogs: | ||||||||||
| mode: "Off" # Explicitly disabled while keeping the configuration intact | ||||||||||
| matches: "response.code >= 500" # Conditional logging, CEL filtering for errors | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. honest question here: is CEL matching for logs something usually supported by OTEL exporters, or is this something specific from Istio that we are carrying? |
||||||||||
| fields: # Configure specific fields to include, indicating their source | ||||||||||
| - name: "start_time" | ||||||||||
| type: Standard | ||||||||||
| standardValue: "RequestStartTime" | ||||||||||
| - name: "response_code" | ||||||||||
| type: Standard | ||||||||||
| standardValue: "ResponseCode" | ||||||||||
| - name: "x-token-usage" | ||||||||||
| type: Header | ||||||||||
| headerName: "X-Token-Usage" | ||||||||||
| - name: "mcp_task_name" | ||||||||||
| type: Metadata | ||||||||||
| metadataKey: "my.custom.filter.mcp_task_name" | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| #### Detailed Resource Description | ||||||||||
|
|
||||||||||
| The following are the Go structs modeling the proposed specification. | ||||||||||
|
|
||||||||||
| ```Go | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
the case here makes difference for markdown rendering :) |
||||||||||
| // TelemetryPolicy defines a direct policy attachment to configure | ||||||||||
| // observability signals for Gateways. | ||||||||||
| type TelemetryPolicy struct { | ||||||||||
| metav1.TypeMeta `json:",inline"` | ||||||||||
| metav1.ObjectMeta `json:"metadata,omitempty"` | ||||||||||
|
|
||||||||||
| Spec TelemetryPolicySpec `json:"spec"` | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. to avoid previous mistakes, can we make it explicit this is required?
Suggested change
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: you mean "the desired state of TelemetryPolicy" probably |
||||||||||
|
|
||||||||||
| // status defines the observed state of TelemetryPolicy. | ||||||||||
| // +optional | ||||||||||
| Status TelemetryPolicyStatus `json:"status,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| type TelemetryPolicySpec struct { | ||||||||||
| // Identifies the target gateways to which this policy attaches (GEP-713). | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we must be specific here that at least one target is required |
||||||||||
| TargetRefs []NamespacedPolicyTargetReference `json:"targetRefs"` | ||||||||||
|
|
||||||||||
| // Configuration for distributed tracing options. | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment must be more complete. Think on users using this API. We must write here what is tracing, what Gateway users will get from it. This applies for every comment on every struct, I would recommend taking a look into https://gateway-api.sigs.k8s.io/reference/api-spec/main/spec/#backendtlspolicy Each API field must contain:
Also, given this is a field documentation for implementations, you must consider:
Think on https://gateway-api.sigs.k8s.io/guides/api-design/. The documentation must be separated between user facing and implementation facing (when writing implementation/developers facing, use the tag Also, when defining fields you must define what kind of support you expect for it. Eg.:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 defining support levels at this stage is very important. Also worth keeping in mind, everything in core and extended should be covered by conformance tests, otherwise it risks becoming implementation-specific anyway. |
||||||||||
| Tracing *TracingConfig `json:"tracing,omitempty"` | ||||||||||
|
|
||||||||||
| // Configuration for metric generation and exports. | ||||||||||
| Metrics *MetricsConfig `json:"metrics,omitempty"` | ||||||||||
|
|
||||||||||
| // Configuration for access log generation. | ||||||||||
| AccessLogs *AccessLogsConfig `json:"accessLogs,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // TelemetryMode defines the enablement state of a telemetry signal. | ||||||||||
| type TelemetryMode string | ||||||||||
|
|
||||||||||
| const ( | ||||||||||
| // TelemetryModeOn explicitly enables the telemetry signal. | ||||||||||
| TelemetryModeOn TelemetryMode = "On" | ||||||||||
| // TelemetryModeOff explicitly disables the telemetry signal. | ||||||||||
| TelemetryModeOff TelemetryMode = "Off" | ||||||||||
| ) | ||||||||||
|
|
||||||||||
| // --- Tracing Types --- | ||||||||||
|
|
||||||||||
| type TracingConfig struct { | ||||||||||
| // Mode explicitly controls if tracing is enabled. Valid values are "On" or "Off". | ||||||||||
| // +kubebuilder:validation:Enum=On;Off | ||||||||||
| // +kubebuilder:default=On | ||||||||||
| Mode TelemetryMode `json:"mode,omitempty"` | ||||||||||
|
|
||||||||||
| // Specifies the tracing backend. Includes type (e.g., "OTLP") and endpoint. | ||||||||||
| Provider *TracingProvider `json:"provider,omitempty"` | ||||||||||
|
|
||||||||||
| // The base sampling probability for traces. | ||||||||||
| SamplingRate *Fraction `json:"samplingRate,omitempty"` | ||||||||||
|
|
||||||||||
| // Configures whether to respect the sampling decision of the parent span. | ||||||||||
| ParentBasedSampling *ParentBasedSampling `json:"parentBasedSampling,omitempty"` | ||||||||||
|
|
||||||||||
| // Allows appending custom tags/attributes to spans. | ||||||||||
| CustomAttributes []CustomAttribute `json:"customAttributes,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| type TracingProvider struct { | ||||||||||
| Endpoint string `json:"endpoint,omitempty"` | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about TLS options and custom headers, per #4775 (comment) |
||||||||||
| } | ||||||||||
|
|
||||||||||
| type Fraction struct { | ||||||||||
| Numerator int32 `json:"numerator"` | ||||||||||
|
|
||||||||||
| // +kubebuilder:default=100 | ||||||||||
| // +kubebuilder:validation:Minimum=1 | ||||||||||
| Denominator int32 `json:"denominator,omitempty"` // Allows e.g., 1 / 10000 for 0.01% | ||||||||||
| } | ||||||||||
|
|
||||||||||
| type ParentBasedSampling struct { | ||||||||||
| // Mode explicitly controls if parent-based sampling is enabled. Valid values are "On" or "Off". | ||||||||||
| // +kubebuilder:validation:Enum=On;Off | ||||||||||
| // +kubebuilder:default=On | ||||||||||
| Mode TelemetryMode `json:"mode,omitempty"` | ||||||||||
|
|
||||||||||
| // The sampling rate to apply when the parent span decision is used. | ||||||||||
| SamplingRate *Fraction `json:"samplingRate,omitempty"` | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this sampling rate needed? If we're already inheriting from the parent, isn't that making the decision on whether or not to sample?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the |
||||||||||
| } | ||||||||||
|
|
||||||||||
| // CustomAttributeType defines the source of a trace attribute's value. | ||||||||||
| type CustomAttributeType string | ||||||||||
|
|
||||||||||
| const ( | ||||||||||
| // CustomAttributeTypeHeader extracts the value from an HTTP header. | ||||||||||
| CustomAttributeTypeHeader CustomAttributeType = "Header" | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we be explicit of what can be added or not? I would expect that we are careful with things like "Authorization" headers and other sensitive headers. |
||||||||||
| // CustomAttributeTypeMetadata extracts the value from proxy metadata or context. | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so, Metadata is a bit of a confusing concept in this context. I am worried we are making it very "Envoy" centric (unless @sjberman tells me he gets the concept of "metadata" here out of Nginx) As we are leaning towards OTEL, I think it would be good to be more explicit on what each attribute means (eg.: https://opentelemetry.io/docs/specs/semconv/registry/attributes/http/ ) and use some OTEL references here
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No concept of metadata in nginx. |
||||||||||
| CustomAttributeTypeMetadata CustomAttributeType = "Metadata" | ||||||||||
| // CustomAttributeTypeLiteral provides a static, user-defined string value. | ||||||||||
| CustomAttributeTypeLiteral CustomAttributeType = "Literal" | ||||||||||
| ) | ||||||||||
|
|
||||||||||
| type CustomAttribute struct { | ||||||||||
| // Name is the key of the attribute as it will appear in the trace span. | ||||||||||
| Name string `json:"name"` | ||||||||||
|
|
||||||||||
| // Type specifies where the attribute value comes from. | ||||||||||
| // Valid values are "Header", "Metadata", or "Literal". | ||||||||||
| // +kubebuilder:validation:Enum=Header;Metadata;Literal | ||||||||||
| Type CustomAttributeType `json:"type"` | ||||||||||
|
|
||||||||||
| // HeaderName specifies the HTTP header to extract the value from. | ||||||||||
| // This is required if Type is "Header". | ||||||||||
| HeaderName *string `json:"headerName,omitempty"` | ||||||||||
|
|
||||||||||
| // MetadataKey specifies the proxy/context metadata key to extract the value from. | ||||||||||
| // This is required if Type is "Metadata". | ||||||||||
| MetadataKey *string `json:"metadataKey,omitempty"` | ||||||||||
|
|
||||||||||
| // LiteralValue specifies a static string value to attach. | ||||||||||
| // This is required if Type is "Literal". | ||||||||||
| LiteralValue *string `json:"literalValue,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // --- Metrics Types --- | ||||||||||
|
|
||||||||||
| type MetricsConfig struct { | ||||||||||
| // Mode explicitly controls if metric generation is enabled. Valid values are "On" or "Off". | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does it mean to turn off metrics generation? It does not seem reasonable to entirely disable all metrics for an application |
||||||||||
| // +kubebuilder:validation:Enum=On;Off | ||||||||||
| // +kubebuilder:default=On | ||||||||||
| Mode TelemetryMode `json:"mode,omitempty"` | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For all the modes, what happens if I do not set the field? is it explicitly OFF or can it be "default for the proxy"? What if I do not attach a TelemetryPolicy at all? |
||||||||||
|
|
||||||||||
| // List of configurations to customize specific metric families. | ||||||||||
| Overrides []MetricOverride `json:"overrides,omitempty"` | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does this actually do? Is it customizing an existing metric? adding a new one? |
||||||||||
| } | ||||||||||
|
|
||||||||||
| type MetricOverride struct { | ||||||||||
| // The metric name to override (e.g., "http_requests_total" or "gateway.networking.k8s.io/http/request_count"). | ||||||||||
| Name string `json:"name"` | ||||||||||
|
|
||||||||||
| // Type of the metric (e.g., "Counter", "Histogram"). | ||||||||||
| Type string `json:"type,omitempty"` | ||||||||||
|
|
||||||||||
| // Defines custom attributes to attach to the metric. | ||||||||||
| // These are appended to the standard labels emitted by the proxy. | ||||||||||
| Attributes []MetricAttribute `json:"attributes,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // MetricAttributeType defines the source of a metric attribute's value. | ||||||||||
| type MetricAttributeType string | ||||||||||
|
|
||||||||||
| const ( | ||||||||||
| // MetricAttributeTypeHeader extracts the value from an HTTP header. | ||||||||||
| MetricAttributeTypeHeader MetricAttributeType = "Header" | ||||||||||
| // MetricAttributeTypeMetadata extracts the value from proxy metadata or context. | ||||||||||
| MetricAttributeTypeMetadata MetricAttributeType = "Metadata" | ||||||||||
| // MetricAttributeTypeLiteral provides a static, user-defined string value. | ||||||||||
| MetricAttributeTypeLiteral MetricAttributeType = "Literal" | ||||||||||
| ) | ||||||||||
|
|
||||||||||
| type MetricAttribute struct { | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So we can add fields only right, not remove them? |
||||||||||
| // Name is the key of the attribute as it will appear in the metric. | ||||||||||
| Name string `json:"name"` | ||||||||||
|
|
||||||||||
| // Type specifies where the attribute value comes from. | ||||||||||
| // Valid values are "Header", "Metadata", or "Literal". | ||||||||||
| // +kubebuilder:validation:Enum=Header;Metadata;Literal | ||||||||||
| Type MetricAttributeType `json:"type"` | ||||||||||
|
|
||||||||||
| // HeaderName specifies the HTTP header to extract the value from. | ||||||||||
| // This is required if Type is "Header". | ||||||||||
| HeaderName *string `json:"headerName,omitempty"` | ||||||||||
|
|
||||||||||
| // MetadataKey specifies the proxy/context metadata key to extract the value from. | ||||||||||
| // This is required if Type is "Metadata". | ||||||||||
| MetadataKey *string `json:"metadataKey,omitempty"` | ||||||||||
|
|
||||||||||
| // LiteralValue specifies a static string value to attach. | ||||||||||
| // This is required if Type is "Literal". | ||||||||||
| LiteralValue *string `json:"literalValue,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // --- Access Logs Types --- | ||||||||||
|
|
||||||||||
| type AccessLogsConfig struct { | ||||||||||
| // Mode explicitly controls if access logging is enabled. Valid values are "On" or "Off". | ||||||||||
| // +kubebuilder:validation:Enum=On;Off | ||||||||||
| // +kubebuilder:default=On | ||||||||||
| Mode TelemetryMode `json:"mode,omitempty"` | ||||||||||
|
|
||||||||||
| // CEL expression for advanced filtering (e.g., matching response codes, headers). | ||||||||||
| Matches string `json:"matches,omitempty"` | ||||||||||
|
|
||||||||||
| // A list of specific fields or headers to include in the logs. | ||||||||||
| Fields []string `json:"fields,omitempty"` | ||||||||||
|
|
||||||||||
| // A list of specific fields to include in the logs, specifying their source. | ||||||||||
| Fields []LogField `json:"fields,omitempty"` | ||||||||||
|
Comment on lines
+332
to
+336
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. dupe field? |
||||||||||
| } | ||||||||||
|
|
||||||||||
| // LogFieldType defines the source of a log field's value. | ||||||||||
| type LogFieldType string | ||||||||||
|
|
||||||||||
| const ( | ||||||||||
| // LogFieldTypeHeader extracts the value from an HTTP header. | ||||||||||
| LogFieldTypeHeader LogFieldType = "Header" | ||||||||||
| // LogFieldTypeMetadata extracts the value from proxy metadata or context. | ||||||||||
| LogFieldTypeMetadata LogFieldType = "Metadata" | ||||||||||
| // LogFieldTypeLiteral provides a static, user-defined string value. | ||||||||||
| LogFieldTypeLiteral LogFieldType = "Literal" | ||||||||||
| // LogFieldTypeStandard extracts a standard proxy log value (e.g., duration, start time). | ||||||||||
| LogFieldTypeStandard LogFieldType = "Standard" | ||||||||||
| ) | ||||||||||
|
|
||||||||||
| type LogField struct { | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not clear to me what the fields LogField in the log represents, is it a JSON field for specifying additional key-value pairs? How do you plan to support customization of nested JSON structures? These are commonly used in access logs, e.g., in the Airlock Microgateway we use the Elastic Common Schema (ECS). Since access logs tend to be highly implementation-specific, IMO a format string or an extension point is probably the best approach here. |
||||||||||
| // Name is the key/name of the field as it will appear in the access log output. | ||||||||||
| Name string `json:"name"` | ||||||||||
|
|
||||||||||
| // Type specifies where the field value comes from. | ||||||||||
| // Valid values are "Header", "Metadata", "Literal", or "Standard". | ||||||||||
| // +kubebuilder:validation:Enum=Header;Metadata;Literal;Standard | ||||||||||
| Type LogFieldType `json:"type"` | ||||||||||
|
|
||||||||||
| // HeaderName specifies the HTTP header to extract the value from. | ||||||||||
| // This is required if Type is "Header". | ||||||||||
| HeaderName *string `json:"headerName,omitempty"` | ||||||||||
|
|
||||||||||
| // MetadataKey specifies the proxy/context metadata key to extract the value from. | ||||||||||
| // This is required if Type is "Metadata". | ||||||||||
| MetadataKey *string `json:"metadataKey,omitempty"` | ||||||||||
|
|
||||||||||
| // LiteralValue specifies a static string value to attach. | ||||||||||
| // This is required if Type is "Literal". | ||||||||||
| LiteralValue *string `json:"literalValue,omitempty"` | ||||||||||
|
|
||||||||||
| // StandardValue specifies a standard log property (e.g., "RequestStartTime", "Duration"). | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are these defined? or implementation specific? |
||||||||||
| // This is required if Type is "Standard". | ||||||||||
| StandardValue *string `json:"standardValue,omitempty"` | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // --- Policy Status --- | ||||||||||
|
|
||||||||||
| // TelemetryPolicyStatus defines the observed state of TelemetryPolicy. | ||||||||||
| type TelemetryPolicyStatus struct { | ||||||||||
| // For Policy Status API conventions, see: | ||||||||||
| // https://gateway-api.sigs.k8s.io/geps/gep-713/#the-status-stanza-of-policy-objects | ||||||||||
| // | ||||||||||
| // Ancestors is a list of ancestor resources (usually Backend) that are | ||||||||||
| // associated with the policy, and the status of the policy with respect to | ||||||||||
| // each ancestor. When this policy attaches to a parent, the controller that | ||||||||||
| // manages the parent and the ancestors MUST add an entry to this list when | ||||||||||
| // the controller first sees the policy and SHOULD update the entry as | ||||||||||
| // appropriate when the relevant ancestor is modified. | ||||||||||
| // | ||||||||||
| // Note that choosing the relevant ancestor is left to the Policy designers; | ||||||||||
| // an important part of Policy design is designing the right object level at | ||||||||||
| // which to namespace this status. | ||||||||||
| // | ||||||||||
| // Note also that implementations MUST ONLY populate ancestor status for | ||||||||||
| // the Ancestor resources they are responsible for. Implementations MUST | ||||||||||
| // use the ControllerName field to uniquely identify the entries in this list | ||||||||||
| // that they are responsible for. | ||||||||||
| // | ||||||||||
| // Note that to achieve this, the list of PolicyAncestorStatus structs | ||||||||||
| // MUST be treated as a map with a composite key, made up of the AncestorRef | ||||||||||
| // and ControllerName fields combined. | ||||||||||
| // | ||||||||||
| // A maximum of 16 ancestors will be represented in this list. An empty list | ||||||||||
| // means the Policy is not relevant for any ancestors. | ||||||||||
| // | ||||||||||
| // If this slice is full, implementations MUST NOT add further entries. | ||||||||||
| // Instead they MUST consider the policy unimplementable and signal that | ||||||||||
| // on any related resources such as the ancestor that would be referenced | ||||||||||
| // here. | ||||||||||
| // | ||||||||||
| // +required | ||||||||||
| // +listType=atomic | ||||||||||
| // +kubebuilder:validation:MaxItems=16 | ||||||||||
| Ancestors []PolicyAncestorStatus `json:"ancestors"` | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ## Comparison with Prior Art | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This API seems to basically just be directly Istio's API. It doesn't seem like we really explored prior art beyond that even though there was substantial feedback on the original agentic-networking exploring alternative approaches. Can we get some of that context here? Even if its 'ideas considered but rejected' its still useful
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you also include the Airlock Microgateway Telemetry CR and the NGINX Tracing API as prior art? |
||||||||||
|
|
||||||||||
| ### Istio | ||||||||||
|
|
||||||||||
| [Istio](https://istio.io/)'s `Telemetry` API is the most direct prior art that inspired this proposal. It allows configuring observability at the mesh, namespace, and workload level. | ||||||||||
|
|
||||||||||
| * **Metrics**: Istio allows users to enable/disable specific metrics, add custom dimensions, and configure providers. | ||||||||||
| * **Logs**: Istio supports access logging configurations with CEL-like expressions for advanced filtering. | ||||||||||
| * **Traces**: Istio supports probabilistic sampling, context propagation, and custom span tags. | ||||||||||
| * **Customization**: For advanced telemetry use-cases not natively covered by the `Telemetry` API, Istio users can fall back to using `EnvoyFilter` resources. While highly flexible, `EnvoyFilter` requires deep knowledge of Envoy's internal xDS API. This is tightly coupled to the data plane implementation and can be brittle across version upgrades. | ||||||||||
| * **Comparison**: The proposed `TelemetryPolicy` adapts Istio's powerful intent-based capabilities to the standardized Gateway API attachment model. | ||||||||||
|
|
||||||||||
| ### Envoy Gateway | ||||||||||
|
|
||||||||||
| [Envoy Gateway](https://gateway.envoyproxy.io/) configures observability through two distinct custom resources: `EnvoyGateway` for the control plane and `EnvoyProxy` for the underlying data plane proxies. | ||||||||||
|
|
||||||||||
| * **Metrics**: Envoy Gateway allows configuring Prometheus and OpenTelemetry sinks for both the control plane (using `EnvoyGateway` CRD) and the data plane proxies (using the `EnvoyProxy` CRD). | ||||||||||
| * **Logs**: Proxy access logs are configured via the `EnvoyProxy` resource. It supports exporting to file, OTLP, or gRPC Access Log Service (ALS) sinks. It uses CEL expressions for smart filtering (e.g., matching specific headers), and allows applying log configurations at the Route or Listener level. | ||||||||||
| * **Tracing**: Tracing is configured in the `EnvoyProxy` resource. It allows configuring sampling and supports appending custom tags derived from literals, environment variables, or request headers. | ||||||||||
| * **Customization**: For advanced telemetry use-cases not covered natively, users can fall back to the `EnvoyPatchPolicy` API to mutate the underlying xDS configuration using JSON Patch semantics. This is similar to Istio's `EnvoyFilter`. | ||||||||||
| * **Comparison**: While Envoy Gateway provides a robust, native telemetry configuration, it is tightly coupled to infrastructure-oriented CRDs. The proposed `TelemetryPolicy` allows users to configure telemetry behaviors using a portable `targetRef` model, without binding their observability intent to an Envoy-specific schema. | ||||||||||
|
|
||||||||||
| ### Kuadrant | ||||||||||
|
|
||||||||||
| [Kuadrant](https://kuadrant.io/) provides observability for API management features like rate limiting and authentication. It is configured through a mix of its own custom resources and the underlying gateway's APIs. | ||||||||||
|
|
||||||||||
| * **Metrics**: Kuadrant enables metrics via the `Kuadrant` CR. It also introduces its own `TelemetryPolicy` API (extensions.kuadrant.io/v1alpha1) to add custom dimensions to metrics. | ||||||||||
| * **Logs**: For proxy access logging, Kuadrant relies on the underlying gateway provider (e.g., Istio's Telemetry API). However, it configures request correlation across its own components (Authorino, Limitador, and Wasm-shim) by specifying HTTP header identifiers in the `Kuadrant` CR. | ||||||||||
| * **Tracing**: Tracing is configured centrally via the `Kuadrant` CR. It exports OpenTelemetry spans for both the control plane and data plane components. It supports global trace filtering levels to control the verbosity of exported spans. | ||||||||||
| * **Customization**: To make low-level, custom modifications to the data plane configuration that are not supported by Kuadrant's native APIs, users can bypass Kuadrant and directly use the underlying gateway's mechanisms. | ||||||||||
| * **Comparison**: While Kuadrant provides powerful, identity-aware telemetry (like token tracking per user), its configuration is fragmented across the `Kuadrant` CR, components specific CRDs, its custom extension `TelemetryPolicy`, and the underlying gateway's native APIs. The proposed `TelemetryPolicy` aims to unify these intent-based capabilities into a single, provider-agnostic resource. | ||||||||||
|
|
||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should enabling of tracing imply that 'traceparent' header propagation also happens?