Auto enable network for Prometheus metrics scraping based on configuration #570
Open
zohar7ch
wants to merge
7
commits into
main
from
zohar7ch/support-auto-enable-metrics-scraping-servers
+1,150
−4
2954a06
to
5d9243a
omris94
reviewed
Mar 6, 2025
src/operator/controllers/metrics_collectors/network_policy_handler.go (8 review threads, outdated, resolved)
src/operator/controllers/metrics_collectors/service_reconciler.go (1 review thread, outdated, resolved)
Add tests please 🙂
c04d394
to
4f59594
This is the first layer in managing Prometheus scrape annotations. When a pod changes, we reduce the state of the pod's entire namespace and make sure the current state of the cluster matches the expected configuration. We work at the namespace level rather than on the individual pod because the pod alone does not always tell us everything we need. For example, take a deployment with a single pod that has scrape annotations: when that pod is terminated, whether we create (or keep) or delete its metrics-collection network policy becomes a race that depends on when the original pod terminates. Instead of handling many edge cases, we opted for a stateless approach similar to the one used in service-effective-policy: calculate the state and update only what is necessary.
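The stateless approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `Policy`, `Plan`, and `planNamespace` are hypothetical names, and a real implementation would work with Kubernetes NetworkPolicy objects rather than plain structs.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy is a simplified stand-in for a metrics-collection network policy.
// The type and its fields are hypothetical, not the operator's real types.
type Policy struct {
	Name string // policy name, unique within the namespace
	Port int    // scrape port the policy opens
}

// Plan lists the changes needed to move from the existing state to the desired state.
type Plan struct {
	Create []Policy
	Update []Policy
	Delete []string
}

// planNamespace compares the desired policies (derived from every annotated
// pod in the namespace) with the policies that currently exist, and returns
// only the operations that actually change something.
func planNamespace(desired, existing map[string]Policy) Plan {
	var plan Plan
	for name, want := range desired {
		have, found := existing[name]
		switch {
		case !found:
			plan.Create = append(plan.Create, want)
		case have != want:
			plan.Update = append(plan.Update, want)
		}
	}
	for name := range existing {
		if _, stillWanted := desired[name]; !stillWanted {
			plan.Delete = append(plan.Delete, name)
		}
	}
	// Map iteration order is random, so sort for deterministic results.
	sort.Slice(plan.Create, func(i, j int) bool { return plan.Create[i].Name < plan.Create[j].Name })
	sort.Slice(plan.Update, func(i, j int) bool { return plan.Update[i].Name < plan.Update[j].Name })
	sort.Strings(plan.Delete)
	return plan
}

func main() {
	desired := map[string]Policy{
		"metrics-a": {Name: "metrics-a", Port: 9090},
		"metrics-b": {Name: "metrics-b", Port: 8080},
	}
	existing := map[string]Policy{
		"metrics-b": {Name: "metrics-b", Port: 8081}, // stale port
		"metrics-c": {Name: "metrics-c", Port: 9100}, // owning pod is gone
	}
	fmt.Printf("%+v\n", planNamespace(desired, existing))
}
```

Because the plan is recomputed from scratch on every pod event, the order in which a terminating pod and its replacement are observed does not matter: both orders converge on the same desired set.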
change When the configuration is set to 'If blocked by Otterize', we create a network policy that enables metrics collection only if another network policy, created by Otterize, blocks communication to the pod. Otterize can block network traffic based on either the pod itself or its corresponding service. We can detect the service only after an endpoint is established between the service and the pod, so we need to check the status of the pods once the endpoint is up and running.
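The decision described in this commit message might look something like the sketch below. All names here are hypothetical, and the `Never` and `Always` modes are assumptions added for illustration; the text only confirms the 'If blocked by Otterize' setting.

```go
package main

import "fmt"

// Mode is a hypothetical stand-in for the operator's configuration value.
// Only IfBlockedByOtterize is confirmed by the commit message.
type Mode int

const (
	Never Mode = iota
	IfBlockedByOtterize
	Always
)

// Workload captures what is known about a scrape-annotated pod.
type Workload struct {
	PodBlocked     bool // an Otterize-created policy selects the pod itself
	HasEndpoint    bool // a service endpoint already points at the pod
	ServiceBlocked bool // an Otterize-created policy selects the pod through its service
}

// needsMetricsPolicy reports whether a metrics-collection policy should exist.
func needsMetricsPolicy(mode Mode, w Workload) bool {
	switch mode {
	case Always:
		return true
	case IfBlockedByOtterize:
		// A service-level block is only observable once the endpoint exists,
		// which is why the check has to run again after the endpoint is up.
		return w.PodBlocked || (w.HasEndpoint && w.ServiceBlocked)
	default:
		return false
	}
}

func main() {
	beforeEndpoint := Workload{ServiceBlocked: true}
	afterEndpoint := Workload{ServiceBlocked: true, HasEndpoint: true}
	fmt.Println(needsMetricsPolicy(IfBlockedByOtterize, beforeEndpoint))
	fmt.Println(needsMetricsPolicy(IfBlockedByOtterize, afterEndpoint))
}
```

The two calls in `main` show why the endpoint event matters: the same workload yields a different answer before and after its endpoint is established.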
…hange We want to reconcile after a network policy change to handle two scenarios: 1. A network policy is created or deleted by Otterize: if the configuration is set to 'If blocked by Otterize', we may need to create or delete a metrics-collection network policy. 2. A race condition between operator instances: for instance, during an operator update a new instance is created, and we don't want the old operator to determine the cluster's state. If the instance that is shutting down is the last one to run and it modifies a network policy, the active instance receives an update and can correct the state as needed.
…icies In Prometheus, we can choose which pod to scrape using scrape annotations. These annotations can be applied to a pod, a service, an ingress, and so on. When creating a network policy for scraping metrics, we aim to cover every possible level and to tell the levels apart. We could create a single network policy that handles all annotations, but managing each level separately makes the code more readable and reduces the number of edge cases. This refactor lets us record and target the annotation level for which a network policy was created.
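A rough sketch of handling each annotation level separately is shown below. It assumes the conventional `prometheus.io/scrape` and `prometheus.io/port` annotation keys; the operator may read different keys. The `Level` type, `scrapePort`, `policyName`, and the naming scheme are all hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// Level identifies the kind of resource that carries the scrape annotation.
// The type and its values are hypothetical.
type Level string

const (
	LevelPod     Level = "pod"
	LevelService Level = "service"
)

// scrapePort returns the scrape port declared in the annotations, assuming
// the conventional prometheus.io/scrape and prometheus.io/port keys.
func scrapePort(annotations map[string]string) (int, bool) {
	if annotations["prometheus.io/scrape"] != "true" {
		return 0, false
	}
	port, err := strconv.Atoi(annotations["prometheus.io/port"])
	if err != nil || port < 1 || port > 65535 {
		return 0, false
	}
	return port, true
}

// policyName derives one policy name per level and resource, so a pod-level
// policy and a service-level policy never collide. The scheme is illustrative.
func policyName(level Level, resource string) string {
	return fmt.Sprintf("metrics-collection-%s-%s", level, resource)
}

func main() {
	podAnnotations := map[string]string{
		"prometheus.io/scrape": "true",
		"prometheus.io/port":   "9090",
	}
	if port, ok := scrapePort(podAnnotations); ok {
		fmt.Println(policyName(LevelPod, "web"), port)
	}
}
```

Keeping the level in the policy name (or in a label) means each policy can be traced back to the resource whose annotation caused it, which is what makes per-level create and delete decisions simple.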
We want to emit events when creating, updating, or deleting a network policy. The event is recorded on the resource responsible for the network policy (that is, the one that carries the scrape annotation).
40f4f11
to
1e763e5
omris94
approved these changes
Mar 11, 2025
1e763e5
to
c75ecee
Description
Before this change, if you had a Prometheus server in your cluster that scrapes metrics from multiple workloads, you had to configure client intents for every workload in order to let Prometheus scrape them.
Now you can set the configuration so that Otterize detects which workloads need to be scraped (based on Prometheus scrape annotations) and enables communication to the metrics scrape port on its own.
Testing
Tested locally with the service-client example and the Prometheus community edition (prometheus-community/prometheus).
Checklist