Have metrics dedicated to context cancelled/deadline exceeded #6351

Closed
rudrakhp opened this issue Oct 27, 2023 · 3 comments · Fixed by open-policy-agent/opa-envoy-plugin#477
Labels
feature-request help wanted int-envoy Issues related to the opa-envoy-plugin

Comments

@rudrakhp
Contributor

What is the underlying problem you're trying to solve?

Our logs let us identify context deadline exceeded and context cancelled events, but from a monitoring and alerting perspective there is no metric for them today (referring to this list).

Describe the ideal solution

It would be good to have metrics dedicated to context cancellations, ideally with a reason tag (check request timed out, HTTP send timed out, context cancelled during X eval, etc.).

Describe a "Good Enough" solution

We could skip the reason tag in the short term; even a basic metric would definitely be helpful.

Additional Context

N/A

@ashutosh-narkar ashutosh-narkar added int-envoy Issues related to the opa-envoy-plugin help wanted labels Oct 27, 2023
@ashutosh-narkar
Member

The opa-envoy plugin has an option to include performance metrics via Prometheus. We could add a counter there for this. These metrics are then surfaced via the Status API.
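
As a rough sketch of the kind of counter meant here, the following uses only the Go standard library as a stand-in for a labelled Prometheus counter. The type `reasonCounter`, its methods, and the reason strings are hypothetical and are not part of the opa-envoy-plugin code.

```go
package main

import (
	"fmt"
	"sync"
)

// reasonCounter is a stdlib-only stand-in for a labelled counter.
// Hypothetical type for illustration; a real plugin would more likely
// register a counter with its Prometheus registry.
type reasonCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newReasonCounter() *reasonCounter {
	return &reasonCounter{counts: make(map[string]uint64)}
}

// Inc adds one to the count for the given reason label.
func (c *reasonCounter) Inc(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Get returns the current count for the given reason label.
func (c *reasonCounter) Get(reason string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}

func main() {
	c := newReasonCounter()
	c.Inc("check_timeout")
	c.Inc("check_timeout")
	c.Inc("http_send_timeout")

	fmt.Println(c.Get("check_timeout"), c.Get("http_send_timeout")) // prints "2 1"
}
```

The mutex keeps increments safe when many check requests are handled concurrently, which is the normal case for an authorization server.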

@rudrakhp
Contributor Author

rudrakhp commented Oct 28, 2023

@ashutosh-narkar Thanks for the quick response!

I have been trying to capture and classify the various errors we see in our logs. Here is a log entry I have a question about:

{
  "level": "error",
  "msg": "Log event masking failed: eval_cancel_error: caller cancelled query execution.",
  "plugin": "decision_logs",
  "time": "2023-10-28T13:51:59Z"
}

I see a TODO here. Is this why the error is not propagated to the envoy plugin, where ideally the complete decision log (including input) should be logged? Any pointers to help me understand this issue better would be appreciated. Thanks!

@ashutosh-narkar
Member

The code you're referring to is old. OPA uses the main branch, not master. That code has been removed, and we now maintain metrics for errors in the decision log plugin. For the specific error in your log, there is currently no counter to track it. So we can add one or, as I mentioned previously, you can add a counter in the plugin itself.
