[APM]: Replace error occurrence watchers with Kibana Alerting #46547
dgieselaar wants to merge 1 commit into elastic:master
Woops, this was supposed to be a draft PR. I'll update the description in a bit.
💔 Build Failed
I don't think we need this right now. I was figuring out if we could create the email action we need when registering the alert type, but because I never got around to implementing email it's not being used.
x-pack/legacy/plugins/apm/index.ts
I'm not sure if these should be required, or if they are optional. If it's the latter, not sure how we can get access to these plugins on startup.
Really good question for platform how to handle optional dependencies. I think it has come up before but can't remember the answer.
New platform allows for optional dependencies, but legacy platform does not. If you are depending on something in a legacy plugin that could be disabled, you need to make sure that you implement isEnabled and check that the dependency is enabled, otherwise Kibana will crash.
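A rough sketch of what such an `isEnabled` check could look like. The `LegacyConfig` shape, the `configFrom` helper and the three config keys are assumptions made for illustration, not the actual legacy plugin API:

```typescript
// Hypothetical minimal shape of the legacy config object; only `get` is used.
interface LegacyConfig {
  get(key: string): unknown;
}

// Assumed config keys for the plugins this PR depends on.
const REQUIRED_PLUGIN_FLAGS = ['xpack.alerting.enabled', 'xpack.actions.enabled'];

// True only when APM itself and every dependency are explicitly enabled.
function isEnabled(config: LegacyConfig): boolean {
  return (
    config.get('xpack.apm.enabled') === true &&
    REQUIRED_PLUGIN_FLAGS.every(key => config.get(key) === true)
  );
}

// Illustration-only helper that wraps a plain object as a config.
function configFrom(values: Record<string, unknown>): LegacyConfig {
  return { get: key => values[key] };
}

const allOn = configFrom({
  'xpack.apm.enabled': true,
  'xpack.alerting.enabled': true,
  'xpack.actions.enabled': true,
});
const actionsOff = configFrom({
  'xpack.apm.enabled': true,
  'xpack.alerting.enabled': true,
  'xpack.actions.enabled': false,
});

console.log(isEnabled(allOn)); // true
console.log(isEnabled(actionsOff)); // false
```

Note that a check like this disables the whole plugin when a dependency is off. If only the alerting feature should be skipped, the same check would have to move to the place where the alert type is registered.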
I've simplified this: we can just pass a comma-separated list to the action, since it uses nodemailer under the hood. I never tested it, though.
Not sure if we're allowed to do this or if we have to use the API.
This is ok to do, all the business logic currently lives within the client and you're getting the client from the request (which is good). You will just have to handle validation until we have some within the client (if we decide to do so).
I've created a route for this, because we have to execute several concurrent and sequential requests, and the server is a more robust environment for that kind of dependency.
You can also use services.callCluster(...). It will be in the context of the user who created the alert (security wise). The approach you have is fine as well if you don't want that.
f01d96d to 087d6ec
💔 Build Failed
087d6ec to c6b3361
💔 Build Failed
.then((id: string) => {
.then(savedObject => {
this.props.onClose();
const id = 'id' in savedObject ? savedObject.id : NOT_AVAILABLE_LABEL;
Maybe I'm outmoded, but isn't the `in` operator generally discouraged since it's at risk of prototype hijacking?
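For reference, a small sketch of the difference in question: `in` also matches properties inherited through the prototype chain, while an own-property check does not. The `hasOwn` and `getSavedObjectId` helpers and the `'N/A'` value are hypothetical stand-ins, not code from this PR. (In TypeScript, `in` also serves as a type guard for narrowing unions, which may be why it was used here.)

```typescript
// Hypothetical fallback label, standing in for NOT_AVAILABLE_LABEL in the diff.
const NOT_AVAILABLE_LABEL = 'N/A';

// Own-property check that does not consult the prototype chain.
function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Returns the object's own string `id`, or the fallback label otherwise.
function getSavedObjectId(savedObject: object): string {
  if (hasOwn(savedObject, 'id')) {
    const id = (savedObject as { id: unknown }).id;
    return typeof id === 'string' ? id : NOT_AVAILABLE_LABEL;
  }
  return NOT_AVAILABLE_LABEL;
}

// An object that only inherits `id` from its prototype.
const inherited: object = Object.create({ id: 'from-prototype' });

console.log('id' in inherited); // true: `in` walks the prototype chain
console.log(getSavedObjectId(inherited)); // "N/A"
console.log(getSavedObjectId({ id: 'abc' })); // "abc"
```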
'<br/>' +
'{errorLogMessage}<br/>' +
'{errorCulprit}<br/>' +
'{docCount} occurrences<br/>',
Do we want to preserve newline (`\n`) characters in the email template, like in the Slack template?
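One possible way to do that, as a sketch only: keep a single plain-text template with `\n` and derive the HTML variant from it. The template string, `fillTemplate` and `newlinesToBreaks` are hypothetical names, and the sketch does not HTML-escape the substituted values, which a real email body would need.

```typescript
// Hypothetical plain-text template, mirroring the fields in the diff above.
const template = '{errorLogMessage}\n{errorCulprit}\n{docCount} occurrences';

// Fill `{name}` placeholders; unknown placeholders are left untouched.
function fillTemplate(text: string, values: Record<string, string | number>): string {
  return text.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

// Convert newlines to <br/> for the HTML email body.
function newlinesToBreaks(text: string): string {
  return text.replace(/\r?\n/g, '<br/>');
}

const plainText = fillTemplate(template, {
  errorLogMessage: 'boom',
  errorCulprit: 'handler.js',
  docCount: 3,
});
const html = newlinesToBreaks(plainText);

console.log(plainText); // three lines, usable as-is for Slack
console.log(html); // "boom<br/>handler.js<br/>3 occurrences"
```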
Closing in favor of #59566.
This is a proof of concept that replaces the APM error occurrence Watcher (an Elasticsearch feature) with the new Kibana alerting and actions plugin. The Slack action successfully fires, but I haven't bothered with the Email action because the approach is pretty similar, and we decided to timebox this to two days.
Here's the gist:
- `Alert Type` for error occurrences. An `Alert Type` is essentially a function (called an executor) that, given a set of parameters, decides whether (and which) actions need to be triggered. Actions are triggered via an `Alert Instance`, which captures state of previous executions (for the purpose of this POC, I don't think we need that state).
- `Alert` object on the server. An `Alert` is a configuration object for an `Alert Type` that tells it at what interval and with what parameters it should execute. `Alert` also configures the action groups that can be triggered from the `Alert Type` executor. For this POC, we create Slack and Email actions based on the user's input when configuring an alert, and add them to the default group.
- `Alert Type`, that now runs at the configured interval for the `Alert` that was created, we run a query for the number of error occurrences and determine whether the threshold was exceeded. If so, we fire the default actions for the `Alert`, which can be either the Slack or Email action, or both.

To test this, make sure to explicitly enable both required plugins in your kibana config file:
Notes:
- `Alert` only supports `interval` as a scheduling option. That means that we cannot run the executor at a given time each day, which is something that our current implementation does support. I've been told expanding the scheduling options is on the roadmap.
- `interval` parameter. However, it doesn't seem like that's automatically available in the executor, so we pass it as a parameter instead. Maybe there's a nicer way to solve this.

Some questions/suggestions for the Alerting/Actions team:
- `secrets` param, but this is not reflected in the documentation (https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/actions/README.md)
- `interval` available in the executor as well (if it's not already there and I missed it). Seems like it's a common use case.
- `actionGroups` is for in the `Alert Type`. It's seemingly not documented, but it is required.
- `registerType` could be improved by having typed `params` (instead of `Record<string, any>`). Similar to what we did for APM in [APM] migrate to io-ts #42961 (happy to open a PR if welcome).
- `kbn-action` CLI tool has been super useful, thanks for taking the time to build that.
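
To make the executor flow from the gist and the typed `params` suggestion a bit more concrete, here is a minimal sketch. Every name in it (`TypedAlertType`, `ErrorOccurrenceParams`, `countErrors`, the `'apm.error_occurrence'` id) is hypothetical and does not mirror the real alerting plugin types. It uses hand-written validation instead of io-ts to stay self-contained, and it assumes "exceeded" means strictly greater than the threshold.

```typescript
// Hypothetical typed params for the error occurrence alert.
interface ErrorOccurrenceParams {
  serviceName: string;
  threshold: number;
  interval: string; // passed explicitly, as described in the notes
}

interface AlertInstance {
  scheduleActions(group: string, context: Record<string, unknown>): void;
}

interface ExecutorOptions<Params> {
  params: Params;
  services: {
    alertInstanceFactory(id: string): AlertInstance;
    // Hypothetical query helper: number of errors within the interval.
    countErrors(serviceName: string, interval: string): Promise<number>;
  };
}

// An alert type whose executor receives validated, typed params.
interface TypedAlertType<Params> {
  id: string;
  name: string;
  validate(raw: Record<string, unknown>): Params;
  executor(options: ExecutorOptions<Params>): Promise<void>;
}

const errorOccurrenceAlertType: TypedAlertType<ErrorOccurrenceParams> = {
  id: 'apm.error_occurrence',
  name: 'Error occurrences',
  validate(raw) {
    const { serviceName, threshold, interval } = raw;
    if (
      typeof serviceName !== 'string' ||
      typeof threshold !== 'number' ||
      typeof interval !== 'string'
    ) {
      throw new Error('Invalid params for error occurrence alert');
    }
    return { serviceName, threshold, interval };
  },
  async executor({ params, services }) {
    const docCount = await services.countErrors(params.serviceName, params.interval);
    // Fire the default action group only when the threshold is exceeded.
    if (docCount > params.threshold) {
      services
        .alertInstanceFactory(params.serviceName)
        .scheduleActions('default', { docCount });
    }
  },
};
```

With a shape like this, a typo in a param name inside the executor becomes a compile-time error instead of an `undefined` at runtime, which is the main thing `Record<string, any>` gives up.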