Why do you want this feature:
Theliv investigator functions are supposed to analyze alerts in depth and give users actionable insights and next steps. To do that, they should analyze Kubernetes events together with the alert information and surface the additional detail to the user.
Describe the solution you'd like:
Theliv provides an investigation framework on top of Prometheus alerts: it analyzes alerts from Prometheus and digs deeper to give the user actionable insights. For example, when a CrashLoopBackOff alert fires, an SRE or DevOps engineer would typically dig deeper to find the root cause. Often, that involves analyzing the Kubernetes events.
Theliv has an investigator for CrashLoopBackOff, which needs to be enhanced to analyze the Kubernetes events and use them to give the user more information, e.g. details based on the exit code.
The same goes for the other investigators.
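As a rough illustration of the exit-code idea, here is a minimal sketch in Python. It is not Theliv code: the function name `describe_exit_code` and both hint tables are hypothetical, and the hints only reflect common conventions (exit codes above 128 usually mean the process was terminated by signal `code - 128`).

```python
# Hypothetical hint tables for illustration; not taken from Theliv.
EXIT_CODE_HINTS = {
    0: "Container exited cleanly; check why it keeps being restarted.",
    1: "General application error; check the container logs.",
    126: "Command found but not executable; check file permissions.",
    127: "Command not found; check the image and its entrypoint.",
}

# A few common signal numbers on Linux.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Return a human-readable hint for a container exit code."""
    if code in EXIT_CODE_HINTS:
        return EXIT_CODE_HINTS[code]
    if code > 128:
        # By convention, 128 + N means the process died from signal N.
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        hint = f"Terminated by {name}."
        if name == "SIGKILL":
            hint += " Often caused by the OOM killer; check memory limits."
        return hint
    return f"Exit code {code}; check the container logs."
```

An investigator could attach the returned string to the details it already shows for the alert, for example `describe_exit_code(137)` for a container that was killed.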
Kubernetes events are usually retained in etcd for only an hour. The investigator function will therefore work on a best-effort basis: if the user debugs with Theliv within that hour, they get the extra information. After an hour, the investigator function can no longer analyze the events and will do its best to add information on top of what the alert already provides.
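The best-effort behavior could look roughly like the sketch below. It is only an assumption about how an investigator might be structured: the `Event` type and `build_details` function are hypothetical names, and the events are assumed to have been fetched from the cluster already. Expired events are simply absent from the list, so the fallback is the empty-list case.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical, simplified view of a Kubernetes event."""
    reason: str
    message: str
    last_seen: datetime


def build_details(alert_summary: str, events: list[Event]) -> list[str]:
    """Combine alert data with whatever events are still available."""
    details = [alert_summary]
    if not events:
        # Events have likely expired; fall back to alert data only.
        details.append("No Kubernetes events available; showing alert data only.")
        return details
    # Show the most recent events first.
    for event in sorted(events, key=lambda e: e.last_seen, reverse=True):
        details.append(f"{event.reason}: {event.message}")
    return details


if __name__ == "__main__":
    events = [
        Event("BackOff", "Back-off restarting failed container",
              datetime(2022, 1, 1, 12, 5, tzinfo=timezone.utc)),
        Event("Pulled", "Container image already present on machine",
              datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)),
    ]
    for line in build_details("Pod is in CrashLoopBackOff", events):
        print(line)
```

Keeping the fallback inside one function means the caller does not need to know whether the events were still available.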