Added Enhancements to Log AI Insight #247291
Pinging @elastic/obs-ai-team (Team:obs-ai)
  You are assisting an SRE who is viewing a log entry in the Kibana Logs UI.
- Using the provided data produce a concise, action-oriented response.`);
+ Using the provided data produce a concise, action-oriented response.
+ If it's an issue, provide remediation steps suitable for an on-call SRE.`);
I feel the wording here is a bit ambiguous.
"If it's an issue" doesn't clearly say what an issue is.
Is it referring to an error log or a warning log? What counts as an issue in this context?
And do we need to say "suitable for an on-call SRE"?
"If it's an issue": I added it because, in the message, we ask the model to explain the log and say whether "it is an issue". We didn't specify what qualifies as an issue there. I'm happy to clarify it if you think it's needed, but the LLM was able to define it nicely on its own.
Regarding "suitable for an on-call SRE": I added this because we already mention in the system prompt that the audience is an SRE, and I assume that if there is an issue, we want to provide some next steps to investigate. I don't have a strong opinion on keeping this part and am happy to change it.
"I added it because, in the message, we ask to explain the log and whether 'it is an issue'"
I think that should change as well. We should avoid ambiguity in the system prompt and instructions as much as possible, since different LLMs may interpret ambiguous wording in different ways.
…lder/server/routes/ai_insights/get_log_ai_insights.ts Co-authored-by: Viduni Wickramarachchi <viduni.ushanka@gmail.com>
Pinging @elastic/obs-presentation-team (Team:obs-presentation)
We are adding more information ( |
${entityLinkingInstructions}
`)
: dedent(`
You are an expert SRE assistant analyzing an info, debug, or trace log entry. Keep it concise:
<LogContext>
${context}
</LogContext>
Does this cause multiple levels of nesting?
<LogContext>
<CorrelatedLogSequence>
...
</CorrelatedLogSequence>
</LogContext>
If so, remove the LogContext tag
- errorLogsOnly,
+ errorLogsOnly = DEFAULT_ERROR_LOGS_ONLY,
  index,
- correlationFields,
+ correlationFields = DEFAULT_CORRELATION_IDENTIFIER_FIELDS,
  logId,
- logSourceFields,
- maxSequences,
- maxLogsPerSequence,
+ logSourceFields = DEFAULT_LOG_SOURCE_FIELDS,
+ maxSequences = DEFAULT_MAX_SEQUENCES,
+ maxLogsPerSequence = DEFAULT_MAX_LOGS_PER_SEQUENCE,
  }: {
  core: ObservabilityAgentBuilderCoreSetup;
  logger: Logger;
  esClient: IScopedClusterClient;
  start: string;
  end: string;
  kqlFilter?: string;
- errorLogsOnly: boolean;
+ errorLogsOnly?: boolean;
  index?: string;
- correlationFields: string[];
+ correlationFields?: string[];
  logId?: string;
- logSourceFields: string[];
- maxSequences: number;
- maxLogsPerSequence: number;
+ logSourceFields?: string[];
+ maxSequences?: number;
+ maxLogsPerSequence?: number;
I think these should not be optional. Instead of changing this, can you create a function (in this file) that you call, and which just calls getToolHandler?
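One way to read this suggestion, as a minimal sketch: keep every parameter of `getToolHandler` required and put the defaults in a thin wrapper. The `DEFAULT_*` names come from the diff above, but their values here are placeholders, `applyDefaults` and `getToolHandlerWithDefaults` are hypothetical names, the Kibana-specific `core`, `logger` and `esClient` parameters are left out, and the `getToolHandler` body is a stand-in.

```typescript
// Placeholder values: the real constants are defined in the PR and not shown in this thread.
const DEFAULT_ERROR_LOGS_ONLY = true;
const DEFAULT_CORRELATION_IDENTIFIER_FIELDS = ['trace.id'];
const DEFAULT_LOG_SOURCE_FIELDS = ['@timestamp', 'message', 'log.level'];
const DEFAULT_MAX_SEQUENCES = 10;
const DEFAULT_MAX_LOGS_PER_SEQUENCE = 10;

// Required parameters, as in the original signature (Kibana-specific ones omitted).
interface ToolHandlerParams {
  start: string;
  end: string;
  kqlFilter?: string;
  errorLogsOnly: boolean;
  index?: string;
  correlationFields: string[];
  logId?: string;
  logSourceFields: string[];
  maxSequences: number;
  maxLogsPerSequence: number;
}

type DefaultedKey =
  | 'errorLogsOnly'
  | 'correlationFields'
  | 'logSourceFields'
  | 'maxSequences'
  | 'maxLogsPerSequence';

// Same parameters, but the defaulted ones may be omitted by the caller.
type ParamsWithOptionalDefaults = Omit<ToolHandlerParams, DefaultedKey> &
  Partial<Pick<ToolHandlerParams, DefaultedKey>>;

// Hypothetical helper: fills in defaults for anything the caller left out.
function applyDefaults(params: ParamsWithOptionalDefaults): ToolHandlerParams {
  return {
    ...params,
    errorLogsOnly: params.errorLogsOnly ?? DEFAULT_ERROR_LOGS_ONLY,
    correlationFields: params.correlationFields ?? DEFAULT_CORRELATION_IDENTIFIER_FIELDS,
    logSourceFields: params.logSourceFields ?? DEFAULT_LOG_SOURCE_FIELDS,
    maxSequences: params.maxSequences ?? DEFAULT_MAX_SEQUENCES,
    maxLogsPerSequence: params.maxLogsPerSequence ?? DEFAULT_MAX_LOGS_PER_SEQUENCE,
  };
}

// Stand-in for the existing handler, whose real body is not shown in this thread.
async function getToolHandler(params: ToolHandlerParams): Promise<string> {
  return `up to ${params.maxSequences} sequences from ${params.start} to ${params.end}`;
}

// Hypothetical wrapper: the only place where defaults are applied.
async function getToolHandlerWithDefaults(params: ParamsWithOptionalDefaults): Promise<string> {
  return getToolHandler(applyDefaults(params));
}
```

With this split, the tool definition can keep passing every value explicitly, while the AI insight route calls the wrapper with only `start`, `end` and whatever it wants to override.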
});

await logsSynthtraceEsClient.index([logs]);
await logsSynthtraceEsClient.refresh();
Is this necessary when we already have refreshAfterIndex?
Thanks, I missed that refreshAfterIndex: true is already configured in the synthtrace client manager.
errorMessage: ERROR_MESSAGE,
warningMessage: WARNING_MESSAGE,
infoMessage: INFO_MESSAGE,
logsSynthtraceEsClient,
logData: {
traceId,
serviceName: SERVICE_NAME,
This should be a param to the data generator (decided by the consumer) and thus not returned
... this would also be more consistent with service environment
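A rough sketch of the shape being asked for, assuming a plain-object generator rather than the real synthtrace API: the consumer passes `serviceName` and `environment` in, so the generator has no reason to return them. `generateLogData`, the field layout and the messages are all hypothetical.

```typescript
// Hypothetical document shape; the real test data is built with synthtrace.
interface GeneratedLog {
  '@timestamp': string;
  message: string;
  'log.level': string;
  'service.name': string;
  'service.environment': string;
  'trace.id': string;
}

interface LogDataParams {
  serviceName: string; // decided by the consumer, so it is not returned
  environment: string; // same treatment as the service name
  traceId: string;
  startMs: number; // epoch milliseconds of the first log entry
}

// Hypothetical generator: one info, one warning and one error log, one second apart.
function generateLogData({ serviceName, environment, traceId, startMs }: LogDataParams): GeneratedLog[] {
  const entries = [
    { level: 'info', message: 'Request received' },
    { level: 'warn', message: 'Upstream responded slowly' },
    { level: 'error', message: 'Request failed' },
  ];

  return entries.map((entry, i) => ({
    '@timestamp': new Date(startMs + i * 1000).toISOString(),
    message: entry.message,
    'log.level': entry.level,
    'service.name': serviceName,
    'service.environment': environment,
    'trace.id': traceId,
  }));
}
```

A test would then own the values it asserts on, for example `generateLogData({ serviceName: 'checkout', environment: 'production', traceId: 'abc', startMs: Date.now() })`.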
…lder/server/routes/ai_insights/get_log_ai_insights.ts Co-authored-by: Søren Louv-Jansen <sorenlouv@gmail.com>
id,
size: 1,
_source: false,
fields: ['*'],
We shouldn't retrieve all fields - just the fields we need.
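As an illustration only, the request could name its fields explicitly instead of using `'*'`. The field list, the `buildGetLogDocumentByIdRequest` helper and the use of an `ids` query are assumptions; the actual request in `get_log_document_by_id.ts` is only partly visible in this thread.

```typescript
// Hypothetical list: the fields the insight actually needs would be decided in the PR.
const LOG_DOCUMENT_FIELDS: readonly string[] = [
  '@timestamp',
  'message',
  'log.level',
  'service.name',
  'trace.id',
];

// Builds a search request body that fetches one document by id with a bounded field list.
function buildGetLogDocumentByIdRequest({
  index,
  id,
  fields = LOG_DOCUMENT_FIELDS,
}: {
  index: string;
  id: string;
  fields?: readonly string[];
}) {
  return {
    index,
    size: 1,
    _source: false,
    fields: [...fields], // explicit fields instead of ['*']
    query: { ids: { values: [id] } },
  };
}
```

Keeping the list in one constant also makes it easy to see, in review, exactly what ends up in the prompt context.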
…lder/server/utils/warning_and_above_log_filter.ts Co-authored-by: Søren Louv-Jansen <sorenlouv@gmail.com>
`);

const userPrompt = dedent(`
${context}
Did you consider adding context to system prompt instead of user prompt? Pros/cons?
Good question! To me, the context is the user's input data to analyze, so the user prompt is the right place for it. This part will also be available later, as part of the attachments, for the chat with the agent.
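The split described in this reply could look roughly like the sketch below: instructions stay in the system prompt, and the fetched context is the whole user prompt. `buildLogInsightPrompts` and the `includeRemediation` flag are hypothetical, and the remediation sentence is illustrative wording, not the text that was merged.

```typescript
// Hypothetical helper: keeps instructions and data in separate prompts.
function buildLogInsightPrompts({
  context,
  includeRemediation,
}: {
  context: string; // pre-fetched log entry, service summary, correlated logs
  includeRemediation: boolean; // e.g. true when the log severity is warning or higher
}): { systemPrompt: string; userPrompt: string } {
  const systemLines = [
    'You are assisting an SRE who is viewing a log entry in the Kibana Logs UI.',
    'Using the provided data produce a concise, action-oriented response.',
  ];

  if (includeRemediation) {
    // Illustrative wording only: states the condition explicitly instead of "if it's an issue".
    systemLines.push('The log severity is warning or higher: include remediation steps.');
  }

  return {
    systemPrompt: systemLines.join('\n'),
    userPrompt: context, // data to analyze, passed through unchanged
  };
}
```

Because the condition is decided in code, the model is never asked to judge on its own whether the log "is an issue".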
…lder/server/routes/ai_insights/get_log_ai_insights.ts Co-authored-by: Søren Louv-Jansen <sorenlouv@gmail.com>
…lder/server/routes/ai_insights/get_log_document_by_id.ts Co-authored-by: Søren Louv-Jansen <sorenlouv@gmail.com>
💔 Build Failed
Issue #445
This PR enhances the Log AI Insight:
We implemented a function to identify the log severity. It takes into account both ECS and OTel fields, since we are working with the raw log entry. We check the severity of the log entry, and we only ask the model to include remediation steps when the log severity is warning or higher.
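The severity check described above might be sketched as follows. This is not the PR's implementation: the field names (`log.level` for ECS, `severity_number` and `severity_text` for OTel), the list of level names and the `isWarningOrAbove` helper are assumptions. The numeric threshold follows the OTel log data model, where SeverityNumber 13 to 16 is WARN, 17 to 20 is ERROR and 21 to 24 is FATAL.

```typescript
type RawLog = Record<string, unknown>;

// Assumed set of textual levels that count as warning or higher.
const WARNING_AND_ABOVE_LEVELS = new Set([
  'warn', 'warning', 'error', 'err', 'critical', 'crit', 'fatal', 'alert', 'emergency', 'emerg',
]);

// Reads a field stored either flattened ("log.level") or nested ({ log: { level } }).
function getField(doc: RawLog, path: string): unknown {
  if (path in doc) return doc[path];
  let current: unknown = doc;
  for (const key of path.split('.')) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Values fetched through the search `fields` option arrive as arrays; unwrap the first one.
function firstValue(value: unknown): unknown {
  return Array.isArray(value) ? value[0] : value;
}

// Hypothetical helper: true when the raw log entry is at warning severity or higher.
function isWarningOrAbove(doc: RawLog): boolean {
  // OTel numeric severity takes precedence when present (valid values are 1 to 24).
  const severityNumber = Number(firstValue(getField(doc, 'severity_number')));
  if (Number.isFinite(severityNumber) && severityNumber > 0) {
    return severityNumber >= 13;
  }

  // Fall back to textual levels: ECS `log.level`, then OTel `severity_text`.
  for (const field of ['log.level', 'severity_text']) {
    const level = firstValue(getField(doc, field));
    if (typeof level === 'string' && level.trim() !== '') {
      return WARNING_AND_ABOVE_LEVELS.has(level.trim().toLowerCase());
    }
  }

  // No severity information: treat as below warning, so no remediation steps are requested.
  return false;
}
```

The result of a check like this is what would decide whether the prompt asks the model for remediation steps.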
We now always prefetch the ServiceSummary and CorrelatedLogs.
Tested this using the following Cursor prompt: prompt_log_ai_insight.md
Here are the results: logaiinsightresults.md
It was able to correctly identify errors, and the correlated logs provided enough detail to understand where the error might have occurred.