explore-analyze/ai-features.md (2 changes: 1 addition & 1 deletion)

@@ -102,7 +102,7 @@ The [Model Context Protocol (MCP)](/solutions/search/mcp.md) lets you connect AI
* [Partitioning](/solutions/observability/streams/management/partitioning.md): Use AI to suggest logical groupings and child streams based on your data when using wired streams.
* [Advanced settings](/solutions/observability/streams/management/advanced.md): Use AI to generate a [stream description](/solutions/observability/streams/management/advanced.md#streams-advanced-description) and a [feature identification](/solutions/observability/streams/management/advanced.md#streams-advanced-features) that other AI features, like significant events, use when generating suggestions.

-## AI-powered features in {{elastic-sec}}
+## AI-powered features in {{elastic-sec}} [security-features]

{{elastic-sec}}'s AI-powered features all require an [LLM connector](/explore-analyze/ai-features/llm-guides/llm-connectors.md). When you use one of these features, you can select any LLM connector that's configured in your environment. The connector you select for one feature does not affect which connector any other feature uses. For specific configuration instructions, refer to each feature's documentation.
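As a rough illustration of the paragraph above, connectors can also be created programmatically through Kibana's connector HTTP API (`POST /api/actions/connector`). The sketch below is a minimal, hypothetical example: the host, credentials, and config values are placeholders, and exact config fields vary by provider and Kibana version, so treat it as a sketch rather than a definitive recipe.

```python
# Hypothetical sketch: create an OpenAI LLM connector via Kibana's
# connector API so it becomes selectable by the AI features above.
# Endpoint and payload shape follow Kibana's public actions/connector
# API; all concrete values here are placeholders.
import requests

KIBANA_URL = "https://my-kibana.example.com:5601"  # placeholder host
API_KEY = "REDACTED"                               # placeholder credential

response = requests.post(
    f"{KIBANA_URL}/api/actions/connector",
    headers={
        "kbn-xsrf": "true",                  # required by Kibana's HTTP API
        "Authorization": f"ApiKey {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "name": "my-openai-connector",
        "connector_type_id": ".gen-ai",      # OpenAI connector type
        "config": {
            "apiProvider": "OpenAI",
            "apiUrl": "https://api.openai.com/v1/chat/completions",
        },
        "secrets": {"apiKey": "<openai-api-key>"},  # placeholder secret
    },
)
response.raise_for_status()
print(response.json()["id"])  # connector id, selectable by any AI feature
```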

solutions/security/ai/large-language-model-performance-matrix.md (41 changes: 21 additions & 20 deletions)

@@ -13,37 +13,38 @@

# Large language model performance matrix

-This page describes the performance of various large language models (LLMs) for different use cases in {{elastic-sec}}, based on our internal testing. To learn more about these use cases, refer to [Attack discovery](/solutions/security/ai/attack-discovery.md) or [AI Assistant](/solutions/security/ai/ai-assistant.md).
+This page describes the performance of various large language models (LLMs) for different use cases in {{elastic-sec}}, based on our internal testing. To learn more about these use cases, refer to [AI-powered features](/explore-analyze/ai-features.md#security-features).

::::{important}
-`Excellent` is the best rating, followed by `Great`, then by `Good`, and finally by `Poor`. Models rated `Excellent` or `Great` should produce quality results. Models rated `Good` or `Poor` are not recommended for that use case.
+Higher scores indicate better performance. A score of 100 on a task means the model met or exceeded all task-specific benchmarks.
+
+Models with a score of "Not recommended" failed testing. This could be due to various issues, including context window constraints.
::::
> **Review comment (Contributor):** It could be helpful to include a brief explanation of how to interpret the average score. Maybe something general like "models that score above [this threshold] might provide better performance for AI-powered features. We don't recommend using models that score below [this threshold], as they won't perform as well."
>
> **Reply (Author):** I'll ask the product team if we can provide some more guidance on this. Thank you for the idea.

## Proprietary models [_proprietary_models]

Models from third-party LLM providers.

-| **Feature** | - | **Assistant - General** | **Assistant - {{esql}} generation** | **Assistant - Alert questions** | **Assistant - Knowledge retrieval** | **Attack Discovery** | **Automatic Migration** |
-| --- | --- | --- | --- | --- | --- | --- | --- |
-| **Model** | **Claude Opus 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
-| | **Claude Sonnet 4** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
-| | **Claude Sonnet 3.7** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
-| | **GPT-4.1** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
-| | **Gemini 2.0 Flash 001** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
-| | **Gemini 2.5 Pro** | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |

+| **Model** | **Alerts** | **{{esql}} Query Generation** | **Knowledge Base Retrieval** | **Attack Discovery** | **General Security** | **Automatic Migration** | **Average Score** |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| **GPT 5 Chat** | 91 | 92 | 100 | 85 | 92 | 99 | **93** |

Check notice on line 31 in solutions/security/ai/large-language-model-performance-matrix.md

View workflow job for this annotation

GitHub Actions / preview / vale

Elastic.Acronyms: 'GPT' has no definition.
+| **Sonnet 4.5** | 90 | 90 | 100 | 80 | 90 | 100 | **92** |
+| **GPT 5.1** | 93 | 95 | 100 | 95 | 65 | 98 | **91** |
+| **Sonnet 3.7** | 89 | 90 | 100 | 70 | 90 | 97 | **89** |
+| **Elastic Managed LLM** | 89 | 90 | 100 | 70 | 90 | 97 | **89** |
+| **Opus 4.5** | 86 | 86 | 100 | 85 | 90 | 73 | **87** |
+| **Gemini 2.5 Pro** | 89 | 86 | 100 | 87 | 90 | 63 | **86** |
+| **Opus 4.1** | 92 | 93 | 100 | 70 | 90 | 70 | **86** |
+| **Sonnet 4** | 89 | 92 | 100 | 70 | 88 | 75 | **86** |
+| **GPT 4.1** | 87 | 88 | 100 | 80 | 88 | 31 | **79** |
+| **Gemini 2.5 Flash** | 87 | 90 | Not recommended | Not recommended | 90 | Not recommended | **45** |
+| **Haiku 4.5** | 84 | 80 | Not recommended | Not recommended | 88 | Not recommended | **42** |
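
The new table introduces an **Average Score** column without stating how it is derived, and the review thread above asks for exactly this kind of interpretation guidance. The published rows are consistent with a plain mean of the six task scores, rounded half up, in which "Not recommended" counts as 0. The sketch below is our inference from the numbers, not a documented formula:

```python
import math

# Inferred (not documented): the Average Score column matches a plain mean
# of the six task scores, rounded half up, with "Not recommended" scored as 0.
def average_score(scores: list) -> int:
    numeric = [s if isinstance(s, int) else 0 for s in scores]  # "Not recommended" -> 0
    return math.floor(sum(numeric) / len(numeric) + 0.5)        # round half up

# Sanity checks against rows in the table above:
assert average_score([91, 92, 100, 85, 92, 99]) == 93           # GPT 5 Chat
assert average_score([87, 90, "Not recommended", "Not recommended",
                      90, "Not recommended"]) == 45             # Gemini 2.5 Flash
```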

## Open-source models [_open_source_models]

Models you can [deploy yourself](/explore-analyze/ai-features/llm-guides/local-llms-overview.md).

-| **Feature** | - | **Assistant - General** | **Assistant - {{esql}} generation** | **Assistant - Alert questions** | **Assistant - Knowledge retrieval** | **Attack Discovery** | **Automatic Migration** |
-| --- | --- | --- | --- | --- | --- | --- | --- |
-| **Model** | **Mistral-Small-3.2-24B-Instruct-2506** | Excellent | Good | Excellent | Excellent | Good | N/A |
-| | **Mistral-Small-3.1-24B-Instruct-2503** | Excellent | Good | Excellent | Excellent | Good | N/A |
-| | **Mistral Nemo** | Good | Good | Great | Good | Poor | Poor |
-| | **LLama 3.2** | Good | Poor | Good | Poor | Poor | Good |
-| | **LLama 3.1 405b** | Good | Great | Good | Good | Poor | Poor |
-| | **LLama 3.1 70b** | Good | Good | Poor | Poor | Poor | Good |
+| **Model** | **Alerts** | **{{esql}} Query Generation** | **Knowledge Base Retrieval** | **Attack Discovery** | **General Security** | **Automatic Migration** | **Average Score** |
+| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
+| **GPT OSS 20b** | 82 | 25 | Not recommended | Not recommended | 10 | Not recommended | **20** |

Check notice on line 50 in solutions/security/ai/large-language-model-performance-matrix.md

View workflow job for this annotation

GitHub Actions / preview / vale

Elastic.Acronyms: 'GPT' has no definition.
Loading