
Add PCI compliance skill and tools for Agent Builder #256060

Merged
smriti0321 merged 30 commits into main from smriti/pci-compliance-agent on May 6, 2026

@smriti0321 smriti0321 commented Mar 4, 2026

Update: This branch supersedes the earlier “PCI Compliance Agent + three tools” description. The shipped design is a PCI compliance skill and four tools (including field mapper), with no dedicated PCI BuiltInAgentDefinition.

Summary

This PR adds PCI DSS v4.0.1 support to Security Solution Agent Builder using the skills model instead of a standalone built-in agent.

  • Introduces a pci-compliance skill (defineSkillType) with guided instructions for assessments, reporting, confidence interpretation, deduplication, and time-range behavior.
  • Registers four PCI-specific tools: scope discovery, compliance checks, compliance reporting, and field mapping for non-ECS / custom sources.
  • Wires the skill to platform Agent Builder tools (search, indices, mappings, documents, cases, product docs, generateEsql / executeEsql) plus security alerts and entity risk tools where relevant.
  • Does not add a separate “PCI Compliance Agent” BuiltInAgentDefinition; users enable PCI workflows via the skill (aligned with Agent Builder’s skill-first direction).

Architecture

  • Skill registration: register_skills.ts registers pciComplianceSkill.
  • Tool registration: register_tools.ts registers PCI tools alongside existing security tools.
  • Agents: registerAgents is unchanged for PCI (only the existing threat hunting agent is registered as a built-in agent).

Skill id: pci-compliance.
Skill content and tool allow-list live under:

  • x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/pci_compliance/

PCI tool implementations:

  • tools/pci_scope_discovery_tool.ts — PCI-relevant data coverage across indices
  • tools/pci_compliance_check_tool.ts — requirement-level checks, violations, confidence
  • tools/pci_compliance_report_tool.ts — structured / visual-style compliance reporting
  • tools/pci_field_mapper_tool.ts — suggest ECS mappings for custom fields when ECS coverage is low
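As a rough illustration of what "suggest ECS mappings for custom fields" could involve, here is a minimal, hypothetical sketch that matches custom field names against a small alias table. The alias table, `suggestEcsMappings`, and the exact/suffix match types are assumptions made for this example. `pci_field_mapper_tool.ts` is not reproduced in this description and may work quite differently (for instance, by having the LLM propose mappings from the index mapping).

```typescript
// Hypothetical alias table: normalized custom field name -> likely ECS field.
// Illustrative only; not taken from pci_field_mapper_tool.ts.
const ECS_ALIASES = new Map<string, string>([
  ["username", "user.name"],
  ["login", "user.name"],
  ["src_ip", "source.ip"],
  ["client_ip", "source.ip"],
  ["dst_ip", "destination.ip"],
  ["dest_ip", "destination.ip"],
  ["result", "event.outcome"],
  ["tls_version", "tls.version"],
  ["ssl_version", "tls.version"],
]);

interface MappingSuggestion {
  sourceField: string;
  ecsField: string;
  // "exact": the full field name matched; "suffix": only its last path segment did.
  matchType: "exact" | "suffix";
}

// Normalize names like "srcIP" or "Src-IP" to lowercase snake_case ("src_ip").
function normalize(name: string): string {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .replace(/[-\s]+/g, "_")
    .toLowerCase();
}

// Suggest ECS targets for fields that are not already known ECS fields.
function suggestEcsMappings(fieldNames: string[], knownEcsFields: Set<string>): MappingSuggestion[] {
  const suggestions: MappingSuggestion[] = [];
  for (const field of fieldNames) {
    if (knownEcsFields.has(field)) continue; // already ECS; nothing to map
    const segments = field.split(".");
    const exact = ECS_ALIASES.get(normalize(field));
    const bySuffix = ECS_ALIASES.get(normalize(segments[segments.length - 1]));
    if (exact !== undefined) {
      suggestions.push({ sourceField: field, ecsField: exact, matchType: "exact" });
    } else if (bySuffix !== undefined) {
      suggestions.push({ sourceField: field, ecsField: bySuffix, matchType: "suffix" });
    }
  }
  return suggestions;
}
```

A suffix match is weaker evidence than an exact one, so a tool like this would probably surface it for confirmation rather than apply it silently.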

Shared requirement / query definitions:

  • tools/pci_compliance_requirements.ts (and related modules as present in this branch)

Supporting wiring (allow-lists, constants, plugin registration) is updated so the new skill and tool IDs are permitted where required.
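The allow-list wiring can be pictured roughly as follows: a minimal sketch assuming each allow-list behaves like a set of permitted IDs that registration is checked against. `AGENT_BUILDER_BUILTIN_SKILLS` and `AGENT_BUILDER_BUILTIN_TOOLS` are the names used in the commit summary on this PR, but their real types and contents are not shown here; the tool ID strings and the `assertRegistrable` helper are hypothetical placeholders, and the sketch only covers the PCI tools, not the platform tools the skill also uses.

```typescript
// Hypothetical stand-ins for the allow-lists named in the commit message.
// The real constants live in Kibana; their actual shape and IDs may differ.
const AGENT_BUILDER_BUILTIN_SKILLS: ReadonlySet<string> = new Set(["pci-compliance"]);
const AGENT_BUILDER_BUILTIN_TOOLS: ReadonlySet<string> = new Set([
  "pci_scope_discovery",
  "pci_compliance_check",
  "pci_compliance_report",
  "pci_field_mapper",
]);

interface SkillRegistration {
  id: string;
  toolIds: string[];
}

// Throws if the skill, or any tool it depends on, is missing from the allow-lists.
function assertRegistrable(skill: SkillRegistration): void {
  if (!AGENT_BUILDER_BUILTIN_SKILLS.has(skill.id)) {
    throw new Error(`Skill "${skill.id}" is not in the built-in skills allow-list`);
  }
  const missing = skill.toolIds.filter((id) => !AGENT_BUILDER_BUILTIN_TOOLS.has(id));
  if (missing.length > 0) {
    throw new Error(`Tools not in the built-in tools allow-list: ${missing.join(", ")}`);
  }
}
```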

Relation to the previous PR revision

An earlier revision of this PR proposed a dedicated PCI Compliance Agent (BuiltInAgentDefinition). That approach was replaced by this skill + tools design so PCI is available as a composable skill without adding another first-class agent.

Test plan

  • node scripts/check_changes.ts
  • Jest — PCI tools and skill (e.g. pci_compliance_skill.test.ts, pci_*_tool.test.ts, including pci_field_mapper_tool.test.ts if applicable)

(Add any Elastic CI checkbox steps your team uses for this PR.)

Notes for reviewers

  • Assessments are bounded by data quality, ECS alignment, index selection, and time windows; the skill text documents interpretation (e.g. GREEN/AMBER/RED, NOT_ASSESSABLE, deduplication).
  • This matches the product expectation that “compliance” here is evidence-oriented telemetry and checks over customer data, not a substitute for a full manual PCI audit.
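A minimal sketch of how a requirement-level result might map onto GREEN/AMBER/RED/NOT_ASSESSABLE so that missing fields never read as a pass. The input shape, the 90% fill-rate cutoff, and the AMBER rule are assumptions for illustration; the thresholds the skill text actually documents are not reproduced in this description.

```typescript
type Status = "GREEN" | "AMBER" | "RED" | "NOT_ASSESSABLE";

// Hypothetical input shape for one requirement-level check.
interface CheckResult {
  // Events in scope (index selection + time window) that were examined.
  eventsExamined: number;
  // Share (0..1) of examined events carrying every field the check needs.
  requiredFieldFillRate: number;
  // Events matched by the violation query.
  violations: number;
}

// Hypothetical cutoff; the skill's documented thresholds are not shown here.
const MIN_FILL_RATE = 0.9;

function classify(result: CheckResult): Status {
  // With no data, or too many events missing required fields, "0 violations"
  // would be a false green, so the check is reported as not assessable.
  if (result.eventsExamined === 0 || result.requiredFieldFillRate < MIN_FILL_RATE) {
    return "NOT_ASSESSABLE";
  }
  if (result.violations > 0) return "RED";
  // Hypothetical AMBER rule: no violations, but field coverage is incomplete.
  return result.requiredFieldFillRate < 1 ? "AMBER" : "GREEN";
}
```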

@smriti0321 smriti0321 requested review from a team as code owners March 4, 2026 17:27
cla-checker-service Bot commented Mar 4, 2026

💚 CLA has been signed

@smriti0321 smriti0321 added backport:skip This PR does not require backporting release_note:enhancement labels Mar 4, 2026
@smriti0321 smriti0321 force-pushed the smriti/pci-compliance-agent branch from 9a8e8b9 to 454056f March 4, 2026 17:41
@smriti0321 smriti0321 self-assigned this Mar 4, 2026
elasticmachine commented Mar 4, 2026

💔 Build Failed

Failed CI Steps

Test Failures

  • Entity Analytics - Security Solution Cypress Tests #1 / Entity Analytics Watchlists Management Page shows table when watchlists are returned shows table when watchlists are returned
  • FTR Configs #104 / Serverless Common UI - Management Data View Management creating and deleting default data view index pattern deletion "before all" hook for "should return to index pattern list"

Metrics

✅ unchanged


cc @smriti0321

@smriti0321 smriti0321 force-pushed the smriti/pci-compliance-agent branch from 8ae8ba3 to 28147d5 March 4, 2026 20:01
@smriti0321 smriti0321 marked this pull request as draft March 6, 2026 11:52
JordanSh commented Mar 12, 2026

Currently, these queries do not "prove" compliance. Instead, they act as telemetry health checks that show you are collecting data that may need to be audited. For example, I asked the agent to show me the queries it ran together with the PCI Compliance result:

[Screenshot: the agent's PCI Compliance result and the queries it ran]

The query it used here just checks that we have event.category == "network" logs, which leads to a false green scenario. If the answer is Yes (matching_events > 0), you have only proven that your logging pipeline is working. This satisfies parts of Requirement 10 (Audit Trails), but it says nothing about the security state of the network.

Furthermore, it did not check an actual PCI requirement, such as the one in the PR description's examples, which would require much more specific queries:

PCI DSS Req 8.3.6 — Account Lockout After Failed Attempts.
What it checks: PCI DSS requires accounts to be locked after no more than 10 failed login attempts. 
This query finds accounts with more than 6 failed logins (industry best practice threshold), 
indicating that lockout mechanisms may not be enforced.

To that point, there are about 300 requirements in PCI DSS v4.0. If we are going to build an agent that claims to check compliance, we need to shift the implementation from Data Presence to Data Evaluation.

1. Proposed Queries

To provide actual auditor value, we must pivot to queries that look for violations. Below are the recommended ES|QL patterns that we should be using instead. I’ve asked an LLM to create more detailed queries just to see what that looks like:

1.1 Requirement 4: Weak Encryption in Transit

Goal: Identify legacy/weak protocols (SSL/early TLS).

// Single filter for legacy TLS or plaintext HTTP. A separate pre-filter of
// network.protocol == "https" OR tls.version IS NOT NULL would drop plain HTTP events.
buildEsql: (indexPattern: string, from: string, to: string) => `
  FROM ${indexPattern}
  | WHERE @timestamp >= "${from}" AND @timestamp <= "${to}"
  | WHERE tls.version IN ("1.0", "1.1") OR network.protocol == "http"
  | STATS violations = COUNT(*) BY destination.ip, tls.version
`

1.2 Requirement 8.3: Non-MFA Authentications

Goal: Identify successful logins to the CDE that bypassed MFA.

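// ES|QL wildcard matching uses LIKE (":" is the full-text match operator).
// mfa.enabled is not an ECS field; this assumes the source populates it.
buildEsql: (indexPattern: string, from: string, to: string) => `
  FROM ${indexPattern}
  | WHERE @timestamp >= "${from}" AND @timestamp <= "${to}"
  | WHERE event.category == "authentication"
  | WHERE event.outcome == "success"
  | WHERE user.roles LIKE "*admin*" OR data_stream.dataset LIKE "*cde*"
  | WHERE mfa.enabled == false
  | STATS insecure_logins = COUNT(*) BY user.name, source.ip
`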

1.3 Requirement 8.3.6: Account Lockout (The "Compliance" Version)

Goal: Check for more than 6 failed logins (threshold before the 10-attempt lockout limit).

buildEsql: (indexPattern: string, from: string, to: string) => `
  FROM ${indexPattern}
  | WHERE @timestamp >= "${from}" AND @timestamp <= "${to}"
  | WHERE event.category == "authentication" AND event.outcome == "failure"
  | STATS failed_attempts = COUNT(*) BY user.name, source.ip
  | WHERE failed_attempts > 6
  | SORT failed_attempts DESC
`

It’s important to note that these queries are much more complex and often look for fields that are not necessarily part of ECS. A challenge here is that we would need clients to populate those fields in the first place before we could check them. It’s one thing to query data coming from known Elastic integrations like Okta; it’s a much bigger challenge to query arbitrary data coming from the customer.

2. The Data Quality & Integrity Problem

Compliance is only as good as the data quality. We need a data-quality check ahead of each query so that its result can be evaluated properly. If a client's logs don't follow ECS (Elastic Common Schema), our violation queries will return 0, giving a "False Green" simply because we couldn't find the fields.

2.1 Field Fill Rate Check

Before running compliance checks, the agent should run a pre-flight integrity check:

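FROM logs-*
| WHERE @timestamp >= NOW() - 24 hours
// ES|QL has no COUNTIF; COUNT(field) counts only non-null values.
| STATS
    total = COUNT(*),
    with_user = COUNT(user.name),
    with_outcome = COUNT(event.outcome)
  BY data_stream.dataset
| EVAL user_fill_pct = with_user * 100.0 / total,
       outcome_fill_pct = with_outcome * 100.0 / total
| WHERE user_fill_pct < 90.0 OR outcome_fill_pct < 90.0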

2.2 Data Duplication & Re-indexing

We also need to consider data duplication. During a re-index process, if the agent counts the same log twice, Requirement 8.3.6 (Lockout) might report a "False Red" because it thinks there were 12 failed attempts when there were only 6.

  • Recommendation: We should provide users with index filters so they can specify exactly which indices to audit, avoiding overlapping patterns. We also need to ensure the agent handles deduplication by event.id where possible.
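A minimal sketch of deduplicating by `event.id` before applying the lockout threshold, so that a re-indexed copy of the same six failures is not counted as twelve. The `AuthEvent` shape and `countFailuresDeduped` are hypothetical; in ES|QL the same idea could be expressed with `COUNT_DISTINCT(event.id)` instead of `COUNT(*)`, keeping in mind that ES|QL's `COUNT_DISTINCT` is approximate.

```typescript
// Hypothetical, flattened view of an authentication event.
interface AuthEvent {
  eventId?: string; // ECS event.id, when the source populates it
  userName: string;
  sourceIp: string;
  outcome: "success" | "failure";
}

// Count failed logins per user/source IP, counting each event.id at most once.
// Events without an event.id cannot be deduplicated and are counted as-is.
function countFailuresDeduped(events: AuthEvent[]): Map<string, number> {
  const seen = new Set<string>();
  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.outcome !== "failure") continue;
    if (e.eventId !== undefined) {
      if (seen.has(e.eventId)) continue; // duplicate copy, e.g. from a re-index
      seen.add(e.eventId);
    }
    const key = `${e.userName}|${e.sourceIp}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```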

2.3 Timeframes & Sliding Windows

If a check runs over fixed daily windows (for example, one calendar day per run) and an attacker fails 4 times at 11:00 PM and 4 more times at 1:00 AM the next day, they've exceeded the threshold of 6 in total, but each daily query sees only 4 attempts and misses it.

  • Recommendation: We should allow users to apply custom time frame filters. Instead of a fixed 24h window, we need to implement sliding lookback windows (e.g., 48h or 7d) to catch "low and slow" attacks that bypass shorter query ranges. 

  • A challenge here is that each query may require a different timeframe; I suspect there’s no one-size-fits-all solution here.
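A minimal sketch of the sliding-window idea: check whether more than `threshold` failures fall inside any window of a given length, instead of counting per fixed calendar day. Function names and the UTC-day bucketing used for comparison are illustrative assumptions; a real check would presumably make the window length configurable per requirement.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// True if more than `threshold` timestamps fall inside any window of length
// `windowMs`. Two-pointer sweep over the sorted timestamps.
function exceedsInSlidingWindow(timestampsMs: number[], windowMs: number, threshold: number): boolean {
  const ts = [...timestampsMs].sort((a, b) => a - b);
  let start = 0;
  for (let end = 0; end < ts.length; end++) {
    // Shrink the window from the left until it spans at most windowMs.
    while (ts[end] - ts[start] > windowMs) start++;
    if (end - start + 1 > threshold) return true;
  }
  return false;
}

// For comparison: the same threshold evaluated separately per fixed UTC day.
function exceedsInFixedDays(timestampsMs: number[], threshold: number): boolean {
  const perDay = new Map<number, number>();
  for (const t of timestampsMs) {
    const day = Math.floor(t / (24 * HOUR_MS));
    perDay.set(day, (perDay.get(day) ?? 0) + 1);
  }
  return Array.from(perDay.values()).some((n) => n > threshold);
}
```

With 4 failures just after 11:00 PM and 4 just after 1:00 AM the next day, `exceedsInFixedDays(attempts, 6)` returns `false` while `exceedsInSlidingWindow(attempts, 24 * HOUR_MS, 6)` returns `true`.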

@smriti0321 smriti0321 force-pushed the smriti/pci-compliance-agent branch from 5ff30ea to fc2ad7d April 13, 2026 14:27
- Introduce pciComplianceSkill registered via register_skills, replacing
  the previous standalone PCI Compliance Agent approach
- Add four PCI-specific tools: pci_scope_discovery, pci_compliance_check,
  pci_compliance_report, and pci_field_mapper
- Wire skill to platform tools (search, indices, mappings, generateEsql,
  executeEsql) plus security alerts and entity risk score tools
- Add compliance directory to SkillsDirectoryStructure
- Register pci-compliance in AGENT_BUILDER_BUILTIN_SKILLS allow list
- Register all PCI tool IDs in AGENT_BUILDER_BUILTIN_TOOLS allow list
- Include unit tests for skill definition and all four PCI tools

Made-with: Cursor
@smriti0321 smriti0321 force-pushed the smriti/pci-compliance-agent branch from fc2ad7d to 1c4badf April 13, 2026 14:32
@smriti0321 smriti0321 changed the title Add PCI compliance agent and tools for Agent Builder Add PCI compliance skill and tools for Agent Builder Apr 14, 2026
…i_field_mapper_tool.t (#263815)

### Macroscope: _Fix It For Me_
- This PR originated from [this
comment](https://github.com/elastic/kibana/pull/256060/files#r3094004754)
in #256060.
- Since auto-merge is on, Macroscope will merge this PR after waiting
for checks to pass.
- If you'd rather not wait, you can always merge this yourself but **no
further action from you is currently needed**.
- You can also @mention Macroscope in this PR to request further
changes.

#### Activity
Currently: Not merged: unstable

<details>
<summary>Previously</summary>

- Waiting on checks
- Pushed 034ba1d

</details>

----

---------

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
smriti0321 pushed a commit that referenced this pull request Apr 16, 2026
@patrykkopycinski patrykkopycinski added ci:cloud-deploy Create or update a Cloud deployment ci:cloud-deploy-elser If set, the ML node in the ES cluster will be deployed with considerations towards the ELSER model labels Apr 27, 2026
@smriti0321 smriti0321 enabled auto-merge (squash) May 6, 2026 08:32
@smriti0321 smriti0321 removed request for JordanSh and niros1 May 6, 2026 08:34
@patrykkopycinski patrykkopycinski removed ci:cloud-deploy Create or update a Cloud deployment ci:cloud-deploy-elser If set, the ML node in the ES cluster will be deployed with considerations towards the ELSER model labels May 6, 2026
@patrykkopycinski

/ci

@kibanamachine

💚 Build Succeeded

Metrics

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

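| id | before | after | diff |
|---|---:|---:|---:|
| securitySolution | 152.0KB | 152.1KB | +29.0B |

Unknown metric groups

References to deprecated APIs

| id | before | after | diff |
|---|---:|---:|---:|
| securitySolution | 564 | 566 | +2 |

Unreferenced deprecated APIs

| id | before | after | diff |
|---|---:|---:|---:|
| securitySolution | 564 | 566 | +2 |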


cc @smriti0321

@smriti0321 smriti0321 merged commit 2e95775 into main May 6, 2026
32 checks passed
@smriti0321 smriti0321 deleted the smriti/pci-compliance-agent branch May 6, 2026 15:43
ersin-erdal pushed a commit to ersin-erdal/kibana that referenced this pull request May 6, 2026
---------

Co-authored-by: macroscopeapp[bot] <170038800+macroscopeapp[bot]@users.noreply.github.com>
Co-authored-by: Patryk Kopyciński <contact@patrykkopycinski.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Patryk Kopycinski <patryk.kopycinski@elastic.co>
patrykkopycinski added a commit that referenced this pull request May 7, 2026
…mlinks, broken eval JSON, PCI flag) (#268146)

## Summary

Follow-up cleanup for #256060 (Add PCI compliance skill and tools for
Agent Builder). That PR accidentally committed two personal-machine
symlinks and merged the new eval-suite entry in a way that produced
invalid JSON. This PR fixes those regressions and enables the PCI
compliance skill by default.

## What this PR does

1. **Removes `elastic-llm-benchmarker` symlink.** Pointed at
`/Users/patrykkopycinski/Projects/automaker/elastic-llm-benchmarker` —
an absolute path that only exists on a single contributor's machine, and
a directory that is not part of this repo. Nothing tracked in the tree
references it (`git grep elastic-llm-benchmarker` is empty).

2. **Removes `openspec/specs` symlink.** Pointed at
`/Users/patrykkopycinski/Projects/kibana/openspec/specs` — a
self-referential absolute path (the symlink itself). Same story: no
tracked file references it.

3. **Fixes invalid JSON in
`.buildkite/pipelines/evals/evals.suites.json`.** The new
`pci-compliance` suite entry from #256060 is missing the closing `},`
before the next entry, so the file is not valid JSON as merged.
Reproducible with `python3 -m json.tool <
.buildkite/pipelines/evals/evals.suites.json`. After this PR the file
parses and contains 15 well-formed suites.

4. **Flips `pciComplianceAgentBuilder` from `false` to `true`** in
`x-pack/solutions/security/plugins/security_solution/common/experimental_features.ts`.
The flag is still respected by `register_skills.ts` (gates
`pciComplianceSkill`) and `register_tools.ts` (gates the four PCI
tools), so any environment can still opt out by setting it back to
`false` via `xpack.securitySolution.enableExperimental`.
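The JSON parse check from item 3 can also be scripted in TypeScript, for example as a CI pre-flight step. This is a minimal sketch: `checkSuitesJson` is a hypothetical helper, and since the top-level shape of `evals.suites.json` is not described here, it assumes either a top-level array of suites or an object with a `suites` array.

```typescript
type SuitesCheck = { ok: true; suiteCount: number } | { ok: false; error: string };

// Parse raw file contents and count suite entries. Assumes the file is either
// a top-level array of suites or an object with a "suites" array.
function checkSuitesJson(raw: string): SuitesCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // e.g. a suite entry missing its closing "}," before the next entry
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
  if (Array.isArray(parsed)) {
    return { ok: true, suiteCount: parsed.length };
  }
  if (typeof parsed === "object" && parsed !== null) {
    const suites = (parsed as { suites?: unknown }).suites;
    if (Array.isArray(suites)) return { ok: true, suiteCount: suites.length };
  }
  return { ok: false, error: "expected a top-level array or an object with a suites array" };
}
```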

## Test plan

- [ ] `node scripts/check_changes.ts`
- [ ] CI loads `.buildkite/pipelines/evals/evals.suites.json` without
parse error.
- [ ] PCI compliance skill and tools register at startup with default
config (no `enableExperimental` overrides).
- [ ] Existing PCI-compliance Jest tests continue to pass:
-
`x-pack/solutions/security/plugins/security_solution/server/agent_builder/skills/pci_compliance/pci_compliance_skill.test.ts`
-
`x-pack/solutions/security/plugins/security_solution/server/agent_builder/tools/pci_*_tool.test.ts`
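The flag gating described in item 4 can be sketched as follows. `pciComplianceAgentBuilder` is the flag name from this PR; the `ExperimentalFeatures` shape, the `Registry` type, `registerPci`, and the tool ID strings are hypothetical simplifications, not the actual Security Solution code in `register_skills.ts` / `register_tools.ts`.

```typescript
// Hypothetical subset of the experimental-features object.
interface ExperimentalFeatures {
  pciComplianceAgentBuilder: boolean;
}

// Default after this change: enabled.
const defaultFeatures: ExperimentalFeatures = { pciComplianceAgentBuilder: true };

// Hypothetical registry that just records what was registered.
interface Registry {
  skills: string[];
  tools: string[];
}

// The skill and the four PCI tools are only registered when the flag is on.
function registerPci(features: ExperimentalFeatures, registry: Registry): void {
  if (!features.pciComplianceAgentBuilder) return;
  registry.skills.push("pci-compliance");
  registry.tools.push(
    "pci_scope_discovery", // placeholder tool IDs
    "pci_compliance_check",
    "pci_compliance_report",
    "pci_field_mapper",
  );
}
```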

## Notes for reviewers

- The two deleted symlinks were committed as symlink entries (git mode
`120000`) and contained absolute paths from a single developer's laptop.
They are dead weight in every other clone and should never have been
tracked.
- I confirmed neither `tsconfig.base.json`, `package.json`,
`.github/CODEOWNERS`, nor any other tracked file references either
symlink path, so removing them is risk-free.
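To catch this class of mistake earlier, one could scan the index for symlink entries. Git records symlinks with mode `120000`, and `git ls-files -s` prints one `<mode> <object> <stage>` line per entry, followed by a tab and the path. The helper below is a hypothetical sketch that filters that output; reading each link's target (for example, to flag absolute `/Users/...` paths) is left out.

```typescript
// Return paths of symlink entries (mode 120000) from `git ls-files -s` output.
function findSymlinkEntries(lsFilesOutput: string): string[] {
  const paths: string[] = [];
  for (const line of lsFilesOutput.split("\n")) {
    const tab = line.indexOf("\t");
    if (tab === -1) continue; // blank or malformed line
    const [mode] = line.slice(0, tab).split(" ");
    if (mode === "120000") paths.push(line.slice(tab + 1));
  }
  return paths;
}
```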
romulets pushed a commit to romulets/kibana that referenced this pull request May 8, 2026
…ups (symlinks, broken eval JSON, PCI flag) (elastic#268146)

patrykkopycinski added a commit to patrykkopycinski/kibana that referenced this pull request May 11, 2026
…arison on real connectors

The autonomous-vs-handwritten PCI comparison previously ran on llama3.1:8b
through a local Ollama proxy. At that model scale the agent router never
engaged either PCI skill, so every scenario scored 0.00 and the comparison
landed on the floor (see commit fc5194e). This commit promotes the
comparison to real Bedrock connectors and ships the connector-side fix that
the upgrade required.

Bedrock connector — Claude Opus 4.7 enablement
----------------------------------------------
Claude Opus 4.7 on Bedrock rejects the `temperature` inference parameter
with `temperature is deprecated for this model`. Unless the parameter is
omitted, the connector gets a 400 on every request. The fix is in three
layers:

  - `@kbn/inference-common`: new `supportsTemperature?: boolean` on
    `ModelDefinition`; `claude-opus-4-7` marked `supportsTemperature: false`.
    Future Claude variants (or other provider models) with the same
    restriction need only flip the flag — one source of truth.

  - `inference` plugin: `getTemperatureIfValid` omits temperature when the
    model definition declares `supportsTemperature: false`. Sits alongside
    the existing OpenAI o-series exclusions and works for any provider.

  - `stack_connectors` (Bedrock): new local
    `bedrockModelSupportsTemperature(model)` helper; `formatBedrockBody`
    threads `model` through and gates the parameter. `invokeAI`,
    `invokeStream`, `invokeAIRaw`, `_converse`, and `_converseStream` all
    consult it. Defense in depth — direct sub-action callers
    (Security AI Assistant, etc.) are protected without taking a
    cross-plugin dependency on `@kbn/inference-common`.
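A minimal sketch of the single-source-of-truth idea: a model entry carries `supportsTemperature`, and the request builder omits `temperature` when that flag is `false`. The `supportsTemperature` flag comes from the commit message above, but `ModelEntry`, `resolveTemperature` (standing in for `getTemperatureIfValid`), `buildBody` (standing in for `formatBedrockBody`), and the substring lookup are hypothetical simplifications, not the actual `@kbn/inference-common` or Bedrock connector code.

```typescript
// Simplified stand-in for a model definition.
interface ModelEntry {
  // Substring identifying the model family, e.g. "claude-opus-4-7".
  family: string;
  // Omitted means temperature is supported; only restricted models set false.
  supportsTemperature?: boolean;
}

// Hypothetical registry entries.
const MODEL_ENTRIES: ModelEntry[] = [
  { family: "claude-opus-4-7", supportsTemperature: false },
  { family: "claude-sonnet-4-6" },
];

// Returns the temperature to send, or undefined to omit the parameter.
function resolveTemperature(modelId: string, temperature?: number): number | undefined {
  const entry = MODEL_ENTRIES.find((m) => modelId.includes(m.family));
  return entry?.supportsTemperature === false ? undefined : temperature;
}

// Builds a request body that never sends `temperature` to a model that rejects it.
function buildBody(modelId: string, prompt: string, temperature?: number): Record<string, unknown> {
  const t = resolveTemperature(modelId, temperature);
  return t === undefined ? { prompt } : { prompt, temperature: t };
}
```

Because unknown models fall through to "supported", only models that actually reject the parameter need an entry.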

Smoke-tested with `invokeAI` + `converse` sub-actions:
  - Claude 4.7 Opus (`us.anthropic.claude-opus-4-7`): now passes — temperature
    omitted, response returned.
  - Claude 4.6 Sonnet (`us.anthropic.claude-sonnet-4-6`): still passes —
    temperature included as before.

Live eval comparison (PCI Criteria, LLM-judge 0..1)
---------------------------------------------------
Both PCI skill variants ran the same 8-scenario `@kbn/evals-suite-pci-compliance`
suite end-to-end against a real Scout cluster, on two production Bedrock
connectors:

  | Variant     | Claude 4.7 Opus | Claude 4.6 Sonnet |
  |-------------|----------------:|------------------:|
  | Handwritten |           0.977 |             0.989 |
  | Autonomous  |           0.834 |             0.860 |

The handwritten skill (Smriti, PR elastic#256060) outperforms the autonomous variant
on both models, by about 14 points on Opus (0.977 vs 0.834) and 13 points on
Sonnet (0.989 vs 0.860). The autonomous architect's broader domain
framing (SAQ taxonomy, v3→v4 deltas, scope-reduction levers) did not
translate into a better PCI-Criteria score. The handwritten contract is
shorter (~4.1k vs ~8.1k chars) and lines up more tightly with the eval's
scoring rubric; that tight coupling is the deciding factor.
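The per-model gap can be recomputed directly from the table. This small sketch assumes one "point" means 0.01 on the 0..1 LLM-judge scale.

```typescript
// Scores from the table above (PCI Criteria, LLM-judge 0..1).
const scores: Record<string, { handwritten: number; autonomous: number }> = {
  "Claude 4.7 Opus": { handwritten: 0.977, autonomous: 0.834 },
  "Claude 4.6 Sonnet": { handwritten: 0.989, autonomous: 0.86 },
};

// Gap in points (score difference x 100), rounded to one decimal place.
function gapInPoints(handwritten: number, autonomous: number): number {
  return Math.round((handwritten - autonomous) * 1000) / 10;
}

for (const [model, s] of Object.entries(scores)) {
  console.log(`${model}: ${gapInPoints(s.handwritten, s.autonomous)} points`);
}
```

This prints 14.3 points for Opus and 12.9 points for Sonnet.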

build_comparison_html.mjs gains a `--runs <label>=<dir>,...` mode so the
4-cell grid renders from the four results.json snapshots. Legacy
`--handwritten`/`--autonomous` mode still works for single-model runs.
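`build_comparison_html.mjs` itself is not shown here. As a hypothetical sketch, written in TypeScript for consistency with the rest of this page (the real script is plain `.mjs`), a `--runs <label>=<dir>,...` value could be parsed like this. `parseRunsArg` and `RunSpec` are illustrative names only.

```typescript
interface RunSpec {
  label: string;
  dir: string;
}

// Parse "opus-hand=./out/a,opus-auto=./out/b" into label/dir pairs.
// Splits each pair on the first "=" so directories may themselves contain "=".
function parseRunsArg(value: string): RunSpec[] {
  return value
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const eq = part.indexOf("=");
      if (eq <= 0 || eq === part.length - 1) {
        throw new Error(`Invalid --runs entry "${part}", expected <label>=<dir>`);
      }
      return { label: part.slice(0, eq), dir: part.slice(eq + 1) };
    });
}
```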

kbn-scout
---------
`run_kibana_server.ts` now respects `SCOUT_READ_DEV_CONFIG=true` and drops
`--no-dev-config` when set, so a developer can load `config/kibana.dev.yml`
(and the preconfigured AI connectors it defines) into the Scout-managed
Kibana process. Default behaviour is unchanged. Without this, evals against
real cloud connectors require fragile API-driven connector creation per
boot.
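A minimal sketch of the described behaviour: drop `--no-dev-config` from the Kibana arguments only when `SCOUT_READ_DEV_CONFIG=true`. `buildKibanaArgs` is a hypothetical helper, not the actual code in `run_kibana_server.ts`; the environment is passed in as a plain object so the logic stays testable.

```typescript
// Returns the Kibana CLI args for a Scout-managed server. When
// SCOUT_READ_DEV_CONFIG is exactly "true", --no-dev-config is removed so
// config/kibana.dev.yml is loaded; otherwise the args are unchanged.
function buildKibanaArgs(baseArgs: string[], env: Record<string, string | undefined>): string[] {
  const readDevConfig = env.SCOUT_READ_DEV_CONFIG === "true";
  return readDevConfig ? baseArgs.filter((arg) => arg !== "--no-dev-config") : baseArgs;
}
```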

Refs: #11
js-jankisalvi pushed a commit to js-jankisalvi/kibana that referenced this pull request May 12, 2026
…ups (symlinks, broken eval JSON, PCI flag) (elastic#268146)


Labels

backport:skip This PR does not require backporting release_note:enhancement v9.5.0

8 participants