
[Agent Builder] SML schema updates #266573

Merged
ppisljar merged 3 commits into elastic:main from ppisljar:agentbuilder/sml_schema
May 7, 2026


@ppisljar ppisljar commented Apr 30, 2026

Summary

Resolves https://github.com/elastic/search-team/issues/14362

SML schema updates:

  • makes `content` a `semantic_text` field
  • adds `description` as a `semantic_text` field
  • adds `user_id` (string)
  • adds `references` (string[])
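
For readers who want to picture the resulting shape, here is a minimal sketch of the four fields as Elasticsearch mapping properties. It is an illustration, not the PR's actual code: the names `smlSchemaSketch` and `SmlRecordSketch` are hypothetical, and mapping the "string" fields to `keyword` is an assumption (the summary only says "string").

```typescript
// Hypothetical sketch of the mapping properties described in the summary above.
// Only the field names and the semantic_text types come from the PR summary.
interface FieldMappingSketch {
  type: 'semantic_text' | 'keyword';
}

const smlSchemaSketch: Record<string, FieldMappingSketch> = {
  content: { type: 'semantic_text' },
  description: { type: 'semantic_text' },
  // Assumption: "string" is stored as keyword; the summary does not say.
  user_id: { type: 'keyword' },
  // Elasticsearch has no dedicated array type: a keyword field also accepts string[].
  references: { type: 'keyword' },
};

// Hypothetical document shape matching the mapping above.
interface SmlRecordSketch {
  content: string;
  description?: string;
  user_id?: string;
  references?: string[];
}

const exampleRecord: SmlRecordSketch = {
  content: 'Body text that gets embedded for semantic search.',
  description: 'Short summary, also embedded.',
  user_id: 'u-123',
  references: ['ref-a', 'ref-b'],
};
```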

github-actions Bot commented Apr 30, 2026

Vale Linting Results

Summary: 2 warnings, 1 suggestion found

⚠️ Warnings (2)

| File | Line | Rule | Message |
|---|---|---|---|
| docs/extend/plugin-list.md | 116 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'and so on' instead of 'etc'. |
| docs/extend/plugin-list.md | 116 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |

💡 Suggestions (1)

| File | Line | Rule | Message |
|---|---|---|---|
| docs/extend/plugin-list.md | 119 | Elastic.Wordiness | Consider using 'all' instead of 'all of '. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

@ppisljar ppisljar force-pushed the agentbuilder/sml_schema branch from 494d877 to 8114028 Compare May 4, 2026 12:27
@ppisljar ppisljar marked this pull request as ready for review May 6, 2026 12:32
@ppisljar ppisljar requested a review from a team as a code owner May 6, 2026 12:32
@ppisljar ppisljar added the release_note:skip (Skip the PR/issue when compiling release notes), backport:skip (This PR does not require backporting), Team:agent-builder, and v9.5.0 labels May 6, 2026
@Apmats Apmats left a comment

Sorry for the delay, LGTM as a checkpoint to iterate from if need be!

@kibanamachine

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Scout Lane #20 - serverless-observability_complete / default / local-serverless-observability_complete - Serverless Observability Navigation - Complete tier body - clicking body nav items sets the active link, updates breadcrumbs, and navigates
  • [job] [logs] Scout Lane #7 - stateful-classic / default / local-stateful-classic - APM integration not installed but setup completed - Admin user
  • [job] [logs] Scout Lane #7 - stateful-classic / default / local-stateful-classic - Collector integration is not installed - collector integration missing
  • [job] [logs] Scout Lane #7 - stateful-classic / default / local-stateful-classic - Collector integration is not installed - Symbolizer integration is not installed
  • [job] [logs] Scout Lane #7 - stateful-classic / default / local-stateful-classic - Observability Landing Page (discover.isEsqlDefault enabled) - redirects to onboarding when no logs data exists
  • [job] [logs] Scout Lane #7 - stateful-classic / default / local-stateful-classic - Profiling is setup and data is loaded - Admin user
  • [job] [logs] Scout Lane #7 - stateful-classic / default / local-stateful-classic - Profiling is setup and data is loaded - Viewer user

Metrics [docs]

Unknown metric groups

API count

| id | before | after | diff |
|---|---|---|---|
| agentBuilder | 114 | 117 | +3 |
| agentContextLayer | 102 | 108 | +6 |
| total | | | +9 |

History

@ppisljar ppisljar merged commit c89bc41 into elastic:main May 7, 2026
31 checks passed
Apmats added a commit to Apmats/kibana that referenced this pull request May 8, 2026
Builds on Peter's merged elastic#266573 by adding the schema fields the team
converged on and refactoring title/description/content for BM25 + a
single unified vector retrieval surface.

Fields added:
- origin (keyword, URI form `{type}://{id}`, auto-populated by indexer)
- tags (keyword[], free-form)
- payload (flattened, type-specific opaque data)
- title_autocomplete (search_as_you_type, copy target of title)
- unified_semantic (semantic_text, copy target of title/description/content)
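
The field list above, together with the "copy target of" notes, could be sketched as a mapping object like the one below. This is an assumption-laden illustration rather than the committed mapping: `smlMappingSketch` and `copySources` are hypothetical names, and the `text` type on title/description/content is taken from the behavior changes described in this same commit message.

```typescript
// Hypothetical sketch of the mapping described in this commit message.
interface FieldMappingSketch {
  type: string;
  copy_to?: string | readonly string[];
}

const smlMappingSketch: Record<string, FieldMappingSketch> = {
  // Producer-set fields; copy_to fans their values out at index time.
  title: { type: 'text', copy_to: ['title_autocomplete', 'unified_semantic'] },
  description: { type: 'text', copy_to: 'unified_semantic' },
  content: { type: 'text', copy_to: 'unified_semantic' },
  // Copy targets: never written directly by producers.
  title_autocomplete: { type: 'search_as_you_type' },
  unified_semantic: { type: 'semantic_text' },
  // Remaining new fields from the list above.
  origin: { type: 'keyword' },
  tags: { type: 'keyword' },
  payload: { type: 'flattened' },
};

// Returns the names of the fields whose copy_to includes the given target.
function copySources(target: string): string[] {
  return Object.entries(smlMappingSketch)
    .filter(([, mapping]) => {
      const raw = mapping.copy_to;
      const targets = raw === undefined ? [] : typeof raw === 'string' ? [raw] : raw;
      return targets.includes(target);
    })
    .map(([name]) => name);
}
```

Calling `copySources('unified_semantic')` on this sketch lists title, description, and content, which is the single inference surface the commit message describes.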

Behavior changes:
- title becomes `text` with copy_to fanning into title_autocomplete and
  unified_semantic. Three retrieval modes (BM25/lexical, prefix/typeahead,
  semantic) from one producer-set field.
- description and content become `text` with copy_to: 'unified_semantic'.
  One inference pass per record instead of three; recall doesn't
  fragment across overlapping content.
- buildSmlSearchQuery: SAYT field paths move from title.* to
  title_autocomplete.*; the should-block uses match: { unified_semantic }
  in place of separate matches on content and description.
- Indexer auto-populates origin = `{attachmentType}://{originId}` on every
  record alongside origin_id.
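
As a rough sketch of the query change described above (not the real `buildSmlSearchQuery`, whose body is not shown in this thread), the should-block might combine a prefix clause on the `title_autocomplete` subfields with a single `match` on `unified_semantic`. The function name `sketchSmlSearchQuery`, the `multi_match`/`bool_prefix` clause, and `minimum_should_match: 1` are assumptions; `._2gram` and `._3gram` are the standard `search_as_you_type` subfields.

```typescript
// Hypothetical clause and query types for this sketch only.
type SmlShouldClause =
  | { multi_match: { query: string; type: 'bool_prefix'; fields: string[] } }
  | { match: { unified_semantic: { query: string } } };

interface SmlSearchQuerySketch {
  bool: { should: SmlShouldClause[]; minimum_should_match: number };
}

function sketchSmlSearchQuery(query: string): SmlSearchQuerySketch {
  return {
    bool: {
      should: [
        {
          // Prefix/typeahead: SAYT paths now live under title_autocomplete.*
          multi_match: {
            query,
            type: 'bool_prefix',
            fields: [
              'title_autocomplete',
              'title_autocomplete._2gram',
              'title_autocomplete._3gram',
            ],
          },
        },
        // Semantic: one match replaces separate matches on content and description.
        { match: { unified_semantic: { query } } },
      ],
      // Assumption: a hit on either clause is enough to return a record.
      minimum_should_match: 1,
    },
  };
}
```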

Kept for back-compat:
- origin_id retained alongside origin until the connector_ids filter
  (elastic#267333) and the search HTTP response migrate to origin.

Type writers need no changes: they keep returning the same SmlChunk
shape; the indexer constructs origin and ES copy_to handles the rest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apmats added a commit to Apmats/kibana that referenced this pull request May 8, 2026
Builds on Peter's merged elastic#266573 by adding the schema fields the team
converged on and refactoring title/description/content for BM25 + a
single unified vector retrieval surface.

Fields added:
- tags (keyword[], free-form labels)
- payload (flattened, type-specific opaque data)
- title_autocomplete (search_as_you_type, copy target of title)
- unified_semantic (semantic_text, copy target of title/description/content)

Behavior changes:
- title becomes `text` with copy_to fanning into title_autocomplete and
  unified_semantic. Three retrieval modes (BM25/lexical, prefix/typeahead,
  semantic) from one producer-set field.
- description and content become `text` with copy_to: 'unified_semantic'.
  One inference pass per record instead of three; recall doesn't
  fragment across overlapping content.
- buildSmlSearchQuery: SAYT field paths move from title.* to
  title_autocomplete.*; the should-block uses match: { unified_semantic }
  in place of separate matches on content and description.

origin_id is unchanged. An earlier draft of this PR added a parallel
`origin` URI field per the gist's vision but it was dropped: the URI
form is computable on the fly from type + origin_id, Sean's merged
buildTypeFilters targets origin_id directly, and adding a parallel
representation would just be redundant.
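
To illustrate why the parallel field is redundant, here is a small sketch of deriving the URI form on the fly, assuming the `{type}://{id}` shape from the earlier draft. The helper names `toOriginUri` and `parseOriginUri` are hypothetical, and no escaping is applied, which assumes the type never contains `://`.

```typescript
// Hypothetical helper: builds the `{type}://{id}` form from type + origin_id.
function toOriginUri(type: string, originId: string): string {
  return `${type}://${originId}`;
}

// Hypothetical inverse: splits on the first "://" so ids may contain slashes.
function parseOriginUri(uri: string): { type: string; originId: string } | undefined {
  const separator = '://';
  const index = uri.indexOf(separator);
  if (index <= 0) {
    return undefined; // no separator, or empty type
  }
  return {
    type: uri.slice(0, index),
    originId: uri.slice(index + separator.length),
  };
}
```

Because both directions are cheap string operations, consumers that want the URI can compute it at read time while filters keep targeting `origin_id`.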

Type writers need no changes: they keep returning the same SmlChunk
shape; ES copy_to handles the fan-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apmats added a commit to Apmats/kibana that referenced this pull request May 9, 2026
Apmats added a commit to Apmats/kibana that referenced this pull request May 11, 2026