Restructuring the semantic_text field type page #138571

Merged
kosabogi merged 32 commits into elastic:main from kosabogi:restructuring-semantic-text-page on Dec 11, 2025

@kosabogi (Contributor) commented Nov 25, 2025

📷 Preview

Semantic text field type

Summary

This PR restructures and refines the semantic_text field type page.

Main changes

1. Content and wording refinements

  • Shortened section names to make them more concise
  • Standardized terminology
  • Added relevant links
  • Simplified language where applicable
  • Added stepper syntax to how-to content with more than one step

2. Content restructuring

Restructured the semantic_text documentation into three focused pages:

  • Main page (semantic-text.md): Converted to an overview page with an introduction explaining what semantic_text is and an overview section linking to the reference and how-to pages.

  • Reference page (semantic-text-reference.md): New dedicated reference page containing parameter descriptions, inference endpoint configurations, chunking behavior, update operations, querying options, and limitations.

  • How-to guides page (semantic-text-how-tos.md): New dedicated how-to page containing procedure descriptions and examples for common tasks, including configuring inference endpoints, pre-chunking content, retrieving embeddings, highlighting fragments, and cross-cluster search.

Our main reasons for splitting the docs this way:

  • Separating reference from how-to content follows documentation best practices
  • It scales better as we add more guides without making the main pages too long
  • It improves readability: people looking up parameters don't have to skip through procedures, and people following guides don't have to read through parameter details

Feedback and suggestions are welcome!

  • What do you think about this new structure? Does splitting the page by content type improve readability for users?
  • Do you have suggestions for moving any content to different pages?
  • Any comments or suggestions on naming, titles, or organization?

Related issue: elastic/docs-content#3836

@kosabogi added the >docs (General docs changes) and Team:Docs (Meta label for docs team) labels on Nov 25, 2025
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/core-docs (Team:Docs)

@github-actions (Contributor) commented

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@leemthompo (Contributor) left a comment

This is looking really nice already. I left some comments and ideas, but the how-to ideas in particular might be best left until SMEs have had time to digest the initial proposal of breaking these pages up, which will have to wait until next week.

I think the landing page feedback is probably actionable right now though if you want to rework that a bit :)

serverless: ga
---

# How-to guides for `semantic_text`

@leemthompo (Contributor) commented Nov 27, 2025

At a high level, I think we can organize the how-tos into 3 clear categories:

  • Setup and configuration
  • Ingestion
  • Search and retrieval

Maybe they could be subpages, but please don't feel obliged to jump on this immediately and also of course feel free to push back if that sounds like overkill. Here's the overview of what that might look like, and note that it also implies moving a couple of things out of the reference section which might belong together more naturally under how-to. But this might require additional refactoring that isn't worth the ROI immediately. So just take this as food for thought :)

Expand to see what a potential how-to subpages restructuring would look like
# How-to guides for `semantic_text`

## Setup & configuration

### Configure inference endpoints
- Use default and preconfigured endpoints
- Use ELSER on EIS  
- Use a custom inference endpoint
- Use dedicated endpoints for ingestion and search
- **[MOVED FROM REF]** Inference endpoint validation

## Ingestion

### Index pre-chunked content
- Disable automatic chunking
- Index documents

### Use copy_to and multi-fields with semantic_text
- Use copy_to
- Use multi-fields

### **[MOVED FROM REF]** Update documents with semantic_text fields
- Full document updates (examples)
- Partial updates using Bulk API (examples)
- Partial updates using Update API (examples)
- Scripted updates

## Search & retrieval

### **[MOVED FROM REF]** Query semantic_text fields
- Using match queries
- Using kNN queries  
- Using sparse vector queries
- Using semantic queries (legacy)

### Retrieve indexed chunks

### Return semantic_text field embeddings
- Return semantic field embeddings in _source
- Return semantic field embeddings using fields

### Highlight the most relevant fragments
- Highlight semantic_text fields
- Enforce semantic highlighter
- Retrieve fragments in original order

### Perform cross-cluster search (CCS) for semantic_text

@kosabogi (Contributor Author) commented
I like this organization idea! I'll wait for SME feedback before implementing. If they agree, I'll reorganize the how-to content accordingly.

@kosabogi requested a review from leemthompo on November 28, 2025 12:53

@leemthompo (Contributor) left a comment

Overview page is looking good! I have a few language suggestions, and a few optional nits at this stage :)

@kosabogi requested review from a team and removed request for a team on December 1, 2025 09:17

@kosabogi (Contributor Author) commented Dec 3, 2025

> I've left some overall technical feedback, but at a higher level it feels like this could be significantly condensed. There's a lot of repetition that I don't know how valuable it is to users. Furthermore things like multi fields - do we really need explicit examples for this because it just should work like any other field? Food for thought, I will defer to @leemthompo's expertise on flow and layout.
>
> My questions on recommending EIS are a business question, I do not know the official answer.

Thank you for your review! I’ve fixed a few points you highlighted and responded to each comment to clarify where needed.

Regarding the multi-field example: I agree with your point that it may not add much value in its current form, especially without additional context or a specific use case. I’m happy to remove it now; if needed, we can add it back later in a follow-up PR with clearer justification and a concrete use case.

Your technical feedback has been really helpful. Even though this PR is primarily focused on restructuring and not adding/removing content (beyond summaries and intro text), it’s great that you caught these issues so we can improve the content itself as well.

@kosabogi requested a review from kderusso on December 5, 2025 07:49

@kderusso (Member) left a comment

My original technical edits have been addressed, thanks.
We should answer the question of recommending EIS before merging.
Deferring approval for structure etc. to the docs team. :)

@leemthompo (Contributor) left a comment

Just noticed a couple minor things :)

It might be good to get another writer to do this final review too, but I'm happy to approve once you're ready if necessary.

@kosabogi requested a review from leemthompo on December 9, 2025 10:48
@kosabogi requested review from kderusso and maxjakob on December 9, 2025 14:10
@kosabogi requested a review from maxjakob on December 10, 2025 08:53

@marciw (Contributor) left a comment

Looking good! Small comments


::::::{tab-item} Default ELSER endpoint

If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:

Contributor

The tab item title says "default," but then the text on the tab says "preconfigured," which is confusing because the other 2 tabs distinguish default and preconfigured... I might just be confused. :)

@kosabogi (Contributor Author) commented

Great point, thank you! I changed the wording to "default".

@maxjakob (Contributor) commented Dec 11, 2025

I think there is an important distinction that we should represent here:

  • Preconfigured endpoints are automatically created by Elasticsearch on startup. An ES cluster has many of these. Examples: .elser-2-elasticsearch, .elser-2-elastic or .jina-embeddings-v3 (about to be released).
  • Some features like semantic_text have default endpoints configured. There is always just one per feature. Examples: semantic_text uses .elser-2-elasticsearch, text_similarity_reranker will use .jina-reranker-v2 (soon).
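
To make the distinction concrete, here is a minimal Python sketch of the two mapping styles. It only builds the create-index request bodies and does not call Elasticsearch. The field name `content` and the helper `semantic_text_mapping` are hypothetical; `.elser-2-elasticsearch` is the endpoint named above, and the sketch assumes that omitting `inference_id` makes `semantic_text` fall back to its default endpoint.

```python
import json

# Endpoint ID taken from the comment above; it is one of several
# preconfigured endpoints and (per the comment) the semantic_text default.
PRECONFIGURED_ENDPOINT = ".elser-2-elasticsearch"


def semantic_text_mapping(field, inference_id=None):
    """Build a create-index body with a single semantic_text field.

    Hypothetical helper for illustration only. When inference_id is None,
    the parameter is omitted, so the field relies on the feature's default
    endpoint rather than naming a preconfigured one explicitly.
    """
    field_def = {"type": "semantic_text"}
    if inference_id is not None:
        field_def["inference_id"] = inference_id
    return {"mappings": {"properties": {field: field_def}}}


# Relies on the default endpoint: no inference_id in the mapping.
default_body = semantic_text_mapping("content")

# Names a preconfigured endpoint explicitly.
explicit_body = semantic_text_mapping("content", PRECONFIGURED_ENDPOINT)

print(json.dumps(default_body, indent=2))
print(json.dumps(explicit_body, indent=2))
```

Either body could be used as the payload of a create-index request. Naming the endpoint explicitly keeps the mapping independent of whichever endpoint the feature treats as its default.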

Contributor

@kosabogi pinging here to make sure you saw this. Could be a good follow up to clarify the terms

@kosabogi (Contributor Author) commented

Hey @maxjakob, thanks a lot for the clarification. I’ve added this as an item to address in a follow-up issue.

@leemthompo (Contributor) left a comment

LGTM with a big ++ to @marciw's suggestions. Thanks for iterating!

@kosabogi merged commit 6ed40ee into elastic:main on Dec 11, 2025
11 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Dec 11, 2025
* upstream/main: (79 commits)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/140_pre_filter_search_shards/prefilter on non-indexed date fields} elastic#139381
  Adjust error bounds for bfloat16 value checks (elastic#139371)
  Unmute some vector CSS tests (elastic#139370)
  Do not allow `project_routing` as a query param (elastic#139206)
  Unmute HalfFloat...Tests#testSynthesizeArrayRandom (elastic#139341)
  Remove leniency in LinkedProjectConfig builder methods (elastic#139012)
  EQL: fix project_routing (elastic#139366)
  Add patch version for 9.2 index version constant (elastic#139362)
  Mute org.elasticsearch.test.rest.yaml.RcsCcsCommonYamlTestSuiteIT test {p0=search.vectors/200_dense_vector_docvalue_fields/dense_vector docvalues with bfloat16} elastic#139368
  ES|QL: Enable CCS tests for FORK (elastic#139302)
  Restructuring the semantic_text field type page  (elastic#138571)
  AggregateMetricDouble fields should not build BKD indexes (elastic#138724)
  Feature/count by trunc with filter (elastic#138765)
  ESQL: Convert TS 500 error to 400 (elastic#139360)
  [CI] Rerun failing tests for periodic build pipelines (elastic#139200)
  revert muting saml test (elastic#139327)
  Add TDigest histogram as metric (elastic#139247)
  Links solved bugs to class cast exception changelog and unmutes errors (elastic#139340)
  Ensure integer sorts are rewritten to long sorts for BWC indexes (elastic#139293)
  Integrate stored fields format bloom filter with synthetic _id (elastic#138515)
  ...
Labels

>docs (General docs changes), Team:Docs (Meta label for docs team), v9.3.0

6 participants