Restructuring the semantic_text field type page #138571
kosabogi merged 32 commits into elastic:main
Pinging @elastic/core-docs (Team:Docs)
This is looking really nice already, I left some comments and ideas, but particularly the how-to ideas might be best left until SMEs have had the time to digest the initial proposal of breaking these pages up, which will have to wait until next week.
I think the landing page feedback is probably actionable right now though if you want to rework that a bit :)
docs/reference/elasticsearch/mapping-reference/semantic-text.md
serverless: ga
---

# How-to guides for `semantic_text`
High-level I think we can organize the how-to's into 3 clear categories:
- Setup and configuration
- Ingestion
- Search and retrieval
Maybe they could be subpages, but please don't feel obliged to jump on this immediately and also of course feel free to push back if that sounds like overkill. Here's the overview of what that might look like, and note that it also implies moving a couple of things out of the reference section which might belong together more naturally under how-to. But this might require additional refactoring that isn't worth the ROI immediately. So just take this as food for thought :)
Expand to see what a potential how-to subpages restructuring would look like
# How-to guides for `semantic_text`
## Setup & configuration
### Configure inference endpoints
- Use default and preconfigured endpoints
- Use ELSER on EIS
- Use a custom inference endpoint
- Use dedicated endpoints for ingestion and search
- **[MOVED FROM REF]** Inference endpoint validation
## Ingestion
### Index pre-chunked content
- Disable automatic chunking
- Index documents
### Use copy_to and multi-fields with semantic_text
- Use copy_to
- Use multi-fields
### **[MOVED FROM REF]** Update documents with semantic_text fields
- Full document updates (examples)
- Partial updates using Bulk API (examples)
- Partial updates using Update API (examples)
- Scripted updates
## Search & retrieval
### **[MOVED FROM REF]** Query semantic_text fields
- Using match queries
- Using kNN queries
- Using sparse vector queries
- Using semantic queries (legacy)
### Retrieve indexed chunks
### Return semantic_text field embeddings
- Return semantic field embeddings in _source
- Return semantic field embeddings using fields
### Highlight the most relevant fragments
- Highlight semantic_text fields
- Enforce semantic highlighter
- Retrieve fragments in original order
### Perform cross-cluster search (CCS) for semantic_text
I like this organization idea! I'll wait for SME feedback before implementing. If they agree, I'll reorganize the how-to content accordingly.
leemthompo left a comment
Overview page is looking good! I have a few language suggestions, and a few optional nits at this stage :)
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Thank you for your review! I've fixed a few points you highlighted and responded to each comment to clarify where needed.
Regarding the multi-field example: I agree that it may not add much value in its current form, especially without additional context or a specific use case. I'm happy to remove it now; if needed, we can add it back later in a follow-up PR with clearer justification and a concrete use case.
Your technical feedback has been really helpful. Even though this PR is primarily focused on restructuring rather than adding or removing content (beyond summaries and intro text), it's great that you caught these issues so we can improve the content itself as well.
kderusso left a comment
My original technical edits have been addressed, thanks.
We should answer the question of recommending EIS before merging.
Deferring approval for structure etc. to the docs team. :)
leemthompo left a comment
Just noticed a couple of minor things :)
It might be good to get another writer to do this final review too, but I'm happy to approve once you're ready if necessary.
…ow-tos.md Co-authored-by: Liam Thompson <leemthompo@gmail.com>
marciw left a comment
Looking good! Small comments
docs/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration.md
::::::{tab-item} Default ELSER endpoint

If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:
The tab item title says "default," but then the text on the tab says "preconfigured," which is confusing because the other 2 tabs distinguish default and preconfigured... I might just be confused. :)
Great point, thank you! I changed the wording to "default".
I think there is an important distinction that we should represent here:
- Preconfigured endpoints are automatically created by Elasticsearch on startup. There are many of these in an ES cluster. Examples: `.elser-2-elasticsearch`, `.elser-2-elastic`, or `.jina-embeddings-v3` (about to be released).
- Some features, like `semantic_text`, have default endpoints configured. There is always just one per feature. Examples: `semantic_text` uses `.elser-2-elasticsearch`, `text_similarity_reranker` will use `.jina-reranker-v2` (soon).
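To make the distinction above concrete, here is a minimal sketch of two mappings. The index names (`my-index-default`, `my-index-explicit`) and the field name (`content`) are hypothetical, and which endpoint acts as the default may vary by version and deployment, so treat this as an illustration rather than the exact request from the docs page.

```console
# Relies on the default endpoint configured for semantic_text:
# no inference_id is set (hypothetical index and field names).
PUT my-index-default
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text"
      }
    }
  }
}

# Names a preconfigured endpoint explicitly through inference_id
# (hypothetical index and field names).
PUT my-index-explicit
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
```

In the first request the feature chooses the endpoint; in the second, the user picks one of the preconfigured endpoints, which happens to be the same one in this example.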
@kosabogi pinging here to make sure you saw this. Could be a good follow up to clarify the terms
Hey @maxjakob, thanks a lot for the clarification. I’ve added this as an item to address in a follow-up issue.
leemthompo left a comment
LGTM with a big ++ to @marciw's suggestions. Thanks for iterating!
📷 Preview
- Semantic text field type
- `semantic_text` field type reference
- `semantic_text`

## Summary
This PR restructures and refines the `semantic_text` field type page.

## Main changes
1. Content and wording refinements
2. Content restructuring

Restructured the `semantic_text` documentation into three focused pages:
- Main page (`semantic-text.md`): Converted to an overview page with an introduction explaining what `semantic_text` is and an overview section linking to the reference and how-to pages.
- Reference page (`semantic-text-reference.md`): New dedicated reference page containing parameter descriptions, inference endpoint configurations, chunking behavior, update operations, querying options, and limitations.
- How-to guides page (`semantic-text-how-tos.md`): New dedicated how-to page containing procedure descriptions and examples for common tasks, including configuring inference endpoints, pre-chunking content, retrieving embeddings, highlighting fragments, and cross-cluster search.

Our main reasons for splitting the docs this way:
Feedback and suggestions are welcome!
Related issue: elastic/docs-content#3836