[ES|QL] TEXT_EMBEDDING function definition by afoucret · Pull Request #135059 · elastic/elasticsearch

afoucret · 2025-09-19T07:36:56Z

This PR introduces a new TEXT_EMBEDDING function to ES|QL, enabling users to generate dense vector embeddings for text directly within their queries.

Examples

Example 1: Generating embeddings

FROM my_index
| EVAL embedding = TEXT_EMBEDDING("my text to embed", "my_embedding_model")

Example 2: Semantic search with KNN

FROM my_index
| WHERE KNN(my_vector_field, TEXT_EMBEDDING("my search query", "my_embedding_model"))

This work is part of #131079

elasticsearchmachine · 2025-09-19T07:37:21Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

github-actions · 2025-09-19T08:27:41Z

🔍 Preview links for changed docs

docs/reference/query-languages/esql/kibana/docs/functions/text_embedding.md

github-actions · 2025-09-19T08:27:42Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

carlosdelest

LGTM!

Some minor nits about a common interface and some missing capability checks.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

.../src/main/java/org/elasticsearch/xpack/esql/expression/function/inference/TextEmbedding.java

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java

ioanatia · 2025-09-19T14:38:45Z

.../src/main/java/org/elasticsearch/xpack/esql/expression/function/inference/TextEmbedding.java

+        @Param(
+            name = InferenceFunction.INFERENCE_ID_PARAMETER_NAME,
+            type = { "keyword" },
+            description = "Identifier of the inference endpoint"


We need to add a function example and then regenerate the docs.
Otherwise, Kibana CI will fail when we try to bring the JSON specification of this function to Kibana, even if it's under snapshot.
This happened recently for decay too (#134705); I opened a separate PR to address this (#135094) since I promised to look into it.

I think this should be fixed in Kibana CI so it does not break in such a case.

I mean, it is expected for a function that has appliesTo set to FunctionAppliesToLifecycle.DEVELOPMENT to be incomplete. Examples in particular come last, when the CSV tests are written.

Anyway, I added a placeholder example so I will not break anything. It will be replaced when more realistic CSV tests are added.

...plugin/esql/src/test/java/org/elasticsearch/xpack/esql/inference/InferenceResolverTests.java

ioanatia · 2025-09-19T15:11:46Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceResolver.java

+        plan.forEachExpressionUp(UnresolvedFunction.class, f -> {
+            String functionName = snapshotRegistry.resolveAlias(f.name());
+            if (snapshotRegistry.functionExists(functionName)) {
+                FunctionDefinition def = snapshotRegistry.resolveFunction(functionName);


usually this is done at the analyzer level - it's a bit odd to do the function resolution here, but I don't see another simpler option for now 🤷‍♀️

In fact, there are other cases where this kind of analysis is done outside of the analyzer (enrich policy or index resolution).

Because the analyzer is purely synchronous and inference resolution has to be done async, it is necessary to do it this way.
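
The constraint described in this thread can be sketched in isolation: an asynchronous pre-analysis step collects what it needs, and the synchronous analyzer only reads the collected results. This is a minimal illustration under stated assumptions, not the actual InferenceResolver or Analyzer code; `AsyncPreAnalysis`, `resolveEndpoint`, `preAnalyze`, `analyze`, and the returned task-type string are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AsyncPreAnalysis {
    // Hypothetical stand-in for an asynchronous inference endpoint lookup.
    public static CompletableFuture<String> resolveEndpoint(String inferenceId) {
        return CompletableFuture.supplyAsync(() -> "text_embedding");
    }

    // Step 1 (async): resolve every distinct inference id before analysis starts.
    public static CompletableFuture<Map<String, String>> preAnalyze(List<String> inferenceIds) {
        List<CompletableFuture<Map.Entry<String, String>>> lookups = inferenceIds.stream()
            .distinct()
            .map(id -> resolveEndpoint(id).thenApply(taskType -> Map.entry(id, taskType)))
            .collect(Collectors.toList());
        return CompletableFuture.allOf(lookups.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> lookups.stream()
                .map(CompletableFuture::join) // safe: allOf has already completed
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)));
    }

    // Step 2 (sync): the analyzer only consults the already-resolved map.
    public static String analyze(String inferenceId, Map<String, String> resolved) {
        String taskType = resolved.get(inferenceId);
        if (taskType == null) {
            throw new IllegalArgumentException("unresolved inference id [" + inferenceId + "]");
        }
        return inferenceId + ":" + taskType;
    }

    public static void main(String[] args) {
        Map<String, String> resolved = preAnalyze(List.of("my_embedding_model")).join();
        System.out.println(analyze("my_embedding_model", resolved));
    }
}
```

The point of the split is that nothing inside `analyze` ever waits on I/O, which is why the ids have to be discovered before the analyzer runs.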

ioanatia · 2025-09-19T15:14:51Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceResolver.java

        return BytesRefs.toString(e.fold(FoldContext.small()));
    }

+    public String inferenceId(UnresolvedFunction f, FunctionDefinition def) {


if this method is actually needed, should this be private? maybe also add a javadoc to explain what it does?

Added some comments and improved the code readability.

ioanatia · 2025-09-19T15:18:54Z

.../src/main/java/org/elasticsearch/xpack/esql/expression/function/inference/TextEmbedding.java

+    }
+
+    @Override
+    public void writeTo(StreamOutput out) throws IOException {


I don't think we need to serialize this function, since we will always resolve it on the coordinator and replace it with its result.
We have other instances in ES|QL where we don't serialize: FORK, ROW, and RENAME come to mind.
But this would probably be the first function that does not require serialization.

I would prefer to keep the serialization even if the expression is not supposed to be moved between nodes.

The first reason is that every Function is supposed to implement it. If the execution infrastructure ever needs to move them between nodes in the future, I would not like to be the black sheep that makes things more difficult.

Also, TextEmbeddingTests extends AbstractFunctionTestCase, which expects the function to be serializable.

I consider serialization and deserialization simple to implement, and leaving them unimplemented is probably not beneficial.

The first reason is that every Function is supposed to implement it.

That does not mean it should. If the text_embedding function is actually serialized and sent between nodes, that's an execution path that should never happen.

Anything that extends LogicalPlan is also supposed to implement serialization, but as I said before there are many examples of plans where we don't serialize.
We could have made the same argument for logical plans: that the serialization is simple to implement, or that it's not beneficial to leave it unimplemented.
However, raising an exception when we attempt to serialize these plans is intentional, and it captures very clearly the fact that this should never happen.

I get that making this unserializable is more work, especially for AbstractFunctionTestCase, but at least then the behaviour would be correct.

It feels to me that we are removing code that we will ultimately reintroduce when we support non-constant input for text embeddings. Anyway, I pushed a version without serialization so we can move forward and merge this PR.
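
The "fail loudly instead of serializing" position argued in this thread can be illustrated with a small standalone sketch. It does not use the Elasticsearch `StreamOutput` API; the `Writeable` interface, the `CoordinatorOnlyFunction` class, and the error message are hypothetical stand-ins chosen only to show the pattern.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NonSerializableFunction {
    // Hypothetical stand-in for a "can be written to the wire" contract.
    interface Writeable {
        void writeTo(DataOutputStream out) throws IOException;
    }

    // An expression that is always replaced by its result on the coordinator,
    // so it should never be sent to another node.
    static final class CoordinatorOnlyFunction implements Writeable {
        final String text;
        final String inferenceId;

        CoordinatorOnlyFunction(String text, String inferenceId) {
            this.text = text;
            this.inferenceId = inferenceId;
        }

        @Override
        public void writeTo(DataOutputStream out) {
            // Throwing makes the invariant explicit: reaching this path is a bug.
            throw new UnsupportedOperationException("not serializable");
        }
    }

    public static void main(String[] args) throws IOException {
        Writeable f = new CoordinatorOnlyFunction("my text to embed", "my_embedding_model");
        try {
            f.writeTo(new DataOutputStream(new ByteArrayOutputStream()));
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The trade-off discussed above is visible here: the class still satisfies the interface, but any test harness that assumes a round trip through `writeTo` has to be adapted.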

ioanatia · 2025-09-19T15:27:24Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceResolver.java

+            if (snapshotRegistry.functionExists(functionName)) {
+                FunctionDefinition def = snapshotRegistry.resolveFunction(functionName);
+                if (InferenceFunction.class.isAssignableFrom(def.clazz())) {
+                    String inferenceId = inferenceId(f, def);


InferenceFunction has an inferenceId method already - I wonder if we can just use that, instead of trying to see which function param is called INFERENCE_ID_PARAMETER_NAME.

My problem is that f is not an InferenceFunction but an UnresolvedFunction.
The only way to access the inference id is to read the argument that has the right name.
I assumed that, by convention, it will always be called inference_id for all the InferenceFunction implementations we add in the future.

To me, the fact that we are looking for the argument that is called inference_id leaks an internal implementation detail of the function.
We could also try and resolve the function - the same way the analyzer does:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

Lines 1405 to 1406 in 8de918b

FunctionDefinition def = functionRegistry.resolveFunction(functionName);

f = uf.buildResolved(configuration, def);

this would still not be ideal though - I am okay to leave this as it is - but we should reopen the discussion of making the analyzer support async rules.
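
The name-based lookup debated here can be sketched as follows. This is a simplified model, not the InferenceResolver code: `FunctionSignature` and `UnresolvedCall` are hypothetical classes, and arguments are plain strings rather than expressions. Only the `inference_id` parameter name comes from the discussion above.

```java
import java.util.List;
import java.util.Optional;

public class InferenceIdLookup {
    static final String INFERENCE_ID_PARAMETER_NAME = "inference_id";

    // Hypothetical: the declared parameter names of a registered function.
    static final class FunctionSignature {
        final List<String> parameterNames;

        FunctionSignature(List<String> parameterNames) {
            this.parameterNames = parameterNames;
        }
    }

    // Hypothetical: a call whose arguments are known but not yet bound to a function class.
    static final class UnresolvedCall {
        final String name;
        final List<String> arguments;

        UnresolvedCall(String name, List<String> arguments) {
            this.name = name;
            this.arguments = arguments;
        }
    }

    // Finds the argument sitting at the position of the parameter named inference_id.
    static Optional<String> inferenceId(UnresolvedCall call, FunctionSignature signature) {
        int index = signature.parameterNames.indexOf(INFERENCE_ID_PARAMETER_NAME);
        if (index < 0 || index >= call.arguments.size()) {
            return Optional.empty();
        }
        return Optional.of(call.arguments.get(index));
    }

    public static void main(String[] args) {
        FunctionSignature signature = new FunctionSignature(List.of("text", INFERENCE_ID_PARAMETER_NAME));
        UnresolvedCall call = new UnresolvedCall("text_embedding", List.of("my text to embed", "my_embedding_model"));
        System.out.println(inferenceId(call, signature).orElse("<none>"));
    }
}
```

The sketch also shows the coupling raised in the review: the lookup works only as long as every inference function keeps the same parameter name.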

kderusso · 2025-09-19T18:09:40Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

        public LogicalPlan apply(LogicalPlan plan, AnalyzerContext context) {
-            return plan.transformDown(InferencePlan.class, p -> resolveInferencePlan(p, context));
+            return plan.transformDown(InferencePlan.class, p -> resolveInferencePlan(p, context))
+                .transformExpressionsOnly(InferenceFunction.class, f -> resolveInferenceFunction(f, context));


Non-blocking question for my ES|QL education: Can you help me understand why this change is needed?

plan.transformDown(InferencePlan.class, p -> resolveInferencePlan(p, context))

will replace every node in the plan whose class is InferencePlan with the result of the resolveInferencePlan(p, context) call. Inference plans are typically commands that use inference: RERANK and COMPLETION.

.transformExpressionsOnly(InferenceFunction.class, f -> resolveInferenceFunction(f, context));

will do the same, but for InferenceFunction expressions instead of plans. Because text embedding is our first inference function, this was not done yet, so I added it.
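
The difference between the two passes can be sketched on a miniature tree. This is only an illustration: the real ES|QL methods take a class token and operate on typed plan and expression nodes, whereas the hypothetical `Plan` class below stores names and expressions as plain strings.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class PlanTransformSketch {
    // Hypothetical miniature plan node: a name, its expressions, and child plans.
    static final class Plan {
        final String name;
        final List<String> expressions;
        final List<Plan> children;

        Plan(String name, List<String> expressions, List<Plan> children) {
            this.name = name;
            this.expressions = expressions;
            this.children = children;
        }
    }

    // Rewrites plan nodes top-down: the node itself first, then its children.
    static Plan transformDown(Plan plan, UnaryOperator<Plan> rule) {
        Plan rewritten = rule.apply(plan);
        List<Plan> children = rewritten.children.stream()
            .map(child -> transformDown(child, rule))
            .collect(Collectors.toList());
        return new Plan(rewritten.name, rewritten.expressions, children);
    }

    // Keeps the plan shape and rewrites only the expressions held by each node.
    static Plan transformExpressionsOnly(Plan plan, UnaryOperator<String> rule) {
        List<String> expressions = plan.expressions.stream().map(rule).collect(Collectors.toList());
        List<Plan> children = plan.children.stream()
            .map(child -> transformExpressionsOnly(child, rule))
            .collect(Collectors.toList());
        return new Plan(plan.name, expressions, children);
    }

    public static void main(String[] args) {
        Plan from = new Plan("FROM", List.of(), List.of());
        Plan eval = new Plan("EVAL", List.of("TEXT_EMBEDDING(unresolved)"), List.of(from));
        Plan rerank = new Plan("RERANK(unresolved)", List.of(), List.of(eval));

        // First pass resolves plan nodes, second pass resolves expressions inside them.
        Plan resolved = transformExpressionsOnly(
            transformDown(rerank, p -> new Plan(p.name.replace("unresolved", "resolved"), p.expressions, p.children)),
            e -> e.replace("unresolved", "resolved")
        );
        System.out.println(resolved.name + " -> " + resolved.children.get(0).expressions);
    }
}
```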

ioanatia

I think this looks really close - I would like us to look a bit more into the serialization aspect before merging


afoucret added 4 commits September 19, 2025 09:06

Adding the TEXT_EMBEDDING_FUNCTION capability.

987a709

Add InferenceFunction and TextEmbedding classes for TEXT_EMBEDDING function

516a0b6

Adding tests for the TextEmbedding function.

36df7cf

Update ESQL usage tests

918bdb7

afoucret requested a review from carlosdelest September 19, 2025 07:36

afoucret added the >non-issue, v9.2.0, and :Search Relevance/ES|QL (Search functionality in ES|QL) labels Sep 19, 2025

elasticsearchmachine added the Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch) label Sep 19, 2025

afoucret requested review from a team and ioanatia September 19, 2025 07:40

afoucret added 4 commits September 19, 2025 10:21

Add text_embedding to the EsqlFunctionRegistry

4418a32

Add text_embedding tests generated doc

4bb147d

InferenceResolver can now resolve inference ids used in a logical plan from inference functions.

ddf3db5

Analyzer now resolve inference endpoints for inference function.

aadf880

[CI] Auto commit changes from spotless

6fb48b0

carlosdelest approved these changes Sep 19, 2025

afoucret and others added 6 commits September 19, 2025 14:14

Apply suggestions from code review

2c423fb

Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>

Apply suggestion from review.

9406d37

TextEmbedding accepts only keyword parameters.

774986f

Merge branch 'main' of https://github.com/elastic/elasticsearch into esql_text_embedding_function_definition

39b4323

Update TextEmbeddingTests supported parameters data types.

8d4a832

Fix text embedding type validation.

71bae69

ioanatia reviewed Sep 19, 2025

kderusso reviewed Sep 19, 2025

afoucret added 2 commits September 19, 2025 21:08

Add a dummy example (waiting for real CSV tests to be implemented)

1275582

Add a dummy example (waiting for real CSV tests to be implemented)

89dfcec

afoucret added 4 commits September 19, 2025 21:10

Fix breaking release tests.

39919fa

Made the code more readable.

b7e821d

Filter out TEXT_EMBEDDING FROM CSV TESTS

97ecf80

Fixing CSV tests

d5cf81c

ioanatia reviewed Sep 22, 2025

afoucret and others added 4 commits September 24, 2025 10:12

Merge branch 'main' of https://github.com/elastic/elasticsearch into esql_text_embedding_function_definition

778d5e7

Fix typo

0d01b5e

Make TextEmbedding not serializable.

e8ca515

[CI] Auto commit changes from spotless

5b56232

afoucret enabled auto-merge (squash) September 24, 2025 09:36

afoucret and others added 4 commits September 24, 2025 11:48

Remove failing tests as it is failing.

128688b

Remove start import

a8b99c4

Remove test

a0cc965

Merge branch 'main' into esql_text_embedding_function_definition

ca85b86

afoucret merged commit 59c3601 into elastic:main Sep 24, 2025
33 of 34 checks passed

afoucret mentioned this pull request Oct 8, 2025

ES|QL: Add TEXT_EMBEDDING function #131022

Closed

6 tasks
