[Inference API] Parse endpoint metadata from persisted endpoints by dimitris-athanasiou · Pull Request #143081 · elastic/elasticsearch

dimitris-athanasiou · 2026-02-25T16:27:52Z

This PR continues from #141393. It wires in parsing the endpoint metadata from the persisted endpoints. This means endpoint metadata will now be available from the GET inference APIs.

As we had to make UnparsedModel the way to parse persisted configs for EIS, I took the opportunity to refactor the InferenceService.parsePersistedConfig methods so that there is now only one, the one expecting an UnparsedModel.

This is a nice cleanup in that area. Even after this PR, there is still considerable duplication across services in their request and persisted-config parsing code. I'll follow up with further cleanup in separate PRs.

elasticsearchmachine · 2026-02-25T16:28:18Z

Pinging @elastic/search-inference-team (Team:Search - Inference)

elasticsearchmachine · 2026-02-25T16:28:19Z

Hi @dimitris-athanasiou, I've created a changelog YAML for you.

DonalEvans · 2026-02-25T23:22:37Z

I wonder if it would be possible to move the implementation of parsePersistedConfig() to SenderService and add an abstract createModel() method that gets called by parsePersistedConfig() and takes the various parsed values and returns the model. We'd have to attempt to parse all of the things that are used by various services, but it should be fine to pass null for things like task settings and chunking settings for those services that don't use them, and for the oddballs like SageMakerService which don't follow the common pattern, we can override the parsePersistedConfig() and keep the current implementation.

EDIT: In fact, I think this change would work very well with the changes that Jan has been making related to the update operation, with the addition of the SenderService.retrieveModelCreatorFromMapOrThrow() method and use of ModelCreator.createFromMaps() to provide a consistent way of going from maps to a model.

Changing the constructor of SenderService to take a map of model creators from the extending service class allows us to implement the method in SenderService like this:

    public Model parsePersistedConfig(UnparsedModel unparsedModel) {
        var config = unparsedModel.settings();
        var secrets = unparsedModel.secrets();
        var taskType = unparsedModel.taskType();

        Map<String, Object> serviceSettingsMap = removeFromMapOrThrowIfNull(config, ModelConfigurations.SERVICE_SETTINGS);
        Map<String, Object> taskSettingsMap = removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);
        Map<String, Object> secretSettingsMap = secrets == null ? null : removeFromMapOrDefaultEmpty(secrets, ModelSecrets.SECRET_SETTINGS);

        ChunkingSettings chunkingSettings = null;
        if (TaskType.TEXT_EMBEDDING.equals(taskType) || TaskType.EMBEDDING.equals(taskType) || TaskType.SPARSE_EMBEDDING.equals(taskType)) {
            chunkingSettings = ChunkingSettingsBuilder.fromMap(removeFromMap(config, ModelConfigurations.CHUNKING_SETTINGS));
        }

        return retrieveModelCreatorFromMapOrThrow(
            modelCreators,
            unparsedModel.inferenceEntityId(),
            taskType,
            name(),
            ConfigurationParseContext.PERSISTENT
        ).createFromMaps(
            unparsedModel.inferenceEntityId(),
            taskType,
            name(),
            serviceSettingsMap,
            taskSettingsMap,
            chunkingSettings,
            secretSettingsMap,
            ConfigurationParseContext.PERSISTENT
        );
    }

I tested this with CohereService and Ai21Service and the tests all passed, so I think it's a viable approach.
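
For readers who want to see the shape of this refactoring in isolation, here is a minimal, self-contained sketch. It is not the Elasticsearch code: every type below (`SketchTaskType`, `SketchUnparsedModel`, `SketchModel`, `SketchModelCreator`, `SketchSenderService`, `SketchCohereLikeService`) is a hypothetical stand-in, and the string keys are assumed for illustration. It only shows the idea of a base class that owns the parsing steps and delegates model construction to a per-task-type creator supplied by the subclass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are the real Elasticsearch types.
enum SketchTaskType { TEXT_EMBEDDING, SPARSE_EMBEDDING, COMPLETION }

record SketchUnparsedModel(String id, SketchTaskType taskType, Map<String, Object> settings) {}

record SketchModel(String id, SketchTaskType taskType, Map<String, Object> serviceSettings, boolean hasChunking) {}

@FunctionalInterface
interface SketchModelCreator {
    SketchModel create(
        String id,
        SketchTaskType taskType,
        Map<String, Object> serviceSettings,
        Map<String, Object> chunkingSettings
    );
}

abstract class SketchSenderService {
    // Supplied by each concrete service through the constructor.
    private final Map<SketchTaskType, SketchModelCreator> modelCreators;

    protected SketchSenderService(Map<SketchTaskType, SketchModelCreator> modelCreators) {
        this.modelCreators = Map.copyOf(modelCreators);
    }

    // The single shared parse path: extract the common pieces, then delegate.
    @SuppressWarnings("unchecked")
    public final SketchModel parsePersistedConfig(SketchUnparsedModel unparsed) {
        // Work on a copy so that removing keys does not mutate the caller's map.
        Map<String, Object> config = new HashMap<>(unparsed.settings());

        Map<String, Object> serviceSettings = (Map<String, Object>) config.remove("service_settings");
        if (serviceSettings == null) {
            throw new IllegalArgumentException("missing [service_settings] for [" + unparsed.id() + "]");
        }

        // Only the embedding-style task types carry chunking settings in this sketch.
        Map<String, Object> chunkingSettings = null;
        if (unparsed.taskType() == SketchTaskType.TEXT_EMBEDDING || unparsed.taskType() == SketchTaskType.SPARSE_EMBEDDING) {
            chunkingSettings = (Map<String, Object>) config.remove("chunking_settings");
        }

        SketchModelCreator creator = modelCreators.get(unparsed.taskType());
        if (creator == null) {
            throw new IllegalArgumentException("task type [" + unparsed.taskType() + "] is not supported");
        }
        return creator.create(unparsed.id(), unparsed.taskType(), serviceSettings, chunkingSettings);
    }
}

// A concrete service only declares which task types it can build models for.
final class SketchCohereLikeService extends SketchSenderService {
    SketchCohereLikeService() {
        super(Map.of(
            SketchTaskType.TEXT_EMBEDDING,
            (id, type, service, chunking) -> new SketchModel(id, type, service, chunking != null)
        ));
    }
}
```

In this sketch, a service that does not follow the common pattern would need the parse method to be non-final so it could override it, which is the escape hatch suggested above for cases like SageMakerService.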

dimitris-athanasiou · 2026-02-26T11:22:29Z

This is brilliant @DonalEvans ! I followed through and the result is so beautiful it almost made me cry! :-)

dimitris-athanasiou · 2026-02-26T11:23:42Z

One thing I noticed: HuggingFaceElserService previously did not parse chunking settings, even though it only handles the SPARSE_EMBEDDING task type, which should support them. With the refactoring, it now does. Is that OK, or am I missing something?

jonathan-buttner

Looks good, just left a few questions.

jonathan-buttner · 2026-02-26T15:45:15Z

.../src/test/java/org/elasticsearch/xpack/inference/services/deepseek/DeepSeekServiceTests.java

-                  "secret_settings": {
-                    "api_key": "12345"
-                  }
+                  "service_settings": {}


Just curious why we're removing the secret_settings here? Is it because they're not necessary for the test to find the model?

The test's parsePersistedConfig takes a JSON string, so it needs some handling to convert it to an UnparsedModel, which expects separate maps for service and secret settings. However, I have now modified it to work properly and reinstated those two tests.

jonathan-buttner · 2026-02-26T15:46:40Z

...st/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceTests.java


    public void testParseStoredConfig_DoesNotThrowWhenAnExtraKeyExistsInServiceSettings() throws IOException {
        try (var service = createServiceWithMockSender()) {
-            {


Should we keep the individual cases in the test?

Not sure what you mean. The test was previously trying the same thing 3 times because there were 3 different parse methods. The actual config used was always the same.

jonathan-buttner · 2026-02-26T15:51:18Z

server/src/main/java/org/elasticsearch/inference/UnparsedModel.java

+        this.settings = settings == null ? null : new HashMap<>(settings);
+        // Additionally, an empty secrets map is treated as null in order to skip potential validations for missing keys
+        // which should not be necessary when parsing a persisted model.
+        this.secrets = secrets == null || secrets.isEmpty() ? null : new HashMap<>(secrets);


Can you talk a little more about the advantage of using null here? It was possible to get null before right? We're just changing empty to also be null?

If secrets here ends up being an empty map, validations will fail, complaining about missing fields. In my understanding, validations should never fail when parsing a persisted config. They should only apply when we parse from the request. So I'm handling this potential issue in a single place here.

Previously this was not an issue because there was a variant for parsing without secrets, which resulted in null. Now that we have a single parse method, I think it is a good defensive measure to handle this here. Happy to revise, though, if you think otherwise.
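
The normalization discussed in this thread can be illustrated with a small, self-contained sketch. `SketchPersistedConfig` and `apiKeyOrNull` are hypothetical names used only for illustration; they are not the real `UnparsedModel` or any real validation in the inference plugin. The sketch assumes a validation that runs only when a secrets map is present, which is the behaviour the comment above relies on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for UnparsedModel; only the two maps are modelled.
final class SketchPersistedConfig {
    private final Map<String, Object> settings;
    private final Map<String, Object> secrets;

    SketchPersistedConfig(Map<String, Object> settings, Map<String, Object> secrets) {
        // Defensive copies: later changes to the caller's maps do not leak in.
        this.settings = settings == null ? null : new HashMap<>(settings);
        // An empty secrets map is normalized to null so that "no secrets" has one representation.
        this.secrets = secrets == null || secrets.isEmpty() ? null : new HashMap<>(secrets);
    }

    Map<String, Object> settings() {
        return settings;
    }

    Map<String, Object> secrets() {
        return secrets;
    }

    // Hypothetical validation: it only runs when a secrets map is present.
    static String apiKeyOrNull(SketchPersistedConfig config) {
        if (config.secrets() == null) {
            return null; // nothing was persisted, so there is nothing to validate
        }
        Object apiKey = config.secrets().get("api_key");
        if (apiKey == null) {
            throw new IllegalArgumentException("missing [api_key] in secret settings");
        }
        return apiKey.toString();
    }
}
```

Without the `isEmpty()` check, an empty map would reach the validation branch in this sketch and throw, even though nothing is wrong with the persisted endpoint.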

jonathan-buttner · 2026-02-26T15:58:11Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/SenderService.java

    private final Sender sender;
    private final ServiceComponents serviceComponents;
    private final ClusterService clusterService;
+    private final Map<TaskType, ModelCreator<? extends M>> modelCreators;


Lifting this up is great because it reduces the code duplication. My only concern is when the child classes don't adhere to the Map<TaskType, ModelCreator<? extends M>> like EIS. Any thoughts on how we can handle that in the future?

I guess that's the same situation we were in before, if we need to add a new field to the ModelCreator interface then we have to add it everywhere.

I'm not really sure how to solve that, maybe by using a new object that gets passed instead (similar to what we did for UnparsedModel) 🤷‍♂️

I think ideally we would have those EIS ModelCreators share a common base class/interface with the rest. Then we could deal with this. I thought about taking a look, but I wanted to draw the line somewhere on this PR :-) But it should be possible I'd think.
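
One way to read the "new object that gets passed instead" idea is a parameter object, sketched below under stated assumptions. `SketchTask`, `SketchCreationRequest`, `SketchCreator` and `SketchCreators` are hypothetical names; the real `ModelCreator` interface has a different signature, and this is not a proposal for its final shape. The point is only that a new input becomes a new component of the request object rather than a new parameter on every implementation.

```java
import java.util.Map;

// Hypothetical names throughout; this does not mirror the real ModelCreator interface.
enum SketchTask { COMPLETION, TEXT_EMBEDDING }

// Parameter object: adding a field here does not change the creator's method signature.
record SketchCreationRequest(
    String inferenceEntityId,
    SketchTask taskType,
    Map<String, Object> serviceSettings,
    Map<String, Object> endpointMetadata
) {
    // Convenience constructor for callers that have no endpoint metadata.
    SketchCreationRequest(String inferenceEntityId, SketchTask taskType, Map<String, Object> serviceSettings) {
        this(inferenceEntityId, taskType, serviceSettings, Map.of());
    }
}

// A single shared interface that every service's creators could implement.
@FunctionalInterface
interface SketchCreator<M> {
    M create(SketchCreationRequest request);
}

final class SketchCreators {
    private SketchCreators() {}

    // Example creator that reads only the fields it needs and ignores the rest.
    static final SketchCreator<String> DESCRIBE = request ->
        request.inferenceEntityId() + ":" + request.taskType() + ":" + request.endpointMetadata().size();
}
```

Creators that do not care about a newly added field simply never read it, so they would not need to change when the request object grows.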

DonalEvans

I think we talked about this already, but it would be good if we could reduce the duplication of tests for the parsePersistedConfig() method, especially now that a lot of tests that were testing the similar parsePersistedConfigWithSecrets() method have been converted into testing parsePersistedConfig(). I haven't checked all of the test files, but certainly in HuggingFaceServiceTests (and I suspect many others) we now have multiple tests that are testing the same behaviour.

Not something that absolutely needs to be addressed in this PR, but if we don't already have an issue for it, it would be good to create one.

.../main/java/org/elasticsearch/xpack/inference/action/TransportUpdateInferenceModelAction.java

dimitris-athanasiou · 2026-02-27T17:09:52Z

@DonalEvans Agreed on removing test duplication. I raised https://github.com/elastic/search-team/issues/13133

…stic#143081) This PR continues from elastic#141393. It wires in parsing the endpoint metadata from the persisted endpoints. This means endpoint metadata will now be available from the GET inference APIs. As we had to make `UnparsedModel` the way to parse persisted configs for EIS, I took the opportunity to refactor the `InferenceService.parsePersistedConfig` methods so that there is now only one, the one expecting an `UnparsedModel`. Also, I'm removing the duplication by pulling this method up in `SenderService`.

dimitris-athanasiou added >enhancement :SearchOrg/Inference Label for the Search Inference team v9.4.0 labels Feb 25, 2026

dimitris-athanasiou requested a review from jonathan-buttner February 25, 2026 16:28

elasticsearchmachine added the Team:Search - Inference label Feb 25, 2026

Update docs/changelog/143081.yaml

977b372

dimitris-athanasiou added 4 commits February 25, 2026 20:48

Ensure UnparsedModel secrets is null when empty

a43569c

Fix ModelRegistryIT

8904690

Merge branch 'main' into parse-inference-endpoint-metadata

d80e4c1

Remove unused import

5272656

dimitris-athanasiou added 3 commits February 26, 2026 11:36

Fix tests

b0d295f

Remove duplication by moving parsePersistedConfig to SenderService

f57c954

Merge branch 'main' into parse-inference-endpoint-metadata

9ee93eb

jonathan-buttner reviewed Feb 26, 2026

dimitris-athanasiou added 2 commits February 27, 2026 13:46

reinstate modified DeepSeekServiceTests

ef1b0ea

Merge branch 'main' into parse-inference-endpoint-metadata

ff50ab7

DonalEvans approved these changes Feb 27, 2026

Use UnparsedModel without copy in update action

4bd7d27

jonathan-buttner approved these changes Mar 2, 2026

dimitris-athanasiou merged commit a59b309 into elastic:main Mar 2, 2026
35 checks passed

dimitris-athanasiou deleted the parse-inference-endpoint-metadata branch March 2, 2026 16:01
