Skip to content

[Inference API] Parse endpoint metadata from persisted endpoints#143081

Merged
dimitris-athanasiou merged 12 commits intoelastic:mainfrom
dimitris-athanasiou:parse-inference-endpoint-metadata
Mar 2, 2026
Merged

[Inference API] Parse endpoint metadata from persisted endpoints#143081
dimitris-athanasiou merged 12 commits intoelastic:mainfrom
dimitris-athanasiou:parse-inference-endpoint-metadata

Conversation

@dimitris-athanasiou
Copy link
Copy Markdown
Contributor

This PR continues from #141393. It wires in parsing the endpoint metadata from the persisted endpoints. This means endpoint metadata will now be available from the GET inference APIs.

As we had to make UnparsedModel the way to parse persisted configs for EIS, I took the opportunity to refactor the InferenceService.parsePersistedConfig methods so that there is now only one, the one expecting an UnparsedModel.

This is a nice clean up in that area. Even after this PR there is quite some duplication across services for their parse request/persistent config code. I'll follow up to clean further in separate PRs in the future.

This PR continues from elastic#141393. It wires in parsing the endpoint
metadata from the persisted endpoints. This means endpoint metadata
will now be available from the GET inference APIs.

As we had to make `UnparsedModel` the way to parse persisted configs
for EIS, I took the opportunity to refactor the `InferenceService.parsePersistedConfig`
methods so that there is now only one, the one expecting an `UnparsedModel`.

This is a nice clean up in that area. Even after this PR there is quite some duplication
across services for their parse request/persistent config code. I'll follow up to clean
further in separate PRs in the future.
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @dimitris-athanasiou, I've created a changelog YAML for you.

@DonalEvans
Copy link
Copy Markdown
Contributor

DonalEvans commented Feb 25, 2026

I wonder if it would be possible to move the implementation of parsePersistedConfig() to SenderService and add an abstract createModel() method that gets called by parsePersistedConfig() and takes the various parsed values and returns the model. We'd have to attempt to parse all of the things that are used by various services, but it should be fine to pass null for things like task settings and chunking settings for those services that don't use them, and for the oddballs like SageMakerService which don't follow the common pattern, we can override the parsePersistedConfig() and keep the current implementation.

EDIT: In fact, I think this change would work very well with the changes that Jan has been making related to the update operation, with the addition of the SenderService.retrieveModelCreatorFromMapOrThrow() method and use of ModelCreator.createFromMaps() to provide a consistent way of going from maps to a model.

Changing the constructor of SenderService to take a map of model creators from the extending service class allows us to implement the method in SenderService like this:

    public Model parsePersistedConfig(UnparsedModel unparsedModel) {
        var config = unparsedModel.settings();
        var secrets = unparsedModel.secrets();
        var taskType = unparsedModel.taskType();

        Map<String, Object> serviceSettingsMap = removeFromMapOrThrowIfNull(config, ModelConfigurations.SERVICE_SETTINGS);
        Map<String, Object> taskSettingsMap = removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);
        Map<String, Object> secretSettingsMap = secrets == null ? null : removeFromMapOrDefaultEmpty(secrets, ModelSecrets.SECRET_SETTINGS);

        ChunkingSettings chunkingSettings = null;
        if (TaskType.TEXT_EMBEDDING.equals(taskType) || TaskType.EMBEDDING.equals(taskType) || TaskType.SPARSE_EMBEDDING.equals(taskType)) {
            chunkingSettings = ChunkingSettingsBuilder.fromMap(removeFromMap(config, ModelConfigurations.CHUNKING_SETTINGS));
        }

        return retrieveModelCreatorFromMapOrThrow(
            modelCreators,
            unparsedModel.inferenceEntityId(),
            taskType,
            name(),
            ConfigurationParseContext.PERSISTENT
        ).createFromMaps(
            unparsedModel.inferenceEntityId(),
            taskType,
            name(),
            serviceSettingsMap,
            taskSettingsMap,
            chunkingSettings,
            secretSettingsMap,
            ConfigurationParseContext.PERSISTENT
        );
    }

I tested this with CohereService and Ai21Service and the tests all passed, so I think it's a viable approach.

@dimitris-athanasiou
Copy link
Copy Markdown
Contributor Author

dimitris-athanasiou commented Feb 26, 2026

This is brilliant @DonalEvans ! I followed through and the result is so beautiful it almost made me cry! :-)

@dimitris-athanasiou
Copy link
Copy Markdown
Contributor Author

One thing I noticed: HuggingFaceElserService was previously not parsing chunking settings even though it's just for SPARSE_EMBEDDING task which should support it. With the refactoring, it now does. Is that ok or am I missing something?

Copy link
Copy Markdown
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just left a few questions.

"secret_settings": {
"api_key": "12345"
}
"service_settings": {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why we're removing the secret_settings here? Is it because they're not necessary for the test to to find model?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test's parsePersistedConfig takes a json string so it requires some handling to convert to an UnparsedModel which expects separate maps for service/secret settings. However, I have now modified it to work properly and reinstated those 2 tests.


public void testParseStoredConfig_DoesNotThrowWhenAnExtraKeyExistsInServiceSettings() throws IOException {
try (var service = createServiceWithMockSender()) {
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the individual cases in the test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. The test was previously trying the same thing 3 times because there were 3 different parse methods. The actual config used was always the same.

this.settings = settings == null ? null : new HashMap<>(settings);
// Additionally, an empty secrets map is treated as null in order to skip potential validations for missing keys
// which should not be necessary when parsing a persisted model.
this.secrets = secrets == null || secrets.isEmpty() ? null : new HashMap<>(secrets);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you talk a little more about the advantage of using null here? It was possible to get null before right? We're just changing empty to also be null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If secrets here ends being an empty map, validations will fail complaining for missing fields. In my understanding, we should never have validations failing when parsing a persisted config. They should only apply when we parse from the request. Thus, here, I'm taking care of this potential issue in a single place.

Previously this was not an issue because there was the variant for parsing without secrets which resulted in null. As now we have a single parse method, it is good defense I think to handle this here. Happy to revise though if you think otherwise.

private final Sender sender;
private final ServiceComponents serviceComponents;
private final ClusterService clusterService;
private final Map<TaskType, ModelCreator<? extends M>> modelCreators;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lifting this up is great because it reduces the code duplication. My only concern is when the child classes don't adhere to the Map<TaskType, ModelCreator<? extends M>> like EIS. Any thoughts on how we can handle that in the future?

I guess that's the same situation we were in before, if we need to add a new field to the ModelCreator interface then we have to add it everywhere.

I'm not really sure how to solve that, maybe by using a new object that gets passed instead (similar to what we did for UnparsedModel) 🤷‍♂️

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we would have those EIS ModelCreators share a common base class/interface with the rest. Then we could deal with this. I thought about taking a look, but I wanted to draw the line somewhere on this PR :-) But it should be possible I'd think.

Copy link
Copy Markdown
Contributor

@DonalEvans DonalEvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about this already, but it would be good if we could reduce the duplication of tests for the parsePersistedConfig() method, especially now that a lot of tests that were testing the similar parsePersistedConfigWithSecrets() method have been converted into testing parsePersistedConfig(). I haven't checked all of the test files, but certainly in HuggingFaceServiceTests (and I suspect many others) we now have multiple tests that are testing the same behaviour.

Not something that absolutely needs to be addressed in this PR, but if we don't already have an issue for it, it would be good to create one.

@dimitris-athanasiou
Copy link
Copy Markdown
Contributor Author

@DonalEvans Agreed on removing test duplication. I raised https://github.com/elastic/search-team/issues/13133

@dimitris-athanasiou dimitris-athanasiou merged commit a59b309 into elastic:main Mar 2, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 2, 2026
…cations

* upstream/main: (60 commits)
  Use batches for other bulk vector benchmarks (elastic#143167)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:lookup-join.MvJoinKeyOnTheLookupIndexAfterStats} elastic#143388
  Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testBackToBackQueuedDeletes elastic#143387
  [Inference API] Parse endpoint metadata from persisted endpoints (elastic#143081)
  Add cluster formation doc to DistributedArchitectureGuide (elastic#143318)
  Fix flattened root block loader null expectation (elastic#143238)
  Unmute ValueSourceReaderTypeConversionTests testLoadAll (elastic#143189)
  ESQL: Add split coalescing for many small files (elastic#143335)
  Unmute mixed-cluster spatial parse warning test (elastic#143186)
  Fix zero-size estimate in BytesRefBlock null test (elastic#143258)
  Make DataType and DataFormat top-level enums (elastic#143312)
  Add support for steps to change the target index name for later steps (elastic#142955)
  Set mayContainDuplicates flag to test deduplication (elastic#143375)
  ESQL: Fix Driver search load millis as nanos bug (elastic#143267)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:lookup-join.LookupJoinWithMixPushableAndUnpushableFilters} elastic#143378
  ESQL: Forbid MV_EXPAND before full text functions (elastic#143249)
  ESQL: Fix unresolved name pattern (elastic#143210)
  Implement boxplot queryDSL aggregation for exponential_histograms (elastic#143026)
  Add prefetching to x64 bulk vector implementations (elastic#142387)
  Make large segment vector tests resilient to memory constraints (elastic#143366)
  ...
@dimitris-athanasiou dimitris-athanasiou deleted the parse-inference-endpoint-metadata branch March 2, 2026 16:01
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
…stic#143081)

This PR continues from elastic#141393. It wires in parsing the endpoint
metadata from the persisted endpoints. This means endpoint metadata
will now be available from the GET inference APIs.

As we had to make `UnparsedModel` the way to parse persisted configs
for EIS, I took the opportunity to refactor the `InferenceService.parsePersistedConfig`
methods so that there is now only one, the one expecting an `UnparsedModel`. Also, 
I'm removing the duplication by pulling this method up in `SenderService`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants