Skip to content

Semantic_text multi-field/copy_to compatibility POC#105080

Closed
Mikep86 wants to merge 13 commits intoelastic:feature/semantic-textfrom
Mikep86:semantic-text_multi-field-compatibility-poc
Closed

Semantic_text multi-field/copy_to compatibility POC#105080
Mikep86 wants to merge 13 commits intoelastic:feature/semantic-textfrom
Mikep86:semantic-text_multi-field-compatibility-poc

Conversation

@Mikep86
Copy link
Copy Markdown
Contributor

@Mikep86 Mikep86 commented Feb 2, 2024

POC implementation of multi-field/copy_to support for semantic_text fields. It relies on modifying fieldsForModels to store a one-to-many relationship between the target field (i.e. where inference results should be written to) and source fields (i.e. which fields in source to read text from).

The code to primarily focus on in this POC is in FieldTypeLookup and the tests in FieldTypeLookupTests & SemanticTextFieldMapperTests.

Due to changes to fieldsForModels in IndexMetadata, many other files had to change as well. To be clear, I am not suggesting that we use fieldsForModels as it is defined in this POC, this is just the smallest modification I could make to it to represent the one-to-many relationship between the target field and source field(s). The final definition of fieldsForModels is up for discussion and based not only on the work in this POC, but also on how we choose to implement the semantic query.

As such, I have commented out any code outside the scope of this POC that is dependent on the old definition of fieldsForModels. The intent is go back and update this code once we have decided on what fieldsForModels should look like.

@Mikep86 Mikep86 added >non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.13.0 labels Feb 2, 2024
*/
public interface InferenceModelFieldType {
// TODO: Are there any scenarios where extending SimpleMappedFieldType becomes an issue?
public abstract class InferenceModelFieldType extends SimpleMappedFieldType {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed InferenceModelFieldType because as defined before, you could not call basic MappedFieldType methods like name() on it, which made it pretty inconvenient to use.

This change is not strictly required, but IMO makes for a cleaner implementation overall. Does anyone see any issues with extending SimpleMappedFieldType like this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we were already basically doing this in MockInferenceFieldType if feels OK, but I will defer to others here.

return new SemanticTextFieldMapper(name(), new SemanticTextFieldType(name(), modelId.getValue(), meta.getValue()), copyTo);
return new SemanticTextFieldMapper(
name,
new SemanticTextFieldType(context.buildFullName(name), modelId.getValue(), meta.getValue()),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes a bug where semantic_text fields that were not top-level fields (i.e. were part of an object field or declared as a multi-field) did not register with the proper fully-qualified field name.

emptyList()
);
assertEquals(Set.of("semantic", "field1", "field2"), lookup.sourcePaths("semantic"));
assertEquals(Map.of("test_model", Map.of("semantic", List.of("semantic", "field1", "field2"))), lookup.getFieldsForModels());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, these tests are potentially flaky due to the unknown order of the source field list. I plan to address this once we understand what the final definition of fieldsForModels will be.

@Mikep86 Mikep86 requested review from a team, carlosdelest and jimczi February 2, 2024 18:36
private final Double indexWriteLoadForecast;
private final Long shardSizeInBytesForecast;
private final Diff<Map<String, Set<String>>> fieldsForModels;
private final Diff<Map<String, Map<String, List<String>>>> fieldsForModels;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we capture this as a specific record with:

  • model ID
  • destination field
  • List of optional source fields

Having a <Map<String, Map<String, List<String>>> is a bit confusing to me.

No need to change it now, just a thought for accommodating changes to the structure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, <Map<String, Map<String, List<String>>> is super confusing. It's more of a placeholder for now, I didn't want to put too much work into optimizing the data type knowing that it could change a lot based on the semantic query work.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😵 Made it a little hard to read, but I think this makes sense 👍

Comment thread server/src/main/java/org/elasticsearch/index/mapper/FieldTypeLookup.java Outdated
*/
public interface InferenceModelFieldType {
// TODO: Are there any scenarios where extending SimpleMappedFieldType becomes an issue?
public abstract class InferenceModelFieldType extends SimpleMappedFieldType {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we were already basically doing this in MockInferenceFieldType if feels OK, but I will defer to others here.

private final Double indexWriteLoadForecast;
private final Long shardSizeInBytesForecast;
private final Diff<Map<String, Set<String>>> fieldsForModels;
private final Diff<Map<String, Map<String, List<String>>>> fieldsForModels;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😵 Made it a little hard to read, but I think this makes sense 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified MockFieldMapper.Builder to allow field types other than FakeFieldType so that we can use it to test MockInferenceModelFieldType as a multi-field target.

Previously, I used MockFieldMapper.Builder#addMultiField(FieldMapper mapper) for this, but this PR restricted the visibility of FieldMapper.MultiFields.Builder#add(FieldMapper mapper), breaking that approach.

@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@Mikep86 Mikep86 closed this Aug 22, 2024
@javanna javanna removed the v8.16.0 label Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants