Semantic_text multi-field/copy_to compatibility POC #105080
Mikep86 wants to merge 13 commits into elastic:feature/semantic-text from
… for semantic_text
```diff
 */
-public interface InferenceModelFieldType {
+// TODO: Are there any scenarios where extending SimpleMappedFieldType becomes an issue?
+public abstract class InferenceModelFieldType extends SimpleMappedFieldType {
```
I changed `InferenceModelFieldType` because, as previously defined, you could not call basic `MappedFieldType` methods like `name()` on it, which made it pretty inconvenient to use.
This change is not strictly required, but IMO makes for a cleaner implementation overall. Does anyone see any issues with extending SimpleMappedFieldType like this?
Given that we were already basically doing this in `MockInferenceFieldType`, it feels OK, but I will defer to others here.
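A simplified sketch of why the interface-to-abstract-class change helps. Every class here is a hypothetical stand-in, not the real Elasticsearch `MappedFieldType` hierarchy, and `modelId()` is an illustrative method name. With a bare interface, a reference of that type exposes only the interface's own methods, so reaching `name()` needs a cast; once the type extends the base field type class, `name()` is inherited and callable directly.

```java
import java.util.Objects;

// Hypothetical stand-in for a base field type that knows its own (full) name.
abstract class BaseFieldType {
    private final String name;

    BaseFieldType(String name) {
        this.name = Objects.requireNonNull(name);
    }

    final String name() {
        return name;
    }
}

// "Before" shape: a bare interface. A reference of this type only exposes modelId(),
// so calling name() would require a cast to some concrete field type.
interface InferenceFieldTypeAsInterface {
    String modelId();
}

// "After" shape: an abstract class extending the base type, so name() is inherited.
abstract class InferenceFieldTypeAsClass extends BaseFieldType {
    InferenceFieldTypeAsClass(String name) {
        super(name);
    }

    abstract String modelId();
}

// Hypothetical concrete type used only to exercise the sketch.
final class SketchSemanticFieldType extends InferenceFieldTypeAsClass {
    private final String modelId;

    SketchSemanticFieldType(String name, String modelId) {
        super(name);
        this.modelId = modelId;
    }

    @Override
    String modelId() {
        return modelId;
    }
}

final class InferenceFieldTypeDemo {
    // Compiles without a cast because name() comes from BaseFieldType.
    static String describe(InferenceFieldTypeAsClass fieldType) {
        return fieldType.name() + " -> " + fieldType.modelId();
    }
}
```

The trade-off, roughly the concern behind the TODO, is that Java allows only one superclass: every inference field type must now fit under that single base class, whereas an interface could be implemented by types from any hierarchy.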
```diff
-return new SemanticTextFieldMapper(name(), new SemanticTextFieldType(name(), modelId.getValue(), meta.getValue()), copyTo);
+return new SemanticTextFieldMapper(
+    name,
+    new SemanticTextFieldType(context.buildFullName(name), modelId.getValue(), meta.getValue()),
```
This change fixes a bug where semantic_text fields that were not top-level fields (i.e. were part of an object field or declared as a multi-field) did not register with the proper fully-qualified field name.
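To illustrate the bug, here is a hypothetical, simplified builder context that tracks the enclosing object or multi-field path and prefixes it onto a leaf name. It only mimics the idea behind `context.buildFullName(name)`; it is not Elasticsearch's actual builder context.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the context a mapper builder sees while parsing nested mappings.
final class SketchBuilderContext {
    private final Deque<String> path = new ArrayDeque<>();

    // Called when parsing descends into an object field or a multi-field parent.
    void enter(String name) {
        path.addLast(name);
    }

    // Called when parsing leaves that object or multi-field parent.
    void exit() {
        path.removeLast();
    }

    // Prefix the leaf with the enclosing path, e.g. "obj" + "semantic" -> "obj.semantic".
    String buildFullName(String leafName) {
        if (path.isEmpty()) {
            return leafName;
        }
        return String.join(".", path) + "." + leafName;
    }
}
```

Using only the leaf name, a `semantic_text` field declared under `obj` would register as `semantic` rather than `obj.semantic`, which matches the mis-registration described above.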
```java
    emptyList()
);
assertEquals(Set.of("semantic", "field1", "field2"), lookup.sourcePaths("semantic"));
assertEquals(Map.of("test_model", Map.of("semantic", List.of("semantic", "field1", "field2"))), lookup.getFieldsForModels());
```
As written, these tests are potentially flaky due to the unknown order of the source field list. I plan to address this once we understand what the final definition of fieldsForModels will be.
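One way to remove the ordering dependency, sketched under the assumption that `fieldsForModels` keeps the placeholder `Map<String, Map<String, List<String>>>` shape: normalize both sides by sorting every source-field list before comparing. The `FieldsForModelsAssertions` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FieldsForModelsAssertions {
    // Returns a copy in which every source-field list is sorted, so two maps that
    // differ only in list order compare equal. TreeMap just makes toString() stable.
    static Map<String, Map<String, List<String>>> normalize(Map<String, Map<String, List<String>>> fieldsForModels) {
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, List<String>>> model : fieldsForModels.entrySet()) {
            Map<String, List<String>> targets = new TreeMap<>();
            for (Map.Entry<String, List<String>> target : model.getValue().entrySet()) {
                List<String> sources = new ArrayList<>(target.getValue());
                Collections.sort(sources);
                targets.put(target.getKey(), sources);
            }
            result.put(model.getKey(), targets);
        }
        return result;
    }

    static boolean equalIgnoringSourceOrder(
        Map<String, Map<String, List<String>>> expected,
        Map<String, Map<String, List<String>>> actual
    ) {
        return normalize(expected).equals(normalize(actual));
    }
}
```

Sorting keeps duplicate entries visible in a failure; comparing the lists as sets would also work if duplicates can never occur.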
```diff
 private final Double indexWriteLoadForecast;
 private final Long shardSizeInBytesForecast;
-private final Diff<Map<String, Set<String>>> fieldsForModels;
+private final Diff<Map<String, Map<String, List<String>>>> fieldsForModels;
```
Shouldn't we capture this as a specific record with:
- model ID
- destination field
- List of optional source fields
Having a `Map<String, Map<String, List<String>>>` is a bit confusing to me.
No need to change it now, just a thought for accommodating changes to the structure.
Absolutely, `Map<String, Map<String, List<String>>>` is super confusing. It's more of a placeholder for now; I didn't want to put too much work into optimizing the data type, knowing that it could change a lot based on the semantic query work.
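A minimal sketch of the reviewer's suggestion, assuming a Java 16+ record. `InferenceFieldEntry` and `InferenceFieldEntries` are hypothetical names, not part of the PR. Each record carries the model ID, the destination field, and its source fields, and a helper flattens the current nested-map placeholder into a list of records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical typed replacement for one entry of Map<modelId, Map<targetField, List<sourceField>>>.
record InferenceFieldEntry(String modelId, String targetField, List<String> sourceFields) {
    InferenceFieldEntry {
        sourceFields = List.copyOf(sourceFields); // defensive, immutable copy
    }
}

final class InferenceFieldEntries {
    // Flattens the nested-map placeholder into records, sorted by model ID then target field.
    static List<InferenceFieldEntry> fromNestedMap(Map<String, Map<String, List<String>>> fieldsForModels) {
        List<InferenceFieldEntry> entries = new ArrayList<>();
        fieldsForModels.forEach(
            (modelId, targets) -> targets.forEach(
                (target, sources) -> entries.add(new InferenceFieldEntry(modelId, target, sources))
            )
        );
        entries.sort(Comparator.comparing(InferenceFieldEntry::modelId).thenComparing(InferenceFieldEntry::targetField));
        return entries;
    }
}
```

A named record makes each part self-describing at call sites, at the cost of a lookup by target field no longer being a single `get` on a map.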
😵 Made it a little hard to read, but I think this makes sense 👍
I modified MockFieldMapper.Builder to allow field types other than FakeFieldType so that we can use it to test MockInferenceModelFieldType as a multi-field target.
Previously, I used MockFieldMapper.Builder#addMultiField(FieldMapper mapper) for this, but this PR restricted the visibility of FieldMapper.MultiFields.Builder#add(FieldMapper mapper), breaking that approach.
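A hypothetical sketch of the general idea behind the builder change: instead of hard-coding a fake field type, the builder accepts a factory so a test can construct an inference-capable field type. All class names are illustrative stand-ins; the real `MockFieldMapper.Builder` signature may differ.

```java
import java.util.function.Function;

// Hypothetical minimal field type hierarchy for the sketch.
abstract class SketchFieldType {
    private final String name;

    SketchFieldType(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

final class SketchFakeFieldType extends SketchFieldType {
    SketchFakeFieldType(String name) {
        super(name);
    }
}

final class SketchInferenceFieldType extends SketchFieldType {
    SketchInferenceFieldType(String name) {
        super(name);
    }
}

final class SketchMockFieldMapper {
    final SketchFieldType fieldType;

    private SketchMockFieldMapper(SketchFieldType fieldType) {
        this.fieldType = fieldType;
    }

    static final class Builder {
        private final String name;
        private final Function<String, ? extends SketchFieldType> fieldTypeFactory;

        // Previous behavior: always build the fake field type.
        Builder(String name) {
            this(name, SketchFakeFieldType::new);
        }

        // Caller picks the field type, e.g. an inference field type used as a multi-field target.
        Builder(String name, Function<String, ? extends SketchFieldType> fieldTypeFactory) {
            this.name = name;
            this.fieldTypeFactory = fieldTypeFactory;
        }

        SketchMockFieldMapper build() {
            return new SketchMockFieldMapper(fieldTypeFactory.apply(name));
        }
    }
}
```

Keeping the one-argument constructor preserves existing callers, while tests that need a different field type opt in through the factory.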
POC implementation of multi-field/`copy_to` support for `semantic_text` fields. It relies on modifying `fieldsForModels` to store a one-to-many relationship between the target field (i.e. where inference results should be written to) and source fields (i.e. which fields in source to read text from).

The code to primarily focus on in this POC is in `FieldTypeLookup` and the tests in `FieldTypeLookupTests` & `SemanticTextFieldMapperTests`.

Due to changes to `fieldsForModels` in `IndexMetadata`, many other files had to change as well. To be clear, I am not suggesting that we use `fieldsForModels` as it is defined in this POC; this is just the smallest modification I could make to it to represent the one-to-many relationship between the target field and source field(s). The final definition of `fieldsForModels` is up for discussion and depends not only on the work in this POC, but also on how we choose to implement the semantic query.

As such, I have commented out any code outside the scope of this POC that depends on the old definition of `fieldsForModels`. The intent is to go back and update this code once we have decided what `fieldsForModels` should look like.
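The one-to-many relationship can be sketched as inverting `copy_to` edges onto `semantic_text` targets. This is a hypothetical illustration, not the POC's `FieldTypeLookup` code. Both inputs are assumed shapes, and modeling a multi-field as an implicit `copy_to` from its parent field is an assumption of the sketch. Each target also lists itself as a source, matching the expectation in the `FieldTypeLookupTests` snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FieldsForModelsSketch {
    /**
     * Builds model ID -> (target field -> source fields).
     *
     * @param semanticFields hypothetical input: full name of each semantic_text field -> its model ID
     * @param copyTo         hypothetical input: source field -> fields it copies into; a multi-field
     *                       is modeled here as an implicit copy_to from its parent field
     */
    static Map<String, Map<String, List<String>>> build(
        Map<String, String> semanticFields,
        Map<String, List<String>> copyTo
    ) {
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        // Every semantic_text field reads its own value from _source first.
        for (Map.Entry<String, String> field : semanticFields.entrySet()) {
            List<String> sources = new ArrayList<>();
            sources.add(field.getKey());
            result.computeIfAbsent(field.getValue(), modelId -> new TreeMap<>()).put(field.getKey(), sources);
        }
        // Then append each field that copies into a semantic_text target, in sorted source order.
        for (Map.Entry<String, List<String>> edge : new TreeMap<>(copyTo).entrySet()) {
            for (String target : edge.getValue()) {
                String modelId = semanticFields.get(target);
                if (modelId != null) {
                    result.get(modelId).get(target).add(edge.getKey());
                }
            }
        }
        return result;
    }
}
```

Edges that point at non-`semantic_text` fields (for example a `copy_to` into a keyword field) are simply ignored, since no model needs to read them.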