semantic_text: Add exists query#110027
Conversation
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
| final String fieldName = "semantic"; | ||
| final String inferenceId = "test_service"; | ||
|
|
||
| MapperService mapperService = createMapperService(mapping(b -> {})); |
There was a problem hiding this comment.
I extracted this as a method to enable updating the mapping to check the actual exists queries created
| assertThat(toParentBlockJoinQuery.getChildQuery(), instanceOf(FieldExistsQuery.class)); | ||
| } | ||
|
|
||
| private MapperService mapperServiceForFieldWithModelSettings( |
There was a problem hiding this comment.
... and this is the extracted method
|
Pinging @elastic/es-search (Team:Search) |
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
...nce/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
Show resolved
Hide resolved
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
|
|
||
| /** | ||
| * Returns the primitive Lucene query for a nested query given the primitive query to wrap | ||
| * @param innerQuery query to wraqp in a nested query |
There was a problem hiding this comment.
| * @param innerQuery query to wraqp in a nested query | |
| * @param innerQuery query to wrap in a nested query |
server/src/main/java/org/elasticsearch/index/query/NestedQueryBuilder.java
Outdated
Show resolved
Hide resolved
benwtrent
left a comment
There was a problem hiding this comment.
I am not sure the doToQuery and toQuery logic for the rewrite is 100% correct.
| @Override | ||
| protected Query doToQuery(SearchExecutionContext context) throws IOException { | ||
| Query innerQuery; | ||
| NestedObjectMapper mapper = context.nestedLookup().getNestedMappers().get(path); |
There was a problem hiding this comment.
I think all checks that were previously done in toQuery before this nested lookup need to occur.
Including
NestedObjectMapper mapper = context.nestedLookup().getNestedMappers().get(path);
if (mapper == null) {
if (ignoreUnmapped) {
return new MatchNoDocsQuery();
} else {
throw new IllegalStateException("[" + NAME + "] failed to find nested object under path [" + path + "]");
}
}
And
if (context.allowExpensiveQueries() == false) {
There was a problem hiding this comment.
Totally - I moved the meat out of the sandwich. I have corrected it now.
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Show resolved
Hide resolved
| * @return the primitive Lucene query | ||
| */ | ||
| public static <E extends Exception> Query toQuery( | ||
| CheckedFunction<SearchExecutionContext, Query, E> queryProvider, |
There was a problem hiding this comment.
Great CheckedFunction usage!
| Mapper mapper = mapperService.mappingLookup().getMapper(fieldName); | ||
| assertNotNull(mapper); | ||
| Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService)); | ||
| assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class)); |
There was a problem hiding this comment.
Can you add back the inner query check after this?
There was a problem hiding this comment.
I was thinking that's too much coupling and details - it's not about checking the bool query with two clauses that are resulting from this, but the actual need to do so given we have YAML coverage. We're functionally covered, and we probably shouldn't care about the actual details of the query rewriting.
Happy to add the checks if you think it's absolutely needed
There was a problem hiding this comment.
I agree that hard-coding the expected query class creates too much coupling. However, as written testExistsQuerySparseVector & testExistsQueryDenseVector are not meaningfully different. In particular, the lack of assertions on the child query means that we're not testing that the child query is reflective of the backing embedding field type (sparse_vector or dense_vector).
I thought about how to resolve this without creating unmaintainable couplings and came up with this approach:
MapperBuilderContext mapperBuilderContext = MapperBuilderContext.root(false, false);
SearchExecutionContext searchExecutionContext = createSearchExecutionContext(mapperService);
SparseVectorFieldMapper sparseVectorFieldMapper = new SparseVectorFieldMapper.Builder(SemanticTextField.getEmbeddingsFieldName(fieldName)).build(mapperBuilderContext);
...
assertThat(toParentBlockJoinQuery.getChildQuery(), equalTo(sparseVectorFieldMapper.fieldType().existsQuery(searchExecutionContext)));
And something similar for testExistsQueryDenseVector, except creating a DenseVectorFieldMapper. WDYT?
There was a problem hiding this comment.
I like that, I think it's a good middle ground!
| Mapper mapper = mapperService.mappingLookup().getMapper(fieldName); | ||
| assertNotNull(mapper); | ||
| Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService)); | ||
| assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class)); |
There was a problem hiding this comment.
Can you add back the inner query check after this?
…-query # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
|
|
||
| Mapper mapper = mapperService.mappingLookup().getMapper(fieldName); | ||
| assertNotNull(mapper); | ||
| Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService)); | ||
| assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class)); | ||
| } | ||
|
|
||
| public void testExistsQueryDenseVector() throws IOException { |
There was a problem hiding this comment.
I see what you mean. WDYT about something like this?
| Query expectedQuery = NestedQueryBuilder.toQuery( | |
| context -> sparseVectorFieldMapper.fieldType().existsQuery(context), | |
| SemanticTextField.getChunksFieldName(fieldName), | |
| toParentBlockJoinQuery.getScoreMode(), | |
| false, | |
| searchExecutionContext | |
| ); | |
| assertThat(toParentBlockJoinQuery, equalTo(expectedQuery)); | |
| assertThat(toParentBlockJoinQuery.getChildQuery(), instanceOf(BooleanQuery.class)); | |
| BooleanQuery booleanQuery = (BooleanQuery) toParentBlockJoinQuery.getChildQuery(); | |
| assertTrue( | |
| booleanQuery.clauses() | |
| .stream() | |
| .anyMatch(b -> b.getQuery().equals(sparseVectorFieldMapper.fieldType().existsQuery(searchExecutionContext))) | |
| ); |
We still need to know that we're creating a boolean query under the hood (which isn't entirely bad IMO), but we're duplicating less.
There was a problem hiding this comment.
I still find this very leaky. Given that we're already testing the query with the YAML tests, I'd opt for the simplest solution here, and just asserting the top ESToParentBlockJoinQuery.
There was a problem hiding this comment.
Fair enough. Thanks for iterating on this with me!
There was a problem hiding this comment.
It was a good idea Mike, thanks for sharing!
Closes elastic/obs-ai-team#162 Closes #192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298
…ic#186499) Closes elastic/obs-ai-team#162 Closes elastic#192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298 (cherry picked from commit 671ff30) # Conflicts: # x-pack/plugins/translations/translations/zh-CN.json
…ic#186499) Closes elastic/obs-ai-team#162 Closes elastic#192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298
…ic#186499) Closes elastic/obs-ai-team#162 Closes elastic#192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298
Does what it says on the tin