Bring back sparse_vector mapping#98996
Conversation
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This is copied and pasted from RankFeaturesFieldMapper with some checks on index version.
| public void parse(DocumentParserContext context) throws IOException { | ||
|
|
||
| // No support for indexing / searching 7.x sparse_vector field types | ||
| if (context.indexSettings().getIndexVersionCreated().before(IndexVersion.V_8_0_0)) { |
There was a problem hiding this comment.
Same here - checking that we're not operating on indices created before 8.0. Return the same errors that the previous implementation had.
There was a problem hiding this comment.
Copied from RankFeaturesFieldMapperTests, with some additions
| return Strings.concatStringArrays(super.getParseMinimalWarnings(indexVersion), additionalWarnings); | ||
| } | ||
|
|
||
| public void testSparseVectorWith7xIndex() throws Exception { |
There was a problem hiding this comment.
Added from the previous version of the test. The main idea is to allow indices to exist with the previous sparse_vector implementation (<8.0), but no new docs can be added to it.
| return Strings.EMPTY_ARRAY; | ||
| } | ||
|
|
||
| protected String[] getParseMinimalWarnings(IndexVersion indexVersion) { |
There was a problem hiding this comment.
Added this method so we can specify warnings that will be included when using mappings that go back to specific index versions.
| @Override | ||
| protected Parameter<?>[] getParameters() { | ||
| return new Parameter<?>[] { meta }; | ||
| return new Parameter<?>[] { positiveScoreImpact, meta }; |
There was a problem hiding this comment.
remove positiveScoreImpact we can simplify this feature.
| @@ -0,0 +1,21 @@ | |||
| setup: | |||
There was a problem hiding this comment.
I think we should have a simple set of tests for exercising the mapped type & queries and such. No need to have a single file only for mapping tests.
This file should test:
- Creating the mapping (in the correct versions)
- Ensure creation fails in older 8.x versions
- That queries work
- That indexing works
| @@ -0,0 +1,143 @@ | |||
| --- | |||
| "Indexing and searching sparse vectors": | |||
There was a problem hiding this comment.
Added a test that checks indexing and performs a simulated text_expansion query.
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| public void testBoostNotAllowed() throws IOException { |
There was a problem hiding this comment.
I've separated boost checking in two, for deprecated warnings and boosts not allowed.
| assertParseMinimalWarnings(); | ||
| } | ||
|
|
||
| protected IndexVersion boostNotAllowedIndexVersion() { |
There was a problem hiding this comment.
There is the problem that checking for boosts not allowed caused sparse_vector mappings to error because sparse_vector is not allowed for 8.0.0.
I've created an overrridable method that returns the index version to test boosting for.
I'm happy to hear about other ideas for solving this in a simpler way 👍
| RAW_MODEL_SIZE = Base64.getDecoder().decode(BASE_64_ENCODED_MODEL).length; | ||
| } | ||
|
|
||
| public void testRankFeaturesTextExpansionQuery() throws IOException { |
There was a problem hiding this comment.
I've duped the tests on text_expansion queries so we check they can be used both with rank_features and sparse_vector field types.
I could as well have created separate classes for this, LMK if you can think of better ways to handle this.
There was a problem hiding this comment.
Lets not do this. Thank you for taking initiative here. But, all the ML related changes should be done after sparse_vector is merged. It will be good to separate the changes to make the change log cleaner and reviews easier.
There was a problem hiding this comment.
Got it - I'll revert this change and the text_expansion_search_sparse_vector.yml test as well.
| @@ -0,0 +1,109 @@ | |||
| # This test uses the simple model defined in | |||
There was a problem hiding this comment.
Added a separate YAML test that performs the same checks for text_expansion queries for sparse_vector field type
|
@elasticsearchmachine run elasticsearch-ci/part-1 |
|
Hi @carlosdelest, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search (Team:Search) |
| reason: "sparse_vector field type supported in 7.x" | ||
| - do: | ||
| allowed_warnings: | ||
| - "The [sparse_vector] field type is deprecated and will be removed in 8.0." |
There was a problem hiding this comment.
Do we need to keep the 8.0 warning here, since the test is 7.x only?
There was a problem hiding this comment.
8.0.0 - indicates to me that this test only runs in 7.x as well. I am not sure why both warnings would be returned.
There was a problem hiding this comment.
I have received both warnings on BWC tests, depending on the BWC test being performed:
- All nodes under test are on 7.x
- A mixed cluster test like this one. I understand that a node could have been updated to 8.x as part of this test, and thus return this warning.
|
|
||
| public static final String CONTENT_TYPE = "sparse_vector"; | ||
|
|
||
| static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to " |
There was a problem hiding this comment.
Nitpicky question: Should we use the CONTENT_TYPE constant in error messages instead of hardcoding sparse_vector?
There was a problem hiding this comment.
I copied / pasted from the previous sparse_vector error messages. As content type constant is unlikely to change, I personally think this is more readable as well (and in line with how error messages are created as well)
| public DocValueFormat docValueFormat(String format, ZoneId timeZone) { | ||
| throw new UnsupportedOperationException(ERROR_MESSAGE_7X); | ||
| public Query existsQuery(SearchExecutionContext context) { | ||
| throw new IllegalArgumentException("[sparse_vector] fields do not support [exists] queries"); |
There was a problem hiding this comment.
Why did we decide to move from UnsupportedOperationException to IllegalArgumentException? Both are fine, just curious.
There was a problem hiding this comment.
It's not really a change in the exception, as the method is different. Keep in mind that I copied / pasted RankFeaturesFieldMapper, so there are a lot of changes and the diff is misleading.
| ); | ||
| @Override | ||
| protected boolean allowsNullValues() { | ||
| return false; // TODO should this allow null values? |
There was a problem hiding this comment.
Nitpick: Remove TODO before merging
There was a problem hiding this comment.
Copied / pasted from RankFeaturesFieldMapper, so I think it makes sense for re-evaluating in the future.
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
|
@elasticsearchmachine update branch |
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
| static final String ERROR_MESSAGE_8X = "The [sparse_vector] field type is not supported from 8.0 to 8.10 versions."; | ||
| static final IndexVersion PREVIOUS_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_0_0; | ||
|
|
||
| static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_500_000; |
There was a problem hiding this comment.
Should we have here V_8_500_001?
| // No support for indexing / searching 7.x sparse_vector field types | ||
| if (context.indexSettings().getIndexVersionCreated().before(PREVIOUS_SPARSE_VECTOR_INDEX_VERSION)) { | ||
| throw new UnsupportedOperationException(ERROR_MESSAGE_7X); | ||
| } else if (context.indexSettings().getIndexVersionCreated().before(IndexVersion.V_8_11_0)) { |
There was a problem hiding this comment.
Here instead of 8_11 we should also put a new V_8_500_001 version
| if (context.doc().getByKey(key) != null) { | ||
| throw new IllegalArgumentException( | ||
| "[sparse_vector] fields do not support indexing multiple values for the same " | ||
| + "rank feature [" |
There was a problem hiding this comment.
Instead of "rank feature", I think we can just say "feature"
|
|
||
| static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_500_000; | ||
|
|
||
| private static SparseVectorFieldType ft(FieldMapper in) { |
There was a problem hiding this comment.
It seems that it is never used. Should we remove it?
mayya-sharipova
left a comment
There was a problem hiding this comment.
@carlosdelest Thanks Carlos, I left minor comments, but the change overall LGTM!
…ector_type_comeback
…ector_type_comeback
|
@elasticsearchmachine update branch |
|
@elasticsearchmachine run elasticsearch-ci/full-bwc |
|
This PR has been merged. However, I would like to ask some questions to improve my understanding of this work. 😄
|
Yes
No
No |
Adds back
sparse_vectorfield type, as a copy ofrank_features.The main goal is to have the
sparse_vectorfield type available so we can switch ELSER queries to use the new type.