feat(es|ql): add dense_vector support in coalesce#142974
feat(es|ql): add dense_vector support in coalesce#142974mromaios merged 19 commits intoelastic:mainfrom
Conversation
| 4 | null | ||
| ; | ||
|
|
||
| coalesceBfloat16VectorWithFallback |
There was a problem hiding this comment.
❓ 💭 Do we need coalesce tests in the individual .csv-spec files, or is that an overkill? (same comment for the other types (default, bit, byte))
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
…lasticsearch into dense_vector_support_coalesce
…lasticsearch into dense_vector_support_coalesce
…lasticsearch into dense_vector_support_coalesce
|
Hi @mromaios, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
| @@ -0,0 +1,6 @@ | |||
| id:l, float_vector:dense_vector, float_vector_2:dense_vector, byte_vector:dense_vector, byte_vector_2:dense_vector, bit_vector:dense_vector, bit_vector_2:dense_vector, bfloat16_vector:dense_vector, bfloat16_vector_2:dense_vector | |||
There was a problem hiding this comment.
Check: https://github.com/elastic/elasticsearch/pull/142974/changes#r2848341088
A question about whether we want coalesce tests in the dense_vector*.csv-spec tests.
carlosdelest
left a comment
There was a problem hiding this comment.
LGTM, great job @mromaios ! 👏
One thing we're not doing is validating that we are always returning dense_vectors of the same dimensions 🤔 . That's not something we can validate at the analyzer level (as we don't have dimension information there), and tricky to do at runtime (we would have to change how the ExpressionEvaluator work).
Let's keep it this way - we could add an Evaluator that validates dimensions as a follow up PR if needed
ioanatia
left a comment
There was a problem hiding this comment.
looks good.
@carlosdelest I am not sure COALESCE needs to check whether the vectors have the same dimensions.
The documentation of COALESCE simply states:
Returns the first of its arguments that is not null. If all arguments are null, it returns null.
I worry we would overcomplicate the definition of COALESCE if we add a validation that is specific to dense_vector.
So what we have right now with this PR should suffice IMO.
@ioanatia agreed - I was just thinking about how COALESCE allows users to have vector elements with different dimensions, something that the mapping guarantees us not having. I already approved and agreed to keep it this way - makes no sense to complicate this and we can always add some validation layer afterwards. |
…cations * upstream/main: (56 commits) Mute org.elasticsearch.compute.lucene.read.ValueSourceReaderTypeConversionTests testLoadAll elastic#143471 [DOCS] Fix ES|QL function and commands lists versioning metadata (elastic#143402) Fix MMROperatorTests (elastic#143453) Fix CSV-escaped quotes in generated docs examples (elastic#143449) Fix SQL client parsing of array header values (elastic#143408) ESQL: Add extended distribution tests and fault injection for external sources (elastic#143420) ESQL: Fix datasource test failures on Windows and FIPS (elastic#143417) Add circuit breaker for query construction to prevent OOM from automaton-based queries (elastic#142150) Cleanup SpecIT logging configuration (elastic#143365) ESQL: Prune unused regex extract nodes in optimizer (elastic#140982) Ensure supported locale outside of Entitlements check (elastic#143405) feat(es|ql): add dense_vector support in coalesce (elastic#142974) [Test] Unmute SnapshotStressTestsIT (elastic#143359) Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:lookup-join.LookupJoinWithCoalesceFilterOnRight} elastic#143443 Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:lookup-join.MvJoinKeyOnTheLookupIndex} elastic#143442 ESQL: Fix CCS exchange sink cleanup (elastic#143325) Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:lookup-join.MvJoinKeyOnTheLookupIndexAfterStats} elastic#143434 Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:lookup-join.MvJoinKeyFromRow} elastic#143432 Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:k8s-timeseries.Datenanos_derivative_compared_to_rate} elastic#143431 Mute org.elasticsearch.multiproject.test.CoreWithMultipleProjectsClientYamlTestSuiteIT test {yaml=search.retrievers/result-diversification/10_mmr_result_diversification_retriever/Test MMR result diversification single index float type} elastic#143430 ...
Closes #139928
Details:
This PR adds
dense_vectorsupport toCOALSESCE.Limitations:
dense_vector, so passing in an array would require ato_dense_vector()function. I will address Implicit casting in a follow up PRNote to reviewer:
Example requests:
Result: