Implement KNN comparison with ESQL for so_vector#837
Implement KNN comparison with ESQL for so_vector#837svilen-mihaylov-elastic merged 13 commits intomasterfrom
Conversation
so_vector/track.py
Outdated
| "num_candidates": self._params.get("num_candidates", 50), | ||
| if self._in_esql_mode: | ||
| # Construct options JSON. | ||
| options_param = '{"num_candidates":' + str(self._num_candidates) |
There was a problem hiding this comment.
This will change to "min_candidates" in elastic/elasticsearch#132944. k will also be removed, and instead we will need to specify a LIMIT to the query.
It's not a blocker for this PR, but just a heads up that we will need to change this when the above gets merged.
carlosdelest
left a comment
There was a problem hiding this comment.
Looks good!
I think we can use param-source to have specific param handling for ESQL related operations, and use operation-type to help run specific ESQL workloads.
benwtrent
left a comment
There was a problem hiding this comment.
All these changes make sense to me. Its good to race the legacy API with esql.
My concern is that all these operations are part of the default operations. Are all the commands and esql actions fully released? If so, then good. If they are still behind a snapshot, I am not sure this can be merged.
Maybe add a "include_esql" tasks option that we can flip on/off for nightlies?
@benwtrent Thanks for the review. To make it more specific, are you proposing something like this to specify an operation (special is_esql option): Do I need to do something to specifically add this parameter? |
|
@svilen-mihaylov-elastic I am saying if this is not all available in release builds, it should be be part of the default operations. Maybe group all of |
so_vector/challenges/default.json
Outdated
| "iterations": 100, | ||
| "clients": 1 | ||
| }, | ||
| {% if is_esql_enabled %} |
There was a problem hiding this comment.
@benwtrent Decided to make each esql conditional to avoid complications with the serverless conditional checks towards the end of the file
There was a problem hiding this comment.
👍 I think you can group together all esql related operations and use a single if block.
Another option would be to create a different challenge for esql operations, but I don't see this being used consistently, and I think we will want to execute both esql and non esql for nightlies.
carlosdelest
left a comment
There was a problem hiding this comment.
LGTTM, thanks @svilen-mihaylov-elastic !
so_vector/challenges/default.json
Outdated
| "iterations": 100, | ||
| "clients": 1 | ||
| }, | ||
| {% if is_esql_enabled %} |
There was a problem hiding this comment.
👍 I think you can group together all esql related operations and use a single if block.
Another option would be to create a different challenge for esql operations, but I don't see this being used consistently, and I think we will want to execute both esql and non esql for nightlies.
* Implement KNN comparison with ESQL for so_vector (elastic#837) Update the so_vector rally track to also exercise knn against the ESQL frontend. * Add patterned_text_index_options parameter (elastic#840) Add patterned_text_index_options parameter to elastic/log tracks. Can accepts value of docs and positions, defaults to docs. Sets the index_options value of the message field in all indices. Only applies if patterned_text_message_field is set to true, and message fields are patterned_text, rather than match_only_text. * Fix so_vector for serverless operator (elastic#844) Misplaced comma * ES|QL: Add queries for LOOKUP JOIN with multiple join keys (elastic#838) * Add a few runtime fields to insist-chicken challenge (elastic#841) * Add elastic/logs patterned-text queries challenge (elastic#842) Add challenge which queries message field with several term and phrase queries. This is meant to test the patterned_text mapping type, but can also be used to test match_only_text message fields. * Reduce number of iterations for queries that use runtime fields. (elastic#846) * Remove routing_path from tsdb index template (elastic#847) Setting this is not necessary as the routing path is set automatically for data streams and this template defines `"data_stream": {}`. It would also prevent an optimization added in elastic/elasticsearch#132566. * Set index template for ingest_mode: data_stream (elastic#849) Rolls back changes in elastic#722 that broke the `ingest_mode: data_stream`. * ES|QL - so_vector knn function update (elastic#850) * Add missing p_index_mode param (elastic#853) --------- Co-authored-by: Svilen Mihaylov <svilen.mihaylov@elastic.co> Co-authored-by: Parker Timmins <parker.timmins@elastic.co> Co-authored-by: Evgenia Badiyanova <evgenia.badiyanova@elastic.co> Co-authored-by: Luigi Dell'Aquila <luigi.dellaquila@gmail.com> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com> Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
|
@svilen-mihaylov-elastic
Backporting entails:
Thank you! |
Update the so_vector rally track to also exercise knn against the ESQL frontend.