feat: add SDK support for search records with similarity #4023
…ure/add-sdk-search-records
…ure/add-sdk-search-records
As already mentioned here, I think we should indeed use a client schema, not the SDK one.
WDYT about using `pydantic` schemas in the SDK instead of just the kwargs? @frascuchon @gabrielmbmb
WDYT about using `pydantic` schemas in the SDK instead of kwargs? @frascuchon @gabrielmbmb
…ure/add-sdk-search-records
…de an implementation for both local and remote
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4023-ki24f765kq-no.a.run.app
Codecov Report
... and 10 files with indirect coverage changes
# Description

This PR changes the `l2_norm` distance to cosine similarity for vector search. This change can improve results for similarity searches and also for least-similarity searches.

This [PR](#4023) must be reviewed first.

Closes #4123

**Type of change**

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor (change restructuring the codebase without changing functionality)
- [X] Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And ideally, reference `tests`)

The base dataset has been used with both ElasticSearch and OpenSearch to verify this change.

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
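To illustrate why switching from `l2_norm` to cosine similarity can change which records rank as most similar, here is a minimal, standard-library sketch. It is not the server-side code from this change; the helper names and the toy vectors are assumptions made purely for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical data, chosen to make the contrast visible).
query = [1.0, 0.0]
same_direction = [10.0, 0.0]  # parallel to the query, but much longer
nearby_point = [1.0, 1.0]     # close in space, but 45 degrees off

l2_same = l2_distance(query, same_direction)
l2_near = l2_distance(query, nearby_point)
cos_same = cosine_similarity(query, same_direction)
cos_near = cosine_similarity(query, nearby_point)

# L2 prefers the nearby point; cosine prefers the parallel vector.
print(f"L2:     same_direction={l2_same:.4f}, nearby_point={l2_near:.4f}")
print(f"cosine: same_direction={cos_same:.4f}, nearby_point={cos_near:.4f}")
```

The two metrics disagree here because L2 distance is sensitive to vector magnitude while cosine similarity only compares direction, which is often the more meaningful signal for embedding vectors.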
Left some minor comments, I'll complete the review once those are addressed, nice job!
Co-authored-by: Alvaro Bartolome <[email protected]>
…ure/add-sdk-search-records
…o/argilla into feature/add-sdk-search-records
…/new-filter-area-ui
* feature/feedback-dataset-semantic-similarity:
  feat: add sdk update records with vectors (#4128)
  ⚡️ Remove unused args
  ✨ Remove unused requests
  feat: add `delete_vectors_settings` method (#4130)
  feat: add SDK support for search records with similarity (#4023)
  ✨ No capitalized name for Fields, Questions, Metadata and Vectors tab.
  feat: add `update_vectors_settings` function (#4122)
Description
This PR suggests an implementation to support the `/api/me/datasets/{dataset_id}/records/search` endpoint with vector similarity search. Here is a full working example of similarity search from the Python SDK, using entirely fake data:
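As a rough stand-in for how a vector similarity search behaves, the sketch below ranks in-memory fake records against a query vector. It does not call the Argilla SDK or the search endpoint: `find_similar`, the record layout, and the vector name `my_vector` are all hypothetical names chosen for illustration, and cosine similarity is assumed as the scoring function.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_similar(records, vector_name, value, max_results=5):
    """Hypothetical helper: return up to max_results (record, score) pairs,
    ordered from most to least similar to `value`."""
    scored = [
        (record, cosine_similarity(record["vectors"][vector_name], value))
        for record in records
        if vector_name in record["vectors"]
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]

# Entirely fake data: 20 records, each with one random 4-dimensional vector.
random.seed(0)
records = [
    {
        "id": i,
        "fields": {"text": f"fake text {i}"},
        "vectors": {"my_vector": [random.uniform(-1.0, 1.0) for _ in range(4)]},
    }
    for i in range(20)
]

# Use the vector of an existing record as the query, so that record ranks first.
query = records[3]["vectors"]["my_vector"]
results = find_similar(records, "my_vector", query, max_results=3)

for record, score in results:
    print(record["id"], round(score, 4))
```

Returning `(record, score)` pairs keeps the similarity score next to each hit, so a caller can apply its own threshold or re-ranking afterwards.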
Closes #4020
Type of change
How Has This Been Tested
Tested locally
Checklist