Add Semantic Search Feature #305

Open
devleejb opened this issue Aug 23, 2024 · 4 comments
Assignees
Labels
enhancement 🌟 New feature or request

Comments

@devleejb
Member

What would you like to be added:

I propose adding a Semantic Search feature so that documents can be searched and retrieved by meaning. This would help users get more relevant results than traditional keyword matching provides. The conceptual architecture and workflow are illustrated in the attached images.

Key Decisions Needed:

  1. When to save/update documents in the Vector Store?

    • Options:
      • Every time a document is updated
      • Periodically through a Cron Job
      • After a set duration without updates (e.g., 10 minutes)
      • Initially embed large documents, then embed smaller updates, with periodic consolidation.
  2. How to store existing data in the Vector Store during feature deployment?

  3. Chunking Strategy:

    • Different chunking methods have advantages and disadvantages, including:
      • Parent-Child Chunking
      • Fixed Chunking
      • Other strategies
  4. Embedding Model:

    • What model should we use for embedding?
    • Relying on commercial models such as OpenAI's may be costly, because documents need to be re-embedded frequently.
    • Local models served through Ollama, or other smaller models, may be sufficient.
  5. Vector Store Considerations:

    • Recommendations for potential Vector Stores:
      • Milvus (29k)
      • Weaviate (10k)
      • Chroma (14k)
      • Faiss (30k)
    • The store should offer a feature like Namespace so that data can be separated by Workspace.
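
To make the chunking and Workspace-separation points above more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration only: `fixed_chunks`, `toy_embed`, and `NamespacedStore` are hypothetical names, the in-memory store is a stand-in for whichever Vector Store is chosen, and `toy_embed` is a letter-count placeholder rather than a real embedding model. It only shows the shape of Fixed Chunking with overlap and of keeping each Workspace's vectors in its own namespace.

```python
import math
from collections import defaultdict


def fixed_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed Chunking: split text into character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> list[float]:
    """Placeholder embedding (letter counts). NOT semantic; swap in a real model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedStore:
    """Hypothetical in-memory vector store with one namespace per Workspace."""

    def __init__(self, embed=toy_embed):
        self._embed = embed
        # workspace_id -> list of (doc_id, chunk_text, vector)
        self._items = defaultdict(list)

    def upsert(self, workspace_id: str, doc_id: str, text: str,
               size: int = 200, overlap: int = 50) -> None:
        # Drop the document's stale chunks, then store the freshly embedded ones.
        kept = [item for item in self._items[workspace_id] if item[0] != doc_id]
        for chunk in fixed_chunks(text, size, overlap):
            kept.append((doc_id, chunk, self._embed(chunk)))
        self._items[workspace_id] = kept

    def search(self, workspace_id: str, query: str, top_k: int = 3):
        """Return up to top_k (score, doc_id, chunk) tuples from one Workspace only."""
        q = self._embed(query)
        scored = [(cosine(q, vec), doc_id, chunk)
                  for doc_id, chunk, vec in self._items.get(workspace_id, [])]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    store = NamespacedStore()
    store.upsert("workspace-a", "doc-1", "Notes about vector search and embeddings.")
    store.upsert("workspace-b", "doc-2", "Meeting agenda for next week.")
    # Only chunks stored under workspace-a are candidates for this query.
    print(store.search("workspace-a", "embedding search"))
```

Because `upsert` replaces all of a document's chunks, it corresponds to the simplest update policy (re-embed the whole document each time). Parent-Child Chunking or incremental updates would need extra bookkeeping that this sketch does not attempt.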

Why is this needed:

Semantic Search would let users find documents by meaning rather than by exact wording, which should make search results more relevant than keyword matching alone.

Additional Information:

  • Relevant references must be gathered for informed decision-making.
@devleejb
Member Author

image

@devleejb devleejb added this to the v0.1.4 milestone Aug 23, 2024
@devleejb devleejb moved this from Backlog to In progress in Yorkie Project - 2024 Aug 24, 2024
@devleejb devleejb moved this from Backlog to In progress in CodePair Aug 24, 2024
@devleejb devleejb modified the milestones: v0.1.4, v0.1.5, v0.1.6, v0.1.7 Aug 24, 2024
@devleejb devleejb removed this from the v0.1.7 milestone Aug 30, 2024
@devleejb
Member Author

This feature can be useful for resolving this issue: yorkie-team/yorkie#1002

@devleejb
Member Author

@sihyeong671 Could you check this comment (yorkie-team/yorkie#1002 (comment))? A document event webhook would be useful for implementing semantic search.
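
As a rough illustration of how document events could drive the "after a set duration without updates (e.g., 10 minutes)" option from the issue description, here is a small sketch. It is only an assumption about how this might be wired up: `QuietPeriodScheduler` and its methods are hypothetical names, the actual webhook payload is not modelled, and timestamps are passed in explicitly so the logic stays easy to test.

```python
class QuietPeriodScheduler:
    """Tracks document changes and reports documents that have gone quiet."""

    def __init__(self, quiet_seconds: float = 600.0):
        self.quiet_seconds = quiet_seconds
        # doc_id -> timestamp (in seconds) of the most recent change event
        self._last_change: dict[str, float] = {}

    def on_document_changed(self, doc_id: str, now: float) -> None:
        # Would be called from a (hypothetical) document event webhook handler.
        self._last_change[doc_id] = now

    def due(self, now: float) -> list[str]:
        """Return and forget documents idle for at least quiet_seconds."""
        ready = sorted(doc_id for doc_id, changed_at in self._last_change.items()
                       if now - changed_at >= self.quiet_seconds)
        for doc_id in ready:
            del self._last_change[doc_id]
        return ready


if __name__ == "__main__":
    import time

    scheduler = QuietPeriodScheduler(quiet_seconds=600)
    scheduler.on_document_changed("doc-1", now=time.time())
    # A periodic job would call due() and re-embed whatever it returns.
    print(scheduler.due(now=time.time()))
```

A periodic job (for example, the Cron Job option) could call `due()` and re-embed only the returned documents, so frequently edited documents are not embedded on every keystroke.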

Projects
Status: In progress
Development

No branches or pull requests

2 participants