feat: introducing configurable retrieval workflows #3227
…, you need to set the env variables COHERE_API_KEY or JINA_API_KEY. If both are present, the Cohere reranker (rerank-multilingual-v3.0) is used.
…iguration classes
…for retrieval, ingestion, parsing, etc.
…opment of more advanced ingestion pipelines
…the use of rerankers
…configuration fields of RAGConfig into RetrievalConfig
…itioning from QuivrQARAG to QuivrQARAGLangGraph
…valConfig instead of AssistantConfig
…kens and max_input --> max_input_tokens
…stringifying the list of floats returned from PG
…o the yaml configuration file
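The reranker selection rule mentioned in the first commit message above (Cohere wins when both keys are set) could look roughly like the sketch below. The function name `pick_reranker` and the returned supplier strings are hypothetical, chosen for illustration only; they are not taken from the PR's code.

```python
import os
from typing import Mapping, Optional


def pick_reranker(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the reranker supplier to use, or None if no API key is set.

    Hypothetical helper: it only mirrors the precedence described in the
    commit message, not the actual implementation in the PR.
    """
    if env.get("COHERE_API_KEY"):
        # Cohere takes precedence when both keys are present.
        return "cohere"
    if env.get("JINA_API_KEY"):
        return "jina"
    return None


if __name__ == "__main__":
    print(pick_reranker({"COHERE_API_KEY": "x", "JINA_API_KEY": "y"}))
```

Passing the environment as a mapping keeps the rule easy to check without touching the real process environment.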
Nice PR!
if llm_config.tokenizer_hub:
    # To prevent the warning
    # huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    os.environ["TOKENIZERS_PARALLELISM"] = (
[We can safely remove it](https://docs.rs/tokenizers/latest/src/tokenizers/utils/parallelism.rs.html#29)
You're right. We keep it; it defaults to true.
…no the corresponding env variable is not set
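For reference, one way to set the variable without overriding a value the user has already exported is sketched below. The helper name is hypothetical, and the `"false"` default is an assumption: the value assigned in the snippet under review is cut off, so it is not reproduced here.

```python
import os


def configure_tokenizers_parallelism(value: str = "false") -> str:
    """Set TOKENIZERS_PARALLELISM only if it is not already set.

    Returns the value that is in effect afterwards. The default of
    "false" is an assumption made for this sketch.
    """
    # setdefault leaves an existing value untouched and returns it.
    return os.environ.setdefault("TOKENIZERS_PARALLELISM", value)


if __name__ == "__main__":
    # Call this before the tokenizer is first used in the process.
    print(configure_tokenizers_parallelism())
```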
The code changes in `sync.py` update the Redis client configuration to use the `CELERY_BROKER_URL` environment variable. This allows for more flexibility in specifying the Redis connection details.

Recent user commits:
- chore: update Dockerfile dependencies and copy files
- feat: update sync_user.py to include sync user id in aget_files call (#3276)
- fix: add sync user id aget_files (#3261)
- fix: correctly passing the prompt provided by the user (#3252)
- fix(frontend): sync folder for premium users (#3251)
- fix: wrong default path for CHAT_LLM_CONFIG_PATH env variable (#3247)
- feat: introducing configurable retrieval workflows (#3227)
- feat: remove n+1 query knowledge (#3238)
- fix(migration): order migrations (#3231)
- feat(integration): Notion (#3173)

Recent repository commits:
- chore: update Dockerfile dependencies and copy files
- feat: update sync_user.py to include sync user id in aget_files call (#3276)
- fix: add sync user id aget_files (#3261)
- fix: correctly passing the prompt provided by the user (#3252)
- fix(frontend): sync folder for premium users (#3251)
- fix: wrong default path for CHAT_LLM_CONFIG_PATH env variable (#3247)
- feat: introducing configurable retrieval workflows (#3227)
- feat: remove n+1 query knowledge (#3238)
- fix(migration): order migrations (#3231)
- feat(integration): Notion (#3173)
🤖 I have created a release *beep* *boop*

---

## 0.0.316 (2024-09-30)

## What's Changed

* fix(core): enforce langchain <0.3 for pydantic v1 by @StanGirard in #3217
* chore(main): release core 0.0.16 by @StanGirard in #3218
* feat(assistants): mock api by @StanGirard in #3195
* chore(examples): fix chainlit example by @StanGirard in #3223
* feat(integration): Notion by @chloedia in #3173
* fix(migration): order migrations by @StanGirard in #3231
* feat: remove n+1 query knowledge by @AmineDiro in #3238
* feat: introducing configurable retrieval workflows by @jacopo-chevallard in #3227
* fix: wrong default path for CHAT_LLM_CONFIG_PATH env variable by @jacopo-chevallard in #3247
* fix(frontend): sync folder for premium users by @Zewed in #3251
* fix: correctly passing the prompt provided by the user by @jacopo-chevallard in #3252
* fix: add sync user id aget_files by @AmineDiro in #3261
* feat: update sync_user.py to include sync user id in aget_files call by @StanGirard in #3276
* chore: update Dockerfile dependencies and copy files by @StanGirard in #3277
* chore: add wget to Dockerfile dependencies by @StanGirard in #3279

**Full Changelog**: v0.0.315...v0.0.316

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [0.0.17](core-0.0.16...core-0.0.17) (2024-10-16)

### Features

* **assistant:** cdp ([#3305](#3305)) ([b767f19](b767f19))
* **assistants:** mock api ([#3195](#3195)) ([282fa0e](282fa0e))
* introducing configurable retrieval workflows ([#3227](#3227)) ([ef90e8e](ef90e8e))

### Bug Fixes

* fixing pdf parsing ([#3349](#3349)) ([367242a](367242a))

### Documentation

* **core:** init ([#3365](#3365)) ([bb572a2](bb572a2))
* **fix:** fixed warnings from griffe ([#3381](#3381)) ([1a38798](1a38798))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Description
A major PR which, among other things, makes it easy to customize the retrieval workflows. Workflows are based on LangGraph and can be customized by editing a YAML configuration file and adding the implementation of the node logic to quivr_rag_langgraph.py.
This is a first, simple implementation that will evolve significantly in the coming weeks to enable more complex workflows (for instance, with conditional nodes). We also plan to adopt a similar approach for ingestion, i.e. to let users easily customize the ingestion pipeline.
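To illustrate the idea of a configuration-driven workflow, here is a minimal sketch in plain Python. It does not use LangGraph and does not reproduce the PR's actual configuration schema: the `CONFIG` layout, the node names, the toy corpus, and `run_workflow` are all hypothetical, and the dict stands in for a parsed YAML file. It only shows the general pattern of a config naming the nodes and their edges while the code supplies the node logic.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


# Hypothetical node implementations (the PR keeps its own in
# quivr_rag_langgraph.py; these are toy stand-ins).
def rewrite(state: State) -> State:
    return {**state, "query": str(state["question"]).strip().lower()}


def retrieve(state: State) -> State:
    corpus = ["langgraph builds graphs", "rerankers reorder documents"]
    words = state["query"].split()
    docs = [doc for doc in corpus if any(word in doc for word in words)]
    return {**state, "docs": docs}


def generate(state: State) -> State:
    return {**state, "answer": f"{len(state['docs'])} document(s) found"}


# Maps the node names used in the config to their implementations.
NODE_REGISTRY: Dict[str, Callable[[State], State]] = {
    "rewrite": rewrite,
    "retrieve": retrieve,
    "generate": generate,
}

# Stand-in for a parsed YAML file; this schema is an assumption.
CONFIG = {
    "workflow": {
        "nodes": [
            {"name": "rewrite", "edges": ["retrieve"]},
            {"name": "retrieve", "edges": ["generate"]},
            {"name": "generate", "edges": ["END"]},
        ]
    }
}


def run_workflow(config: Dict[str, Any], state: State) -> State:
    """Run a linear workflow: start at the first node, follow the first edge."""
    nodes = {node["name"]: node for node in config["workflow"]["nodes"]}
    current = config["workflow"]["nodes"][0]["name"]
    while current != "END":
        state = NODE_REGISTRY[current](state)
        current = nodes[current]["edges"][0]
    return state


if __name__ == "__main__":
    print(run_workflow(CONFIG, {"question": "  Rerankers "})["answer"])
```

Reordering or dropping entries in `CONFIG` changes the pipeline without touching the node functions, which is the kind of customization the description refers to.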
Closes CORE-195, CORE-203, CORE-204