feat: introducing configurable retrieval workflows #3227

Merged
merged 99 commits into from
Sep 23, 2024

@jacopo-chevallard (Collaborator) commented Sep 18, 2024

Description

This is a major PR which, among other things, makes the retrieval workflows easy to customize. Workflows are based on LangGraph and can be customized by editing a YAML configuration file and adding the implementation of the node logic to quivr_rag_langgraph.py.
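To make the idea concrete, here is a minimal, dependency-free sketch of a config-driven workflow. It is not the schema or code introduced by this PR and it does not use the LangGraph API: the config keys (`name`, `nodes`, `edges`), the node names, and the helpers `node` and `run_workflow` are all hypothetical. The dict stands in for what a parsed YAML file might contain.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Hypothetical workflow description, standing in for a parsed YAML file.
WORKFLOW_CONFIG = {
    "name": "basic_rag",
    "nodes": [
        {"name": "rewrite", "edges": ["retrieve"]},
        {"name": "retrieve", "edges": ["generate"]},
        {"name": "generate", "edges": []},
    ],
}

# Maps a node name used in the config to the Python function implementing it.
NODE_REGISTRY: Dict[str, Callable[[State], State]] = {}


def node(name: str) -> Callable[[Callable[[State], State]], Callable[[State], State]]:
    """Register a function as the implementation of the named node."""
    def decorator(func: Callable[[State], State]) -> Callable[[State], State]:
        NODE_REGISTRY[name] = func
        return func
    return decorator


@node("rewrite")
def rewrite(state: State) -> State:
    # Toy query rewriting: normalize whitespace and case.
    return {**state, "query": str(state["question"]).strip().lower()}


@node("retrieve")
def retrieve(state: State) -> State:
    # Toy retrieval: keep documents that contain the query string.
    docs = [d for d in state["corpus"] if state["query"] in d.lower()]
    return {**state, "docs": docs}


@node("generate")
def generate(state: State) -> State:
    # Toy generation: report how many documents matched.
    return {**state, "answer": f"{len(state['docs'])} matching document(s)"}


def run_workflow(config: Dict[str, Any], state: State) -> State:
    """Run nodes in sequence, starting at the first node and following its first edge."""
    nodes = {n["name"]: n for n in config["nodes"]}
    current = config["nodes"][0]["name"]
    while current is not None:
        state = NODE_REGISTRY[current](state)
        edges = nodes[current]["edges"]
        current = edges[0] if edges else None
    return state
```

In this sketch, changing the workflow means editing the config and registering any new node function, which mirrors the split between configuration file and node logic described above. Conditional edges are deliberately left out.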

This is a first, simple implementation that will significantly evolve in the coming weeks to enable more complex workflows (for instance, with conditional nodes). We also plan to adopt a similar approach for the ingestion part, i.e. to enable users to easily customize the ingestion pipeline.

Closes CORE-195, CORE-203, CORE-204

Checklist before requesting a review

Please delete options that are not relevant.

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented hard-to-understand areas
  • I have ideally added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Screenshots (if appropriate):

…, you need to set the env variables COHERE_API_KEY or JINA_API_KEY. If both are present, the Cohere reranker (rerank-multilingual-v3.0) is used.
…configuration fields of RAGConfig into RetrievalConfig
…itioning from QuivrQARAG to QuivrQARAGLangGraph
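
The reranker rule in the first commit note above (Cohere wins when both keys are set) can be sketched as follows. The function name `pick_reranker` and its return values are hypothetical and are not taken from the PR; only the two environment variable names come from the commit note.

```python
import os
from typing import Mapping, Optional


def pick_reranker(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return which reranker to use based on the API keys that are set.

    Hypothetical helper: Cohere takes precedence when both keys are present.
    """
    if env.get("COHERE_API_KEY"):
        return "cohere"
    if env.get("JINA_API_KEY"):
        return "jina"
    # Neither key is set, so no reranker is available.
    return None
```

Passing the environment as a mapping keeps the precedence rule easy to test without touching the real process environment.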
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Sep 23, 2024
AmineDiro previously approved these changes Sep 23, 2024

@AmineDiro (Collaborator) left a comment

Nice PR!

if llm_config.tokenizer_hub:
    # To prevent the warning
    # huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    os.environ["TOKENIZERS_PARALLELISM"] = (
[We can safely remove it](https://docs.rs/tokenizers/latest/src/tokenizers/utils/parallelism.rs.html#29)

You're right, we keep it: it defaults to true.
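
For context, a minimal sketch of the kind of guard under discussion. The value assigned in the PR is truncated in the snippet above, so the `"false"` used here is an assumption, and the helper name `disable_tokenizer_parallelism` is hypothetical. `TOKENIZERS_PARALLELISM` itself is the environment variable read by the Hugging Face tokenizers library.

```python
import os
from typing import MutableMapping


def disable_tokenizer_parallelism(env: MutableMapping[str, str] = os.environ) -> str:
    """Set TOKENIZERS_PARALLELISM unless the user has already chosen a value.

    Assumption: "false" is the value we want, so that forking after the
    tokenizer has been used does not trigger the parallelism warning.
    Returns the value that ends up in the environment.
    """
    return env.setdefault("TOKENIZERS_PARALLELISM", "false")
```

Using `setdefault` keeps any value the user exported explicitly. The variable should be set before the tokenizer is loaded for it to take effect.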

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 23, 2024
@StanGirard StanGirard merged commit ef90e8e into main Sep 23, 2024
10 checks passed
@StanGirard StanGirard deleted the feat/workflows branch September 23, 2024 16:11
StanGirard added a commit that referenced this pull request Sep 30, 2024
The code changes in `sync.py` update the Redis client configuration to use the `CELERY_BROKER_URL` environment variable. This allows for more flexibility in specifying the Redis connection details.
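
As an illustration only, deriving Redis connection settings from a broker URL could look like the sketch below. It uses only the standard library; the helper name `redis_settings_from_broker_url`, the fallback URL, and the returned keys are assumptions and do not reflect the actual code in `sync.py`.

```python
import os
from typing import Any, Dict
from urllib.parse import urlparse


def redis_settings_from_broker_url(url: str) -> Dict[str, Any]:
    """Split a redis:// or rediss:// URL into connection settings (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported broker scheme: {parsed.scheme!r}")
    return {
        "host": parsed.hostname or "localhost",
        "port": parsed.port or 6379,
        # The path component carries the database index, e.g. "/1".
        "db": int(parsed.path.lstrip("/") or 0),
        "password": parsed.password,
        "ssl": parsed.scheme == "rediss",
    }


# Assumed fallback URL for local development; not taken from the repository.
broker_url = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
settings = redis_settings_from_broker_url(broker_url)
```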

Recent user commits:
- chore: update Dockerfile dependencies and copy files
- feat: update sync_user.py to include sync user id in aget_files call (#3276)
- fix: add sync user id aget_files (#3261)
- fix: correctly passing the prompt provided by the user (#3252)
- fix(frontend): sync folder for premium users (#3251)
- fix: wrong default path for CHAT_LLM_CONFIG_PATH env variable (#3247)
- feat: introducing configurable retrieval workflows (#3227)
- feat: remove n+1 query knowledge (#3238)
- fix(migration): order migrations (#3231)
- feat(integration): Notion (#3173)

StanGirard added a commit that referenced this pull request Sep 30, 2024
🤖 I have created a release *beep* *boop*
---


## 0.0.316 (2024-09-30)

## What's Changed
* fix(core): enforce langchain <0.3 for pydantic v1 by @StanGirard in #3217
* chore(main): release core 0.0.16 by @StanGirard in #3218
* feat(assistants): mock api by @StanGirard in #3195
* chore(examples): fix chainlit example by @StanGirard in #3223
* feat(integration): Notion by @chloedia in #3173
* fix(migration): order migrations by @StanGirard in #3231
* feat: remove n+1 query knowledge by @AmineDiro in #3238
* feat: introducing configurable retrieval workflows by @jacopo-chevallard in #3227
* fix: wrong default path for CHAT_LLM_CONFIG_PATH env variable by @jacopo-chevallard in #3247
* fix(frontend): sync folder for premium users by @Zewed in #3251
* fix: correctly passing the prompt provided by the user by @jacopo-chevallard in #3252
* fix: add sync user id aget_files by @AmineDiro in #3261
* feat: update sync_user.py to include sync user id in aget_files call by @StanGirard in #3276
* chore: update Dockerfile dependencies and copy files by @StanGirard in #3277
* chore: add wget to Dockerfile dependencies by @StanGirard in #3279


**Full Changelog**: v0.0.315...v0.0.316

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
StanGirard added a commit that referenced this pull request Oct 16, 2024
🤖 I have created a release *beep* *boop*
---


## [0.0.17](core-0.0.16...core-0.0.17) (2024-10-16)


### Features

* **assistant:** cdp ([#3305](#3305)) ([b767f19](b767f19))
* **assistants:** mock api ([#3195](#3195)) ([282fa0e](282fa0e))
* introducing configurable retrieval workflows ([#3227](#3227)) ([ef90e8e](ef90e8e))


### Bug Fixes

* fixing pdf parsing ([#3349](#3349)) ([367242a](367242a))


### Documentation

* **core:** init ([#3365](#3365)) ([bb572a2](bb572a2))
* **fix:** fixed warnings from griffe ([#3381](#3381)) ([1a38798](1a38798))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Labels
  • area: backend (Related to backend functionality or under the /backend directory)
  • lgtm (This PR has been approved by a maintainer)
  • size:XL (This PR changes 500-999 lines, ignoring generated files.)