Graphrag integration #4612

lspinheiro · 2024-12-09T07:29:43Z

Why are these changes needed?

This PR adds an initial integration between graphrag and autogen by exposing local and global search as tools that can be used in autogen-agentchat. A user guide/cookbook will follow. I added no tests because the test data I used was fairly large, and I'm not sure we have an established way to add tests for these more complex integrations. There is a script below that I used instead. The indexing has to be done in graphrag first; the goal is to illustrate the end-to-end steps in a notebook.

I'd appreciate some initial feedback. I hope to gradually extend this with more flexible configuration, integration of drift search, and examples.

Related issue number

Checks

I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

rysweet · 2024-12-10T17:21:15Z

hi @lspinheiro - this is exciting. It's also marked as DRAFT in the subject line but not marked as such in the PR, so I'm marking it as draft. Please set it back by clicking "Ready for review" when you are ready.

ekzhu · 2024-12-12T01:00:26Z

Exciting to see this!! I love the tool idea. The tool itself can also be stateful and shared by multiple agents.
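As a rough illustration of the point about stateful, shared tools, here is a minimal sketch of a tool that loads its (expensive) index once and then serves every agent holding a reference to it. `SharedSearchTool`, its `loader` argument, and the toy keyword lookup are hypothetical stand-ins, not the autogen_ext graphrag API; agents are simulated with plain coroutines.

```python
import asyncio
from typing import Callable, Dict, List, Optional


class SharedSearchTool:
    """Hypothetical stateful tool: loads its index once, reused by every agent that holds it."""

    def __init__(self, loader: Callable[[], Dict[str, str]]) -> None:
        self._loader = loader
        self._index: Optional[Dict[str, str]] = None
        self._lock = asyncio.Lock()
        self.load_count = 0  # how many times the expensive load actually ran

    async def _ensure_loaded(self) -> Dict[str, str]:
        # The lock guards the first load so concurrent callers cannot load the index twice.
        async with self._lock:
            if self._index is None:
                self._index = self._loader()
                self.load_count += 1
        return self._index

    async def run_json(self, args: Dict[str, str]) -> str:
        index = await self._ensure_loaded()
        query = args["query"].lower()
        # Toy lookup: return every entry whose key appears in the query.
        hits = [text for key, text in index.items() if key in query]
        return " ".join(hits) if hits else "no match"


async def agent(name: str, tool: SharedSearchTool, query: str) -> str:
    # Stand-in for an agent invoking the tool it was given.
    return f"{name}: {await tool.run_json({'query': query})}"


async def main() -> List[str]:
    # One tool instance, two agents: the index is loaded only once.
    tool = SharedSearchTool(lambda: {"becher": "Dr. Becher is mentioned by the station-master."})
    results = await asyncio.gather(
        agent("researcher", tool, "What about Becher?"),
        agent("writer", tool, "Anything on Holmes?"),
    )
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The design choice being illustrated is that state (here, the loaded index) lives on the tool instance rather than in any one agent, so passing the same instance to several agents shares that state.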

lspinheiro · 2024-12-17T06:20:07Z

Thanks @ekzhu and @rysweet. This should be ready for review now. It still needs the improvements mentioned in the description, but the tools are usable. I used the following test script:

import asyncio
from autogen_core import CancellationToken
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from autogen_ext.tools.graphrag import (
    GlobalSearchTool,
    LocalSearchTool,
    GlobalDataConfig,
    LocalDataConfig,
    EmbeddingConfig,
)
from azure.identity import DefaultAzureCredential, get_bearer_token_provider


async def main():
    openai_client = AzureOpenAIChatCompletionClient(
        model="gpt-4o-mini",
        azure_endpoint="https://<resource-name>.openai.azure.com", 
        azure_deployment="gpt-4o-mini",
        api_version="2024-08-01-preview",
        azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
    )

    # Global search example
    global_config = GlobalDataConfig(
        input_dir="./autogen-test/ragtest/output"
    )
    
    global_tool = GlobalSearchTool.from_config(
        openai_client=openai_client,
        data_config=global_config
    )

    global_args = {
        "query": "What does the station-master say about Dr. Becher?"
    }

    global_result = await global_tool.run_json(global_args, CancellationToken())
    print("\nGlobal Search Result:")
    print(global_result)
    
    # Local search example
    local_config = LocalDataConfig(
        input_dir="./autogen-test/ragtest/output"
    )

    embedding_config = EmbeddingConfig(
        model="text-embedding-3-small",
        api_base="https://<resource-name>.openai.azure.com", 
        deployment_name="text-embedding-3-small",
        api_version="2023-05-15",
        api_type="azure",
        azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"),
        max_retries=10,
        request_timeout=180.0,
    )

    local_tool = LocalSearchTool.from_config(
        openai_client=openai_client,
        data_config=local_config,
        embedding_config=embedding_config
    )

    local_args = {
        "query": "What does the station-master say about Dr. Becher?"
    }

    local_result = await local_tool.run_json(local_args, CancellationToken())
    print("\nLocal Search Result:")
    print(local_result)


if __name__ == "__main__":
    asyncio.run(main())
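Because both tools read an index that graphrag must build first, a missing or empty `input_dir` is an easy mistake when running the script above. A small pre-flight check can fail fast before any model client is created. This is a hypothetical helper (`check_index_output` is not part of autogen_ext), and it assumes the indexed tables are written as `.parquet` files directly under the output directory, as with the script's `./autogen-test/ragtest/output`.

```python
from pathlib import Path
from typing import List


def check_index_output(input_dir: str) -> List[Path]:
    """Hypothetical pre-flight check: fail fast if graphrag indexing has not produced output yet."""
    root = Path(input_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run graphrag indexing first")
    # Assumption: the indexed tables are stored as .parquet files in this directory.
    tables = sorted(root.glob("*.parquet"))
    if not tables:
        raise FileNotFoundError(f"no .parquet files in {root}; run graphrag indexing first")
    return tables
```

Calling it at the top of `main()` with the same path later passed as `input_dir` turns a confusing failure deep inside the search engine into an explicit "run indexing first" error.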

lspinheiro · 2024-12-17T06:22:58Z

@jackgerrits, I had to add override-dependencies for pydantic and tenacity. The pydantic version we currently use is below graphrag's minimum requirement, and there is a conflict with llamaindex, which requires a lower version of tenacity (llamaindex is only a dev dependency for us). Let me know if you have any concerns with this approach.

gagb · 2024-12-17T18:21:53Z

Thank you! More documentation would help me review this PR. I would like to be able to build the docs page on this PR and see the example.

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_model_adapter.py

gagb · 2024-12-19T18:19:53Z

Related #4438

lspinheiro · 2024-12-20T02:23:34Z

Thank you! More documentation would help me review this PR. I would like to be able to build the docs page on this PR and see the example.

@gagb , I added a sample with a readme and some docstrings that should help with the review.

python/samples/agentchat_graphrag/requirements.txt

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_global_search.py

codecov · 2025-01-03T23:12:26Z

Codecov Report

Attention: Patch coverage is 0% with 161 lines in your changes missing coverage. Please review.

Project coverage is 67.11%. Comparing base (e168616) to head (e60a9aa).

Files with missing lines	Patch %	Lines
...ogen-ext/src/autogen_ext/tools/graphrag/_config.py	0.00%	55 Missing ⚠️
...xt/src/autogen_ext/tools/graphrag/_local_search.py	0.00%	54 Missing ⚠️
...t/src/autogen_ext/tools/graphrag/_global_search.py	0.00%	48 Missing ⚠️
...gen-ext/src/autogen_ext/tools/graphrag/__init__.py	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4612      +/-   ##
==========================================
- Coverage   68.21%   67.11%   -1.10%     
==========================================
  Files         158      162       +4     
  Lines        9960    10121     +161     
==========================================
- Hits         6794     6793       -1     
- Misses       3166     3328     +162
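The numbers in the diff are internally consistent: 161 new lines, none of them covered, plus one fewer hit overall, leaves 6793 hits out of 10121 lines. The sketch below recomputes the percentages from those counts. Truncating to two decimals reproduces the reported values; whether Codecov truncates or rounds in general is an assumption here (rounding would give 67.12% for head).

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Coverage as hits / lines * 100, truncated (not rounded) to two decimals."""
    pct = Decimal(hits) / Decimal(lines) * 100
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(6794, 9960)             # main (e168616)
head = coverage_pct(6794 - 1, 9960 + 161)   # PR head: one fewer hit, 161 new lines
patch = coverage_pct(0, 161)                # none of the 161 new lines are exercised

print(base, head, head - base, patch)       # prints: 68.21 67.11 -1.10 0.00
```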

Flag	Coverage Δ
unittests	`67.11% <0.00%> (-1.10%)`	⬇️

Flags with carried forward coverage won't be shown.


python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_global_search.py

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py


ekzhu · 2025-01-04T08:30:12Z

Can we add some unit tests? See the code coverage result. Is it possible to run a simple setup procedure with a mini dataset, perhaps a generated one?
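One way to get coverage without the large test data is to unit-test the tool's own logic against a stubbed search engine, keeping any small generated dataset for a separate integration test. A sketch under that assumption: `FakeEngine`, its `asearch` coroutine, `FakeResult`, and `SearchToolUnderTest` are hypothetical stand-ins, not the classes in this PR, and would need adapting to how `GlobalSearchTool` and `LocalSearchTool` actually construct and call their engines.

```python
import asyncio
import unittest
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeResult:
    response: str


@dataclass
class FakeEngine:
    """Stand-in search engine: returns canned answers and records queries instead of calling an LLM."""
    canned: Dict[str, str]
    seen: List[str] = field(default_factory=list)

    async def asearch(self, query: str) -> FakeResult:
        self.seen.append(query)
        return FakeResult(self.canned.get(query, "I don't know."))


class SearchToolUnderTest:
    """Hypothetical wrapper: validates JSON args, delegates to the engine, returns the answer."""

    def __init__(self, engine: Any) -> None:
        self._engine = engine

    async def run_json(self, args: Dict[str, Any]) -> Dict[str, str]:
        query = args.get("query")
        if not isinstance(query, str) or not query.strip():
            raise ValueError("'query' must be a non-empty string")
        result = await self._engine.asearch(query)
        return {"answer": result.response}


class SearchToolTest(unittest.TestCase):
    def test_delegates_to_engine(self) -> None:
        engine = FakeEngine({"Who is Dr. Becher?": "A doctor."})
        tool = SearchToolUnderTest(engine)
        out = asyncio.run(tool.run_json({"query": "Who is Dr. Becher?"}))
        self.assertEqual(out, {"answer": "A doctor."})
        self.assertEqual(engine.seen, ["Who is Dr. Becher?"])

    def test_rejects_empty_query(self) -> None:
        tool = SearchToolUnderTest(FakeEngine({}))
        with self.assertRaises(ValueError):
            asyncio.run(tool.run_json({"query": "  "}))
```

Saved as a test module, this runs under `pytest` or `python -m unittest` in milliseconds, since no index, embeddings, or model calls are involved.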

lpinheiroms added 3 commits December 7, 2024 17:10

add initial global search draft

e3e8f45

add graphrag dep

8242378

Merge branch 'main' into lpinheiro/feat/add-graphrag-tools

fb2fb19

rysweet marked this pull request as draft December 10, 2024 17:21

ekzhu added rag retrieve-augmented generative agents proj-extensions labels Dec 12, 2024

lpinheiroms added 3 commits December 17, 2024 11:23

fix local search embedding

a13c18b

linting

8f3c484

add from config constructor

0c05047

lspinheiro requested a review from jackgerrits December 17, 2024 06:06

Merge branch 'main' into lpinheiro/feat/add-graphrag-tools

0e53f91

lspinheiro requested a review from ekzhu December 17, 2024 06:10

remove draft notebook

c1e7ea2

lspinheiro marked this pull request as ready for review December 17, 2024 06:17

rysweet changed the title ~~[DRAFT] Graphrag integration~~ Graphrag integration Dec 17, 2024

gagb mentioned this pull request Dec 17, 2024

RAG Agent in agentchat #4742

Open

ekzhu reviewed Dec 17, 2024

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_model_adapter.py Outdated

Merge branch 'main' into lpinheiro/feat/add-graphrag-tools

a8b38ad

lpinheiroms added 4 commits December 20, 2024 11:13

update config factory and add docstrings

6d61c8e

add graphrag sample

1c4ed3d

add sample prompts

95f329c

update readme

3bc104b

lspinheiro requested a review from gagb December 20, 2024 02:15

lspinheiro and others added 2 commits December 20, 2024 12:23

Merge branch 'main' into lpinheiro/feat/add-graphrag-tools

2ae6812

update deps

33523df

lspinheiro requested a review from ekzhu December 30, 2024 01:26

ekzhu reviewed Dec 30, 2024

python/samples/agentchat_graphrag/requirements.txt Outdated

ekzhu reviewed Dec 30, 2024

python/samples/agentchat_graphrag/requirements.txt Outdated

ekzhu reviewed Dec 30, 2024

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py Outdated

Add API docs

8080ddb

ekzhu reviewed Dec 30, 2024

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py

ekzhu and others added 5 commits December 29, 2024 21:25

Update python/samples/agentchat_graphrag/requirements.txt

603c1c9

Update python/samples/agentchat_graphrag/requirements.txt

934230b

merge main, fix conflicts

1c5fcd3

update docstrings with snippet and doc ref

4f0c71f

lint

e3dc1f9

ekzhu reviewed Dec 30, 2024

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_global_search.py

lpinheiroms added 4 commits January 4, 2025 08:31

improve set up instructions in docstring

f24fb6c

lint

4a5d611

Merge branch 'main' into lpinheiro/feat/add-graphrag-tools

74a2a23

update lock

cac2aef

lspinheiro force-pushed the lpinheiro/feat/add-graphrag-tools branch from e4e2b52 to cac2aef January 3, 2025 23:10

ekzhu reviewed Jan 4, 2025

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_global_search.py Outdated

ekzhu reviewed Jan 4, 2025

python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py Outdated

lspinheiro and others added 2 commits January 4, 2025 17:10

Update python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_gl…

e42f027

…obal_search.py Co-authored-by: Eric Zhu <[email protected]>

Update python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_lo…

e60a9aa

…cal_search.py Co-authored-by: Eric Zhu <[email protected]>
