Add Callback Support for Research Progress Monitoring #1129

Open
us opened this issue Feb 11, 2025 · 1 comment
us commented Feb 11, 2025

Description

Currently, GPTResearcher uses a custom WebSocket handler for logging research progress. While this works for simple use cases, it would be more flexible to support standard callback mechanisms similar to other LangChain tools. This would allow better integration with existing logging systems and more granular control over the research process.

Current Implementation

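from typing import Any, Dict

class CustomLogsHandler:
    def __init__(self):
        self.logs = []  # every log payload received so far

    async def send_json(self, data: Dict[str, Any]) -> None:
        # Mirrors the websocket interface: store the payload, then echo it
        self.logs.append(data)
        print(f"My custom Log: {data}")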

Proposed Enhancement

Add support for LangChain-style callbacks in the GPTResearcher class:

from typing import Dict, Any, Optional, List
from langchain.callbacks.base import BaseCallbackHandler

class GPTResearcher:
    def __init__(
        self,
        query: str,
        report_type: str = "research_report",
        report_source: str = "online",
        tone: str = "informative",
        config_path: Optional[str] = None,
        websocket: Optional[Any] = None,  # Keep for backward compatibility
        callbacks: Optional[List[BaseCallbackHandler]] = None,  # New parameter
    ):
        self.websocket = websocket  # read by _log_progress below
        self.callbacks = callbacks or []
        # ... existing initialization code ...

    async def _log_progress(self, data: Dict[str, Any]) -> None:
        # Support both websocket and callbacks
        if self.websocket:
            await self.websocket.send_json(data)
        
        # Notify all callbacks
        for callback in self.callbacks:
            if hasattr(callback, "on_research_step"):
                await callback.on_research_step(data)

Example usage with custom callback:

class ResearchProgressCallback(BaseCallbackHandler):
    def __init__(self):
        self.steps = []
    
    async def on_research_step(self, data: Dict[str, Any]) -> None:
        self.steps.append(data)
        print(f"Research progress: {data}")

# Usage
callback = ResearchProgressCallback()
researcher = GPTResearcher(
    query="What happened in the latest burning man floods?",
    callbacks=[callback]
)
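
To make the intended flow concrete without depending on the real library, here is a minimal, self-contained sketch of the fan-out in `_log_progress`. `StubResearcher`, `RecordingWebSocket`, and `RecordingCallback` are hypothetical stand-ins written for this illustration (they are not part of GPTResearcher or LangChain), and the payload is made up:

```python
import asyncio
from typing import Any, Dict, List, Optional


class RecordingWebSocket:
    """Hypothetical stand-in for the existing websocket-style handler."""

    def __init__(self) -> None:
        self.logs: List[Dict[str, Any]] = []

    async def send_json(self, data: Dict[str, Any]) -> None:
        self.logs.append(data)


class RecordingCallback:
    """Hypothetical duck-typed callback exposing the proposed hook."""

    def __init__(self) -> None:
        self.steps: List[Dict[str, Any]] = []

    async def on_research_step(self, data: Dict[str, Any]) -> None:
        self.steps.append(data)


class StubResearcher:
    """Hypothetical stand-in that models only the logging path."""

    def __init__(
        self,
        websocket: Optional[Any] = None,
        callbacks: Optional[List[Any]] = None,
    ) -> None:
        self.websocket = websocket
        self.callbacks = callbacks or []

    async def _log_progress(self, data: Dict[str, Any]) -> None:
        # The websocket path keeps working for backward compatibility
        if self.websocket:
            await self.websocket.send_json(data)
        # Every registered callback receives the same payload
        for callback in self.callbacks:
            if hasattr(callback, "on_research_step"):
                await callback.on_research_step(data)


async def demo() -> StubResearcher:
    researcher = StubResearcher(
        websocket=RecordingWebSocket(),
        callbacks=[RecordingCallback(), RecordingCallback()],
    )
    # Made-up payload, used only to exercise the fan-out
    await researcher._log_progress({"type": "logs", "content": "example_step"})
    return researcher


if __name__ == "__main__":
    r = asyncio.run(demo())
    print(r.websocket.logs, [cb.steps for cb in r.callbacks])
```

The websocket and both callbacks each end up with one copy of the payload, which is the "multiple simultaneous callbacks" behavior described under Benefits.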

Benefits

  1. Better integration with existing logging systems
  2. More granular control over research progress monitoring
  3. Support for multiple simultaneous callbacks
  4. Maintains backward compatibility with websocket approach

Implementation Notes

  • Add new callbacks parameter to GPTResearcher constructor
  • Create standard callback events for research steps
  • Document callback interface and events
  • Maintain backward compatibility with existing websocket approach

Questions

  • Should we deprecate the websocket approach in favor of callbacks?
  • What specific callback events should we standardize?
  • Should we add synchronous callback support as well?
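
On the last question, one possible approach (a sketch, not a settled design) is to call the hook and only await the result when it is awaitable, so plain and `async` callbacks can be mixed. `notify_callbacks`, `SyncCallback`, and `AsyncCallback` are hypothetical names used only for illustration:

```python
import asyncio
import inspect
from typing import Any, Dict, Iterable, List


async def notify_callbacks(callbacks: Iterable[Any], data: Dict[str, Any]) -> None:
    """Invoke on_research_step on each callback, sync or async."""
    for callback in callbacks:
        hook = getattr(callback, "on_research_step", None)
        if hook is None:
            continue  # callback does not implement the proposed hook
        result = hook(data)
        # Async hooks return a coroutine that must be awaited;
        # sync hooks have already run by this point.
        if inspect.isawaitable(result):
            await result


class SyncCallback:
    def __init__(self) -> None:
        self.steps: List[Dict[str, Any]] = []

    def on_research_step(self, data: Dict[str, Any]) -> None:
        self.steps.append(data)


class AsyncCallback:
    def __init__(self) -> None:
        self.steps: List[Dict[str, Any]] = []

    async def on_research_step(self, data: Dict[str, Any]) -> None:
        self.steps.append(data)


if __name__ == "__main__":
    sync_cb, async_cb = SyncCallback(), AsyncCallback()
    asyncio.run(notify_callbacks([sync_cb, async_cb], {"content": "example_step"}))
    print(sync_cb.steps, async_cb.steps)
```

A caveat with this approach: a slow synchronous callback runs inline and blocks the event loop while it executes, so long-running sync handlers may need to be moved to a thread.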

ElishaKay (Collaborator) commented Feb 19, 2025

Sup @us
Apologies for the delayed response.

Happy to see the PR
I agree with the 4 points you made under "Implementation Notes".
Some thoughts:
a) Documentation of the different log types would be a nice bonus for users.
b) Some examples of using GPTR as a LangChain tool could also be interesting.
We have something in the docs about LangGraph, but we could probably create a new section for LangChain, since GPTR leverages LangChain in a lot of interesting ways.

Re: Questions:

Should we deprecate the websocket approach in favor of callbacks?

  • We can consider that in another PR - there's a good amount of docs around the websocket method

What specific callback events should we standardize?

If you run that example, you can group the logs by their "content" field to see the unique types of logs.
For example:

{
    "type": "logs",
    "content": "added_source_url",
    "output": "✅ Added source url to research: https://www.npr.org/2023/09/28/1202110410/how-rumors-and-conspiracy-theories-got-in-the-way-of-mauis-fire-recovery\n",
    "metadata": "https://www.npr.org/2023/09/28/1202110410/how-rumors-and-conspiracy-theories-got-in-the-way-of-mauis-fire-recovery"
}
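
A minimal sketch of that grouping, assuming the logs were collected as a list of dicts shaped like the entry above (for instance the `logs` list of a handler such as `CustomLogsHandler`). `group_logs_by_content` is a hypothetical helper, and apart from `added_source_url` the sample values are made up:

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_logs_by_content(logs: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket log entries by their "content" field."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for entry in logs:
        # Entries without a "content" key land in an "unknown" bucket
        grouped[entry.get("content", "unknown")].append(entry)
    return dict(grouped)


# Illustrative data only; URLs and the second content value are placeholders
sample_logs = [
    {"type": "logs", "content": "added_source_url", "metadata": "https://example.com/a"},
    {"type": "logs", "content": "added_source_url", "metadata": "https://example.com/b"},
    {"type": "logs", "content": "hypothetical_event", "metadata": None},
]

if __name__ == "__main__":
    for content, entries in group_logs_by_content(sample_logs).items():
        print(f"{content}: {len(entries)} entries")
```

The resulting keys give a starting list of candidate event names to standardize.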

Should we add synchronous callback support as well?

  • not sure what you mean - we can begin with a lean PR on the gist of your idea and then take the rest in pieces based on further meditation
