
o3 #1098

Open
guiramos opened this issue Feb 3, 2025 · 9 comments


guiramos commented Feb 3, 2025

Hello, how do I properly use this with o3?

I am still using a custom JSON config with all the parameters:

{
  "RETRIEVER": "tavily",
  "EMBEDDING": "openai:text-embedding-3-small",
  "SIMILARITY_THRESHOLD": 0.6,
  "FAST_LLM": "openai:gpt-4o-mini",
  "SMART_LLM": "openai:gpt-4o-2024-08-06",
  "STRATEGIC_LLM": "openai:gpt-4o-2024-08-06",
  "FAST_TOKEN_LIMIT": 2000,
  "SMART_TOKEN_LIMIT": 4000,
  "BROWSE_CHUNK_MAX_LENGTH": 8192,
  "SUMMARY_TOKEN_LIMIT": 1000,
  "TEMPERATURE": 0.5,
  "LLM_TEMPERATURE": 0.55,
  "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0",
  "MAX_SEARCH_RESULTS_PER_QUERY": 5,
  "MEMORY_BACKEND": "local",
  "TOTAL_WORDS": 1200,
  "REPORT_FORMAT": "APA",
  "MAX_ITERATIONS": 4,
  "SCRAPER": "bs",
  "MAX_SUBTOPICS": 3,
  "DOC_PATH": "./my-docs"
}

And then doing this:

researcher = GPTResearcher(query=query, report_type=REPORT_TYPE, config_path=f"{APP_PATH}/agent/resources/researcher.json")
researcher.cfg.total_words = total_words
researcher.cfg.smart_token_limit = total_words * 4
await researcher.conduct_research()
report = await researcher.write_report()
return report

Is there a better way to customize this? I've asked this question in the past about o1 usage but lost track of its evolution.

#996

@assafelovic


guiramos commented Feb 4, 2025

The problem is that o3-mini does not accept the temperature argument and it's hardcoded in this block:

    try:
        response = await create_chat_completion(
            model=cfg.smart_llm_model,
            messages=[
                {"role": "system", "content": f"{auto_agent_instructions()}"},
                {"role": "user", "content": f"task: {query}"},
            ],
            temperature=0.15,
            llm_provider=cfg.smart_llm_provider,
            llm_kwargs=cfg.llm_kwargs,
            cost_callback=cost_callback,
        )

        agent_dict = json.loads(response)
        return agent_dict["server"], agent_dict["agent_role_prompt"]

    except Exception as e:
        print("⚠️ Error in reading JSON, attempting to repair JSON")
        return await handle_json_error(response)
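
(For anyone who wants to confirm the restriction outside gpt-researcher: a minimal sketch, assuming a recent openai Python SDK that already exposes reasoning_effort and an OPENAI_API_KEY in the environment. o-series reasoning models reject a custom temperature but accept reasoning_effort instead.)

# Minimal sketch (assumes a recent openai Python SDK >= 1.x and OPENAI_API_KEY set)
# showing that o3-mini rejects a custom temperature while accepting reasoning_effort.
from openai import OpenAI, BadRequestError

client = OpenAI()
messages = [{"role": "user", "content": "Say hi"}]

try:
    # Expected to raise: o-series reasoning models don't take a non-default temperature.
    client.chat.completions.create(model="o3-mini", messages=messages, temperature=0.15)
except BadRequestError as err:
    print("temperature rejected:", err)

# Expected to work: reasoning_effort is the tuning knob for o-series models.
resp = client.chat.completions.create(model="o3-mini", messages=messages, reasoning_effort="low")
print(resp.choices[0].message.content)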

assafelovic (Owner) commented

Hey @guiramos, for now we've introduced o models for the STRATEGIC_LLM only, which is in charge of planning the research. Tbh I think using them anywhere else is overkill for the task. Check here for more details about modifying the config: https://docs.gptr.dev/docs/gpt-researcher/gptr/config
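
For reference, a minimal sketch of that setup, assuming the environment-variable config keys from the docs above (the model names are just examples, not a recommendation):

import asyncio
import os

from gpt_researcher import GPTResearcher

# Point only the planning model at an o-series model; keep the others on 4o-class models.
os.environ["FAST_LLM"] = "openai:gpt-4o-mini"
os.environ["SMART_LLM"] = "openai:gpt-4o-2024-08-06"
os.environ["STRATEGIC_LLM"] = "openai:o3-mini"

async def run(query: str) -> str:
    researcher = GPTResearcher(query=query, report_type="research_report")
    await researcher.conduct_research()
    return await researcher.write_report()

# asyncio.run(run("example query"))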


guiramos commented Feb 4, 2025

@assafelovic ok, that makes sense. Thank you.

guiramos closed this as completed Feb 4, 2025

regismesquita commented Feb 12, 2025

@assafelovic 4o is more expensive than o3-mini (double the price); even if it is overkill, it might still be a better deal?

guiramos (Author) commented

That is a very good point.

However, the way o3 handles/interprets the prompts is different, so it will have an impact on the code.

guiramos reopened this Feb 12, 2025

regismesquita commented Feb 12, 2025

Yes, there is some impact. To make it work locally I had to remove the temperature settings and add the reasoning effort (I am using o3-mini for all LLMs; fast and smart use low effort and strategic uses high).

My hacky local patch:

 % git diff master -- gpt_researcher --patch | cat
diff --git a/gpt_researcher/actions/query_processing.py b/gpt_researcher/actions/query_processing.py
index 3b08da54..062e7e7f 100644
--- a/gpt_researcher/actions/query_processing.py
+++ b/gpt_researcher/actions/query_processing.py
@@ -60,6 +60,7 @@ async def generate_sub_queries(
             llm_provider=cfg.strategic_llm_provider,
             max_tokens=None,
             llm_kwargs=cfg.llm_kwargs,
+            reasoning_effort="high",
             cost_callback=cost_callback,
         )
     except Exception as e:
diff --git a/gpt_researcher/actions/report_generation.py b/gpt_researcher/actions/report_generation.py
index 98cb7f9d..d3acf23c 100644
--- a/gpt_researcher/actions/report_generation.py
+++ b/gpt_researcher/actions/report_generation.py
@@ -47,7 +47,7 @@ async def write_report_introduction(
                     language=config.language
                 )},
             ],
-            temperature=0.25,
+            #temperature=0.25,
             llm_provider=config.smart_llm_provider,
             stream=True,
             websocket=websocket,
@@ -92,7 +92,7 @@ async def write_conclusion(
                                                                        report_content=context,
                                                                        language=config.language)},
             ],
-            temperature=0.25,
+            #temperature=0.25,
             llm_provider=config.smart_llm_provider,
             stream=True,
             websocket=websocket,
@@ -135,7 +135,7 @@ async def summarize_url(
                 {"role": "system", "content": f"{role}"},
                 {"role": "user", "content": f"Summarize the following content from {url}:\n\n{content}"},
             ],
-            temperature=0.25,
+            #temperature=0.25,
             llm_provider=config.smart_llm_provider,
             stream=True,
             websocket=websocket,
@@ -180,7 +180,7 @@ async def generate_draft_section_titles(
                 {"role": "user", "content": generate_draft_titles_prompt(
                     current_subtopic, query, context)},
             ],
-            temperature=0.25,
+            #temperature=0.25,
             llm_provider=config.smart_llm_provider,
             stream=True,
             websocket=None,
@@ -242,7 +242,7 @@ async def generate_report(
                 {"role": "system", "content": f"{agent_role_prompt}"},
                 {"role": "user", "content": content},
             ],
-            temperature=0.35,
+            #temperature=0.35,
             llm_provider=cfg.smart_llm_provider,
             stream=True,
             websocket=websocket,
@@ -257,7 +257,7 @@ async def generate_report(
                 messages=[
                     {"role": "user", "content": f"{agent_role_prompt}\n\n{content}"},
                 ],
-                temperature=0.35,
+                #temperature=0.35,
                 llm_provider=cfg.smart_llm_provider,
                 stream=True,
                 websocket=websocket,
diff --git a/gpt_researcher/utils/llm.py b/gpt_researcher/utils/llm.py
index 611e065a..deac0db0 100644
--- a/gpt_researcher/utils/llm.py
+++ b/gpt_researcher/utils/llm.py
@@ -28,7 +28,8 @@ async def create_chat_completion(
         stream: Optional[bool] = False,
         websocket: Any | None = None,
         llm_kwargs: Dict[str, Any] | None = None,
-        cost_callback: callable = None
+        cost_callback: callable = None,
+        reasoning_effort: Optional[str] = "low"
 ) -> str:
     """Create a chat completion using the OpenAI API
     Args:
@@ -51,7 +52,8 @@ async def create_chat_completion(
             f"Max tokens cannot be more than 16,000, but got {max_tokens}")

     # Get the provider from supported providers
-    provider = get_llm(llm_provider, model=model, temperature=temperature,
+    provider = get_llm(llm_provider, model=model, #temperature=temperature,
+                       reasoning_effort=reasoning_effort,
                        max_tokens=max_tokens, **(llm_kwargs or {}))

     response = ""
@@ -101,8 +103,9 @@ async def construct_subtopics(task: str, data: str, config, subtopics: list = []
         provider = get_llm(
             config.smart_llm_provider,
             model=config.smart_llm_model,
-            temperature=temperature,
+            #temperature=temperature,
             max_tokens=config.smart_token_limit,
+            reasoning_effort="high",
             **config.llm_kwargs,
         )
         model = provider.llm

and start with:

FAST_LLM="openai:o3-mini" SMART_LLM="openai:o3-mini" STRATEGIC_LLM="openai:o3-mini"    python -m uvicorn main:app --host 0.0.0.0 --reload

Edit: So some work would be necessary to add support for o3-mini while not breaking support for everything else.

sassanix commented

Not all of us have access to o3, so this works just fine for my use case at the moment.

regismesquita (Contributor) commented

Looking back now, I could probably check if the model name contains o3 and, if it does, just change llm.py to omit the temperature and add the reasoning_effort field.


regismesquita commented Feb 12, 2025

This is much cleaner and should still support the currently existing models 🤔

 % git diff --staged master -- gpt_researcher | cat
diff --git a/gpt_researcher/actions/query_processing.py b/gpt_researcher/actions/query_processing.py
index 3b08da54..062e7e7f 100644
--- a/gpt_researcher/actions/query_processing.py
+++ b/gpt_researcher/actions/query_processing.py
@@ -60,6 +60,7 @@ async def generate_sub_queries(
             llm_provider=cfg.strategic_llm_provider,
             max_tokens=None,
             llm_kwargs=cfg.llm_kwargs,
+            reasoning_effort="high",
             cost_callback=cost_callback,
         )
     except Exception as e:
diff --git a/gpt_researcher/utils/llm.py b/gpt_researcher/utils/llm.py
index 611e065a..bab610e2 100644
--- a/gpt_researcher/utils/llm.py
+++ b/gpt_researcher/utils/llm.py
@@ -28,7 +28,8 @@ async def create_chat_completion(
         stream: Optional[bool] = False,
         websocket: Any | None = None,
         llm_kwargs: Dict[str, Any] | None = None,
-        cost_callback: callable = None
+        cost_callback: callable = None,
+        reasoning_effort: Optional[str] = "low"
 ) -> str:
     """Create a chat completion using the OpenAI API
     Args:
@@ -51,9 +52,18 @@ async def create_chat_completion(
             f"Max tokens cannot be more than 16,000, but got {max_tokens}")
 
     # Get the provider from supported providers
-    provider = get_llm(llm_provider, model=model, temperature=temperature,
-                       max_tokens=max_tokens, **(llm_kwargs or {}))
-
+    kwargs = {
+        'model': model,
+        'max_tokens': max_tokens,
+        **(llm_kwargs or {})
+    }
+
+    if 'o3' in model:
+        kwargs['reasoning_effort'] = reasoning_effort
+    else:
+        kwargs['temperature'] = temperature
+
+    provider = get_llm(llm_provider, **kwargs)
     response = ""
     # create response
     for _ in range(10):  # maximum of 10 attempts
@@ -103,6 +113,7 @@ async def construct_subtopics(task: str, data: str, config, subtopics: list = []
             model=config.smart_llm_model,
             temperature=temperature,
             max_tokens=config.smart_token_limit,
+            reasoning_effort="high",
             **config.llm_kwargs,
         )
         model = provider.llm

You might want to make it possible for people to select the reasoning effort for each LLM, but this at least allows people to use o3 for now.
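
One possible shape for that per-LLM setting, purely hypothetical (these env var names do not exist in gpt-researcher today; this only illustrates how a per-role value could be fed into create_chat_completion):

import os

# Hypothetical per-role setting, e.g. STRATEGIC_LLM_REASONING_EFFORT=high in the environment.
def reasoning_effort_for(role: str, default: str = "low") -> str:
    """role is 'fast', 'smart' or 'strategic'."""
    return os.getenv(f"{role.upper()}_LLM_REASONING_EFFORT", default)

# Usage idea inside the actions, instead of hardcoding "high"/"low":
# create_chat_completion(..., reasoning_effort=reasoning_effort_for("strategic"))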

Edit: Interestingly, o3-mini is also much faster than 4o; for a simple research run it completed the task in a third of the time.

Edit 2: Never mind, this just failed on the detailed report inside the runnable sequence (construct subtopics); the fix is easy, just replicate this logic in the subtopics generation step.
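
A rough sketch of that follow-up fix, mirroring the branch from the llm.py diff above inside construct_subtopics (untested; variable names follow the diff):

# Inside construct_subtopics, build the get_llm kwargs the same way as in
# create_chat_completion: o3-style models get reasoning_effort, others keep temperature.
kwargs = {
    "model": config.smart_llm_model,
    "max_tokens": config.smart_token_limit,
    **config.llm_kwargs,
}
if "o3" in config.smart_llm_model:
    kwargs["reasoning_effort"] = "high"
else:
    kwargs["temperature"] = temperature

provider = get_llm(config.smart_llm_provider, **kwargs)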
