Commit

…cher-fork into edit-docs

# Conflicts:
#	docs/docs/gpt-researcher/llms/llms.md
scchengaiah committed Oct 27, 2024
2 parents 99af2cf + becaa70 commit 2d856ee
Showing 46 changed files with 900 additions and 364 deletions.
42 changes: 21 additions & 21 deletions CONTRIBUTING.md
@@ -1,42 +1,42 @@
 # Contributing to GPT Researcher
-First off, we'd like to welcome you, and thank you for your interest and effort in contributing to our open-source project ❤️. Contributions of all forms are welcome, from new features and bug fixes, to documentation and more.
+First off, we'd like to welcome you and thank you for your interest and effort in contributing to our open-source project ❤️. Contributions of all forms are welcome—from new features and bug fixes to documentation and more.

-We are on a mission to build the #1 AI agent for comprehensive, unbiased, and factual research online. And we need your support to achieve this grand vision.
+We are on a mission to build the #1 AI agent for comprehensive, unbiased, and factual research online, and we need your support to achieve this grand vision.

-Please take a moment to review this document in order to make the contribution process easy and effective for everyone involved.
+Please take a moment to review this document to make the contribution process easy and effective for everyone involved.

 ## Reporting Issues

 If you come across any issue or have an idea for an improvement, don't hesitate to create an issue on GitHub. Describe your problem in sufficient detail, providing as much relevant information as possible. This way, we can reproduce the issue before attempting to fix it or respond appropriately.

 ## Contributing Code

 1. **Fork the repository and create your branch from `master`.**
-   If it's not an urgent bug fix, you should branch from `master` and work on the feature or fix in there.
+   If it’s not an urgent bug fix, branch from `master` and work on the feature or fix there.

-2. **Conduct your changes.**
-   Make your changes following best practices for coding in the project's language.
+2. **Make your changes.**
+   Implement your changes following best practices for coding in the project's language.

 3. **Test your changes.**
-   Make sure your changes pass all the tests if there are any. If the project doesn't have automated testing infrastructure, test your changes manually to confirm they behave as expected.
+   Ensure that your changes pass all tests if any exist. If the project doesn’t have automated tests, test your changes manually to confirm they behave as expected.

 4. **Follow the coding style.**
-   Ensure your code adheres to the coding conventions used throughout the project, that includes indentation, accurate comments, etc.
+   Ensure your code adheres to the coding conventions used throughout the project, including indentation, accurate comments, etc.

 5. **Commit your changes.**
-   Make your git commits informative and concise. This is very helpful for others when they look at the git log.
+   Make your Git commits informative and concise. This is very helpful for others when they look at the Git log.

 6. **Push to your fork and submit a pull request.**
    When your work is ready and passes tests, push your branch to your fork of the repository and submit a pull request from there.

-7. **Pat your back and wait for the review.**
+7. **Pat yourself on the back and wait for review.**
    Your work is done, congratulations! Now sit tight. The project maintainers will review your submission as soon as possible. They might suggest changes or ask for improvements. Both constructive conversation and patience are key to the collaboration process.

 ## Documentation

 If you would like to contribute to the project's documentation, please follow the same steps: fork the repository, make your changes, test them, and submit a pull request.

-Documentation is a vital part of any software. It's not just about having good code. Ensuring that the users and contributors understand what's going on, how to use the software or how to contribute, is crucial.
+Documentation is a vital part of any software. It's not just about having good code; ensuring that users and contributors understand what's going on, how to use the software, or how to contribute is crucial.

-We're grateful for all our contributors, and we look forward to building the world's leading AI research agent hand-in-hand with you. Let's harness the power of Open Source and AI to change the world together!
+We're grateful for all our contributors, and we look forward to building the world's leading AI research agent hand-in-hand with you. Let's harness the power of open source and AI to change the world together!
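The branch/commit flow in steps 1, 2, and 5 above can be sketched as a plain Git session. This is a generic illustration in a throwaway local repository; in practice you would clone your fork of gpt-researcher instead of `git init`, and the branch name, file, and identity below are placeholders:

```shell
# Illustrative only: simulate the branch -> change -> commit flow locally.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q                                  # stand-in for cloning your fork
git checkout -q -b my-feature                # step 1: work on a dedicated branch
echo "example fix" > fix.txt                 # step 2: make your changes
git add fix.txt
git -c user.name="You" -c user.email="you@example.com" \
    commit -q -m "Fix: describe the change concisely"   # step 5: informative commit
git log --oneline                            # verify the commit message reads well
```

From here, `git push` to your fork and opening a pull request (steps 6 and 7) happen against your real remote.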
8 changes: 7 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
-<div align="center">
+<div align="center" id="top">
 <!--<h1 style="display: flex; align-items: center; gap: 10px;">
 <img src="https://github.com/assafelovic/gpt-researcher/assets/13554167/a45bac7c-092c-42e5-8eb6-69acbf20dde5" alt="Logo" width="25">
 GPT Researcher
@@ -67,6 +67,7 @@ More specifically:

 ## Features
 - 📝 Generate research, outlines, resources and lessons reports with local documents and web sources
+- 🖼️ Supports smart article image scraping and filtering
 - 📜 Can generate long and detailed research reports (over 2K words)
 - 🌐 Aggregates over 20 web sources per research to form objective and factual conclusions
 - 🖥️ Includes both lightweight (HTML/CSS/JS) and production ready (NextJS + Tailwind) UX/UI
@@ -245,3 +246,8 @@ Our view on unbiased research claims:
 </picture>
 </a>
 </p>
+
+
+<p align="right">
+  <a href="#top">⬆️ Back to Top</a>
+</p>
10 changes: 5 additions & 5 deletions docs/docs/gpt-researcher/gptr/config.md
@@ -19,9 +19,10 @@ You can also include your own external JSON file `config.json` by adding the pat
 Below is a list of current supported options:

 - **`RETRIEVER`**: Web search engine used for retrieving sources. Defaults to `tavily`. Options: `duckduckgo`, `bing`, `google`, `searchapi`, `serper`, `searx`. [Check here](https://github.com/assafelovic/gpt-researcher/tree/master/gpt_researcher/retrievers) for supported retrievers
-- **`EMBEDDING_PROVIDER`**: Provider for embedding model. Defaults to `openai`. Options: `ollama`, `huggingface`, `azure_openai`, `custom`.
+- **`EMBEDDING`**: Embedding model. Defaults to `openai:text-embedding-3-small`. Options: `ollama`, `huggingface`, `azure_openai`, `custom`.
 - **`FAST_LLM`**: Model name for fast LLM operations such as summaries. Defaults to `openai:gpt-4o-mini`.
 - **`SMART_LLM`**: Model name for smart operations like generating research reports and reasoning. Defaults to `openai:gpt-4o`.
+- **`STRATEGIC_LLM`**: Model name for strategic operations like generating research plans and strategies. Defaults to `openai:o1-preview`.
 - **`FAST_TOKEN_LIMIT`**: Maximum token limit for fast LLM responses. Defaults to `2000`.
 - **`SMART_TOKEN_LIMIT`**: Maximum token limit for smart LLM responses. Defaults to `4000`.
 - **`BROWSE_CHUNK_MAX_LENGTH`**: Maximum length of text chunks to browse in web sources. Defaults to `8192`.
@@ -60,10 +61,9 @@ Here is an example for [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai

 OPENAI_API_VERSION="2024-05-01-preview" # or whatever you are using
 AZURE_OPENAI_ENDPOINT="https://CHANGEME.openai.azure.com/" # change to the name of your deployment
-AZURE_OPENAI_API_KEY="CHANGEME" # change to your API key
+AZURE_OPENAI_API_KEY="[Your Key]" # change to your API key

-EMBEDDING_PROVIDER="azureopenai"
-AZURE_EMBEDDING_MODEL="text-embedding-ada-002" # change to the deployment of your embedding model
+EMBEDDING="azure_openai:text-embedding-ada-002" # change to the deployment of your embedding model

 FAST_LLM="azure_openai:gpt-4o-mini" # change to the name of your deployment (not model-name)
 FAST_TOKEN_LIMIT=4000
@@ -72,6 +72,6 @@ SMART_LLM="azure_openai:gpt-4o" # change to the name of your deployment (not model-name)
 SMART_TOKEN_LIMIT=4000

 RETRIEVER="bing" # if you are using Bing as your search engine (which is likely if you use Azure)
-BING_API_KEY="CHANGEME"
+BING_API_KEY="[Your Key]"
```
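The `EMBEDDING`, `FAST_LLM`, `SMART_LLM`, and `STRATEGIC_LLM` values above all follow a `<provider>:<model>` convention. As a rough sketch of how such a value can be pulled apart (the helper below is ours for illustration, not part of GPT Researcher's API):

```python
import os

def split_model_spec(spec: str) -> tuple:
    """Split a '<provider>:<model>' string into (provider, model).

    Only the first colon separates provider from model, so model names
    that themselves contain colons (e.g. Ollama's 'qwen2:1.5b') survive.
    """
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>:<model>', got {spec!r}")
    return provider, model

# Read from the environment, falling back to the documented default
fast_llm = os.environ.get("FAST_LLM", "openai:gpt-4o-mini")
print(split_model_spec(fast_llm))             # ('openai', 'gpt-4o-mini') when FAST_LLM is unset
print(split_model_spec("ollama:qwen2:1.5b"))  # ('ollama', 'qwen2:1.5b')
```

Using `str.partition` rather than `str.split(":")` is what keeps multi-colon model names intact.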
64 changes: 64 additions & 0 deletions docs/docs/gpt-researcher/gptr/handling-logs-as-they-stream.md
@@ -0,0 +1,64 @@
# Handling Logs

Here is a snippet of code to help you handle the streaming logs of your research tasks.

```python
from typing import Dict, Any
import asyncio
from gpt_researcher import GPTResearcher

class CustomLogsHandler:
    """A custom logs handler class to handle JSON data."""
    def __init__(self):
        self.logs = []  # Initialize logs to store data

    async def send_json(self, data: Dict[str, Any]) -> None:
        """Send JSON data and log it."""
        self.logs.append(data)  # Append data to logs
        print(f"My custom Log: {data}")  # For demonstration, print the log

async def run():
    # Define the necessary parameters with sample values
    query = "What happened in the latest burning man floods?"
    report_type = "research_report"  # Type of report to generate
    report_source = "online"  # Could specify source like 'online', 'books', etc.
    tone = "informative"  # Tone of the report ('informative', 'casual', etc.)
    config_path = None  # Path to a config file, if needed

    # Initialize the researcher with the custom handler in place of a WebSocket
    custom_logs_handler = CustomLogsHandler()

    researcher = GPTResearcher(
        query=query,
        report_type=report_type,
        report_source=report_source,
        tone=tone,
        config_path=config_path,
        websocket=custom_logs_handler
    )

    await researcher.conduct_research()  # Conduct the research
    report = await researcher.write_report()  # Write the research report

    return report

# Run the asynchronous function using asyncio
if __name__ == "__main__":
    asyncio.run(run())
```

The data from the research process will be logged and stored in the `CustomLogsHandler` instance. You can customize the logging behavior as needed for your application.

Here's a sample of the output:

```
{
  "type": "logs",
  "content": "added_source_url",
  "output": "✅ Added source url to research: https://www.npr.org/2023/09/28/1202110410/how-rumors-and-conspiracy-theories-got-in-the-way-of-mauis-fire-recovery\n",
  "metadata": "https://www.npr.org/2023/09/28/1202110410/how-rumors-and-conspiracy-theories-got-in-the-way-of-mauis-fire-recovery"
}
```

The `metadata` field will include whatever metadata is relevant to the log entry. Let the script above run to completion for the full logs output of a given research task.
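Because the handler is just a class exposing an async `send_json` method, you can extend it however you like. One sketch (our own variation, not part of the library): group entries by their `content` field and dump everything to a JSON file when the run finishes. The demo feeds in two hand-made entries shaped like the sample output above:

```python
import asyncio
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict

class GroupingLogsHandler:
    """Collects research logs in memory, grouped by their 'content' key."""
    def __init__(self):
        self.logs = []                       # every entry, in arrival order
        self.by_content = defaultdict(list)  # entries bucketed by 'content'

    async def send_json(self, data: Dict[str, Any]) -> None:
        self.logs.append(data)
        self.by_content[data.get("content", "unknown")].append(data)

    def dump(self, path: str) -> None:
        """Write all collected logs to a JSON file."""
        Path(path).write_text(json.dumps(self.logs, indent=2))

async def demo():
    handler = GroupingLogsHandler()
    # Sample entries shaped like the real stream shown above
    await handler.send_json({"type": "logs", "content": "added_source_url",
                             "output": "✅ Added source url", "metadata": "https://example.com"})
    await handler.send_json({"type": "logs", "content": "research_step",
                             "output": "Browsing sources...", "metadata": None})
    print(len(handler.logs), sorted(handler.by_content))  # 2 ['added_source_url', 'research_step']

if __name__ == "__main__":
    asyncio.run(demo())
```

Pass an instance of a handler like this as the `websocket` argument, exactly as in the snippet above.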
73 changes: 27 additions & 46 deletions docs/docs/gpt-researcher/llms/llms.md
@@ -30,16 +30,13 @@ SMART_LLM="openai:gpt-4o"
```
### Custom OpenAI API Embedding
```bash
-# use a custom OpenAI API EMBEDDING provider
-EMBEDDING_PROVIDER="custom"

 # set the custom OpenAI API url
 OPENAI_BASE_URL="http://localhost:1234/v1"
 # set the custom OpenAI API key
 OPENAI_API_KEY="Your Key"

-# specify the custom OpenAI API embedding model
-OPENAI_EMBEDDING_MODEL="custom_model"
+EMBEDDING="custom:custom_model"
```

### Azure OpenAI
@@ -49,16 +46,12 @@ See also the documentation in the Langchain [Azure OpenAI](https://api.python.la
 On Azure OpenAI you will need to create deployments for each model you want to use. Please also specify the model names/deployment names in your `.env` file:

 ```bash
-EMBEDDING_PROVIDER=azure_openai
+EMBEDDING="azure_openai:text-embedding-ada-002"
 AZURE_OPENAI_API_KEY=[Your Key]
 AZURE_OPENAI_ENDPOINT=https://<your-endpoint>.openai.azure.com/
 OPENAI_API_VERSION=2024-05-01-preview
 FAST_LLM=azure_openai:gpt-4o-mini # note that the deployment name must be the same as the model name
 SMART_LLM=azure_openai:gpt-4o # note that the deployment name must be the same as the model name
-
-AZURE_EMBEDDING_MODEL=text-embedding-3-small # must be in the same region/resource as the models used
-
-
```


@@ -68,23 +61,10 @@ GPT Researcher supports both Ollama LLMs and embeddings. You can choose each or
 To use [Ollama](http://www.ollama.com) you can set the following environment variables

 ```bash
-# Use ollama for both, LLM and EMBEDDING provider
-# Ollama endpoint to use
 OLLAMA_BASE_URL=http://localhost:11434

-# Specify one of the LLM models supported by Ollama
-FAST_LLM=ollama:llama3
-# Specify one of the LLM models supported by Ollama
-SMART_LLM=ollama:llama3
-# The temperature to use, defaults to 0.55
-TEMPERATURE=0.55
-```
-
-**Optional** - You can also use ollama for embeddings
-```bash
-EMBEDDING_PROVIDER=ollama
-# Specify one of the embedding models supported by Ollama
-OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+EMBEDDING="ollama:nomic-embed-text"
+FAST_LLM="ollama:llama3"
+SMART_LLM="ollama:llama3"
```

## Groq
@@ -107,9 +87,9 @@ And finally, you will need to configure the GPT-Researcher Provider and Model va
 GROQ_API_KEY=[Your Key]

 # Set one of the LLM models supported by Groq
-FAST_LLM=groq:Mixtral-8x7b-32768
+FAST_LLM="groq:Mixtral-8x7b-32768"
 # Set one of the LLM models supported by Groq
-SMART_LLM=groq:Mixtral-8x7b-32768
+SMART_LLM="groq:Mixtral-8x7b-32768"
 # The temperature to use, defaults to 0.55
 TEMPERATURE=0.55
```
@@ -125,71 +105,72 @@ __NOTE:__ As of the writing of this Doc (May 2024), the available Language Model
[Anthropic](https://www.anthropic.com/) is an AI safety and research company, and is the creator of Claude. This page covers all integrations between Anthropic models and LangChain.
```bash
ANTHROPIC_API_KEY=[Your key]
-FAST_LLM=anthropic:claude-2.1
-SMART_LLM=anthropic:claude-3-opus-20240229
+FAST_LLM="anthropic:claude-2.1"
+SMART_LLM="anthropic:claude-3-opus-20240229"
```

## Mistral AI
Sign up for a [Mistral API key](https://console.mistral.ai/users/api-keys/).
Then update the corresponding env vars, for example:
```bash
-ANTHROPIC_API_KEY=[Your key]
-FAST_LLM=mistralai:open-mistral-7b
-SMART_LLM=mistralai:mistral-large-latest
+MISTRAL_API_KEY=[Your key]
+FAST_LLM="mistralai:open-mistral-7b"
+SMART_LLM="mistralai:mistral-large-latest"
```

## Together AI
[Together AI](https://www.together.ai/) offers an API to query [50+ leading open-source models](https://docs.together.ai/docs/inference-models) in a couple of lines of code.
Then update the corresponding env vars, for example:
```bash
TOGETHER_API_KEY=[Your key]
-FAST_LLM=together:meta-llama/Llama-3-8b-chat-hf
-SMART_LLM=together:meta-llama/Llama-3-70b-chat-hf
+FAST_LLM="together:meta-llama/Llama-3-8b-chat-hf"
+SMART_LLM="together:meta-llama/Llama-3-70b-chat-hf"
```

## HuggingFace
This integration requires a bit of extra work. Follow [this guide](https://python.langchain.com/v0.1/docs/integrations/chat/huggingface/) to learn more.
After you've followed the tutorial above, update the env vars:
```bash
HUGGINGFACE_API_KEY=[Your key]
-FAST_LLM=huggingface:HuggingFaceH4/zephyr-7b-beta
-SMART_LLM=huggingface:HuggingFaceH4/zephyr-7b-beta
+EMBEDDING="sentence-transformers/all-MiniLM-L6-v2"
+FAST_LLM="huggingface:HuggingFaceH4/zephyr-7b-beta"
+SMART_LLM="huggingface:HuggingFaceH4/zephyr-7b-beta"
```

## Google Gemini
Sign up [here](https://ai.google.dev/gemini-api/docs/api-key) to obtain a Google Gemini API key and update the following env vars:
```bash
GOOGLE_API_KEY=[Your key]
-FAST_LLM=google_genai:gemini-1.5-flash
-SMART_LLM=google_genai:gemini-1.5-pro
+FAST_LLM="google_genai:gemini-1.5-flash"
+SMART_LLM="google_genai:gemini-1.5-pro"
```

## Google VertexAI

```bash
-FAST_LLM=google_vertexai:gemini-1.5-flash-001
-SMART_LLM=google_vertexai:gemini-1.5-pro-001
+FAST_LLM="google_vertexai:gemini-1.5-flash-001"
+SMART_LLM="google_vertexai:gemini-1.5-pro-001"
```

## Cohere

```bash
COHERE_API_KEY=[Your key]
-FAST_LLM=cohere:command
-SMART_LLM=cohere:command-nightly
+FAST_LLM="cohere:command"
+SMART_LLM="cohere:command-nightly"
```

## Fireworks

```bash
FIREWORKS_API_KEY=[Your key]
-FAST_LLM=fireworks:accounts/fireworks/models/mixtral-8x7b-instruct
-SMART_LLM=fireworks:accounts/fireworks/models/mixtral-8x7b-instruct
+FAST_LLM="fireworks:accounts/fireworks/models/mixtral-8x7b-instruct"
+SMART_LLM="fireworks:accounts/fireworks/models/mixtral-8x7b-instruct"
```

## Bedrock

```bash
-FAST_LLM=bedrock:anthropic.claude-3-sonnet-20240229-v1:0
-SMART_LLM=bedrock:anthropic.claude-3-sonnet-20240229-v1:0
+FAST_LLM="bedrock:anthropic.claude-3-sonnet-20240229-v1:0"
+SMART_LLM="bedrock:anthropic.claude-3-sonnet-20240229-v1:0"
```
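Every provider section above follows the same pattern: an optional API key variable plus quoted `FAST_LLM`/`SMART_LLM` entries in `<provider>:<model>` form. As a rough illustration of that pattern (our own helper, not part of GPT Researcher), a `.env` block can be rendered programmatically:

```python
from typing import Optional

def render_env_block(key_var: Optional[str], fast_llm: str, smart_llm: str) -> str:
    """Render a provider .env block in the style used throughout this page.

    key_var is the provider's API key variable name (None when no key line
    is needed, as in the VertexAI and Bedrock examples above); the LLM
    values follow the '<provider>:<model>' convention.
    """
    lines = []
    if key_var:
        lines.append(f"{key_var}=[Your key]")
    lines.append(f'FAST_LLM="{fast_llm}"')
    lines.append(f'SMART_LLM="{smart_llm}"')
    return "\n".join(lines)

# Reproduce the Groq block from above
print(render_env_block("GROQ_API_KEY",
                       "groq:Mixtral-8x7b-32768",
                       "groq:Mixtral-8x7b-32768"))
```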
14 changes: 6 additions & 8 deletions docs/docs/gpt-researcher/llms/running-with-ollama.md
Original file line number Diff line number Diff line change
@@ -28,10 +28,9 @@ If you deploy ollama locally, a .env like so, should enable powering GPT-Researc
OPENAI_API_KEY="123"
OPENAI_API_BASE="http://127.0.0.1:11434/v1"
OLLAMA_BASE_URL="http://127.0.0.1:11434/"
-FAST_LLM=ollama:qwen2:1.5b
-SMART_LLM=ollama:qwen2:1.5b
-OLLAMA_EMBEDDING_MODEL=all-minilm:22m
-EMBEDDING_PROVIDER=ollama
+FAST_LLM="ollama:qwen2:1.5b"
+SMART_LLM="ollama:qwen2:1.5b"
+EMBEDDING="ollama:all-minilm:22m"
```

Replace `FAST_LLM` & `SMART_LLM` with the model you downloaded from the Elestio Web UI in the previous step.
@@ -95,10 +94,9 @@ Here's an example .env file that will enable powering GPT-Researcher with Elesti
OPENAI_API_KEY="123"
OPENAI_API_BASE="https://<your_custom_elestio_project>.vm.elestio.app:57987/v1"
OLLAMA_BASE_URL="https://<your_custom_elestio_project>.vm.elestio.app:57987/"
-FAST_LLM=openai:qwen2:1.5b
-SMART_LLM=openai:qwen2:1.5b
-OLLAMA_EMBEDDING_MODEL=all-minilm:22m
-EMBEDDING_PROVIDER=ollama
+FAST_LLM="openai:qwen2:1.5b"
+SMART_LLM="openai:qwen2:1.5b"
+EMBEDDING="ollama:all-minilm:22m"
```

#### Disable Elestio Authentication or Add Auth Headers