Commit

…cher-fork into edit-docs

# Conflicts:
#	docs/docs/gpt-researcher/llms/llms.md
scchengaiah committed Oct 27, 2024
2 parents 99af2cf + becaa70 commit 2d856ee
Showing 46 changed files with 900 additions and 364 deletions.
42 changes: 21 additions & 21 deletions CONTRIBUTING.md
@@ -1,42 +1,42 @@
 # Contributing to GPT Researcher
-First off, we'd like to welcome you, and thank you for your interest and effort in contributing to our open-source project ❤️. Contributions of all forms are welcome, from new features and bug fixes, to documentation and more.
+First off, we'd like to welcome you and thank you for your interest and effort in contributing to our open-source project ❤️. Contributions of all forms are welcome—from new features and bug fixes to documentation and more.

-We are on a mission to build the #1 AI agent for comprehensive, unbiased, and factual research online. And we need your support to achieve this grand vision.
+We are on a mission to build the #1 AI agent for comprehensive, unbiased, and factual research online, and we need your support to achieve this grand vision.

-Please take a moment to review this document in order to make the contribution process easy and effective for everyone involved.
+Please take a moment to review this document to make the contribution process easy and effective for everyone involved.

 ## Reporting Issues

 If you come across any issue or have an idea for an improvement, don't hesitate to create an issue on GitHub. Describe your problem in sufficient detail, providing as much relevant information as possible. This way, we can reproduce the issue before attempting to fix it or respond appropriately.

 ## Contributing Code

 1. **Fork the repository and create your branch from `master`.**
-   If it's not an urgent bug fix, you should branch from `master` and work on the feature or fix in there.
+   If it’s not an urgent bug fix, branch from `master` and work on the feature or fix there.

-2. **Conduct your changes.**
-   Make your changes following best practices for coding in the project's language.
+2. **Make your changes.**
+   Implement your changes following best practices for coding in the project's language.

 3. **Test your changes.**
-   Make sure your changes pass all the tests if there are any. If the project doesn't have automated testing infrastructure, test your changes manually to confirm they behave as expected.
+   Ensure that your changes pass all tests if any exist. If the project doesn’t have automated tests, test your changes manually to confirm they behave as expected.

 4. **Follow the coding style.**
-   Ensure your code adheres to the coding conventions used throughout the project, that includes indentation, accurate comments, etc.
+   Ensure your code adheres to the coding conventions used throughout the project, including indentation, accurate comments, etc.

 5. **Commit your changes.**
-   Make your git commits informative and concise. This is very helpful for others when they look at the git log.
+   Make your Git commits informative and concise. This is very helpful for others when they look at the Git log.

 6. **Push to your fork and submit a pull request.**
    When your work is ready and passes tests, push your branch to your fork of the repository and submit a pull request from there.

-7. **Pat your back and wait for the review.**
+7. **Pat yourself on the back and wait for review.**
    Your work is done, congratulations! Now sit tight. The project maintainers will review your submission as soon as possible. They might suggest changes or ask for improvements. Both constructive conversation and patience are key to the collaboration process.

 ## Documentation

 If you would like to contribute to the project's documentation, please follow the same steps: fork the repository, make your changes, test them, and submit a pull request.

-Documentation is a vital part of any software. It's not just about having good code. Ensuring that the users and contributors understand what's going on, how to use the software or how to contribute, is crucial.
+Documentation is a vital part of any software. It's not just about having good code; ensuring that users and contributors understand what's going on, how to use the software, or how to contribute is crucial.

-We're grateful for all our contributors, and we look forward to building the world's leading AI research agent hand-in-hand with you. Let's harness the power of Open Source and AI to change the world together!
+We're grateful for all our contributors, and we look forward to building the world's leading AI research agent hand-in-hand with you. Let's harness the power of open source and AI to change the world together!
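The branch/commit flow in steps 1, 2, and 5 above can be sketched as a plain Git session. This is a generic illustration in a throwaway local repository; in practice you would clone your fork of gpt-researcher instead of `git init`, and the branch name, file, and identity below are placeholders:

```shell
# Illustrative only: simulate the branch -> change -> commit flow locally.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q                                  # stand-in for cloning your fork
git checkout -q -b my-feature                # step 1: work on a dedicated branch
echo "example fix" > fix.txt                 # step 2: make your changes
git add fix.txt
git -c user.name="You" -c user.email="you@example.com" \
    commit -q -m "Fix: describe the change concisely"   # step 5: informative commit
git log --oneline                            # verify the commit message reads well
```

From here, `git push` to your fork and opening a pull request (steps 6 and 7) happen against your real remote.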
8 changes: 7 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
-<div align="center">
+<div align="center" id="top">
 <!--<h1 style="display: flex; align-items: center; gap: 10px;">
 <img src="https://github.com/assafelovic/gpt-researcher/assets/13554167/a45bac7c-092c-42e5-8eb6-69acbf20dde5" alt="Logo" width="25">
 GPT Researcher
@@ -67,6 +67,7 @@ More specifically:

 ## Features
 - 📝 Generate research, outlines, resources and lessons reports with local documents and web sources
+- 🖼️ Supports smart article image scraping and filtering
 - 📜 Can generate long and detailed research reports (over 2K words)
 - 🌐 Aggregates over 20 web sources per research to form objective and factual conclusions
 - 🖥️ Includes both lightweight (HTML/CSS/JS) and production ready (NextJS + Tailwind) UX/UI
@@ -245,3 +246,8 @@ Our view on unbiased research claims:
 </picture>
 </a>
 </p>
+
+
+<p align="right">
+  <a href="#top">⬆️ Back to Top</a>
+</p>
10 changes: 5 additions & 5 deletions docs/docs/gpt-researcher/gptr/config.md
@@ -19,9 +19,10 @@ You can also include your own external JSON file `config.json` by adding the pat
 Below is a list of current supported options:

 - **`RETRIEVER`**: Web search engine used for retrieving sources. Defaults to `tavily`. Options: `duckduckgo`, `bing`, `google`, `searchapi`, `serper`, `searx`. [Check here](https://github.com/assafelovic/gpt-researcher/tree/master/gpt_researcher/retrievers) for supported retrievers
-- **`EMBEDDING_PROVIDER`**: Provider for embedding model. Defaults to `openai`. Options: `ollama`, `huggingface`, `azure_openai`, `custom`.
+- **`EMBEDDING`**: Embedding model. Defaults to `openai:text-embedding-3-small`. Options: `ollama`, `huggingface`, `azure_openai`, `custom`.
 - **`FAST_LLM`**: Model name for fast LLM operations such as summaries. Defaults to `openai:gpt-4o-mini`.
 - **`SMART_LLM`**: Model name for smart operations like generating research reports and reasoning. Defaults to `openai:gpt-4o`.
+- **`STRATEGIC_LLM`**: Model name for strategic operations like generating research plans and strategies. Defaults to `openai:o1-preview`.
 - **`FAST_TOKEN_LIMIT`**: Maximum token limit for fast LLM responses. Defaults to `2000`.
 - **`SMART_TOKEN_LIMIT`**: Maximum token limit for smart LLM responses. Defaults to `4000`.
 - **`BROWSE_CHUNK_MAX_LENGTH`**: Maximum length of text chunks to browse in web sources. Defaults to `8192`.
@@ -60,10 +61,9 @@ Here is an example for [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai

 OPENAI_API_VERSION="2024-05-01-preview" # or whatever you are using
 AZURE_OPENAI_ENDPOINT="https://CHANGEME.openai.azure.com/" # change to the name of your deployment
-AZURE_OPENAI_API_KEY="CHANGEME" # change to your API key
+AZURE_OPENAI_API_KEY="[Your Key]" # change to your API key

-EMBEDDING_PROVIDER="azureopenai"
-AZURE_EMBEDDING_MODEL="text-embedding-ada-002" # change to the deployment of your embedding model
+EMBEDDING="azure_openai:text-embedding-ada-002" # change to the deployment of your embedding model

 FAST_LLM="azure_openai:gpt-4o-mini" # change to the name of your deployment (not model-name)
 FAST_TOKEN_LIMIT=4000
@@ -72,6 +72,6 @@ SMART_LLM="azure_openai:gpt-4o" # change to the name of your deployment (not model-name)
 SMART_TOKEN_LIMIT=4000

 RETRIEVER="bing" # if you are using Bing as your search engine (which is likely if you use Azure)
-BING_API_KEY="CHANGEME"
+BING_API_KEY="[Your Key]"
```
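The `EMBEDDING`, `FAST_LLM`, `SMART_LLM`, and `STRATEGIC_LLM` values above all follow a `<provider>:<model>` convention. As a rough sketch of how such a value can be pulled apart (the helper below is ours for illustration, not part of GPT Researcher's API):

```python
import os

def split_model_spec(spec: str) -> tuple:
    """Split a '<provider>:<model>' string into (provider, model).

    Only the first colon separates provider from model, so model names
    that themselves contain colons (e.g. Ollama's 'qwen2:1.5b') survive.
    """
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>:<model>', got {spec!r}")
    return provider, model

# Read from the environment, falling back to the documented default
fast_llm = os.environ.get("FAST_LLM", "openai:gpt-4o-mini")
print(split_model_spec(fast_llm))             # ('openai', 'gpt-4o-mini') when FAST_LLM is unset
print(split_model_spec("ollama:qwen2:1.5b"))  # ('ollama', 'qwen2:1.5b')
```

Using `str.partition` rather than `str.split(":")` is what keeps multi-colon model names intact.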
64 changes: 64 additions & 0 deletions docs/docs/gpt-researcher/gptr/handling-logs-as-they-stream.md
@@ -0,0 +1,64 @@
# Handling Logs

Here is a snippet of code to help you handle the streaming logs of your research tasks.

```python
from typing import Dict, Any
import asyncio
from gpt_researcher import GPTResearcher

class CustomLogsHandler:
    """A custom logs handler class to handle JSON data."""
    def __init__(self):
        self.logs = []  # Initialize logs to store data

    async def send_json(self, data: Dict[str, Any]) -> None:
        """Send JSON data and log it."""
        self.logs.append(data)  # Append data to logs
        print(f"My custom Log: {data}")  # For demonstration, print the log

async def run():
    # Define the necessary parameters with sample values
    query = "What happened in the latest burning man floods?"
    report_type = "research_report"  # Type of report to generate
    report_source = "online"  # Could specify source like 'online', 'books', etc.
    tone = "informative"  # Tone of the report ('informative', 'casual', etc.)
    config_path = None  # Path to a config file, if needed

    # Initialize the researcher with the custom handler in place of a WebSocket
    custom_logs_handler = CustomLogsHandler()

    researcher = GPTResearcher(
        query=query,
        report_type=report_type,
        report_source=report_source,
        tone=tone,
        config_path=config_path,
        websocket=custom_logs_handler
    )

    await researcher.conduct_research()  # Conduct the research
    report = await researcher.write_report()  # Write the research report

    return report

# Run the asynchronous function using asyncio
if __name__ == "__main__":
    asyncio.run(run())
```

The data from the research process will be logged and stored in the `CustomLogsHandler` instance. You can customize the logging behavior as needed for your application.

Here's a sample of the output:

```
{
  "type": "logs",
  "content": "added_source_url",
  "output": "✅ Added source url to research: https://www.npr.org/2023/09/28/1202110410/how-rumors-and-conspiracy-theories-got-in-the-way-of-mauis-fire-recovery\n",
  "metadata": "https://www.npr.org/2023/09/28/1202110410/how-rumors-and-conspiracy-theories-got-in-the-way-of-mauis-fire-recovery"
}
```

The `metadata` field will include whatever metadata is relevant to the log entry. Let the script above run to completion for the full logs output of a given research task.
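Because the handler is just a class exposing an async `send_json` method, you can extend it however you like. One sketch (our own variation, not part of the library): group entries by their `content` field and dump everything to a JSON file when the run finishes. The demo feeds in two hand-made entries shaped like the sample output above:

```python
import asyncio
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict

class GroupingLogsHandler:
    """Collects research logs in memory, grouped by their 'content' key."""
    def __init__(self):
        self.logs = []                       # every entry, in arrival order
        self.by_content = defaultdict(list)  # entries bucketed by 'content'

    async def send_json(self, data: Dict[str, Any]) -> None:
        self.logs.append(data)
        self.by_content[data.get("content", "unknown")].append(data)

    def dump(self, path: str) -> None:
        """Write all collected logs to a JSON file."""
        Path(path).write_text(json.dumps(self.logs, indent=2))

async def demo():
    handler = GroupingLogsHandler()
    # Sample entries shaped like the real stream shown above
    await handler.send_json({"type": "logs", "content": "added_source_url",
                             "output": "✅ Added source url", "metadata": "https://example.com"})
    await handler.send_json({"type": "logs", "content": "research_step",
                             "output": "Browsing sources...", "metadata": None})
    print(len(handler.logs), sorted(handler.by_content))  # 2 ['added_source_url', 'research_step']

if __name__ == "__main__":
    asyncio.run(demo())
```

Pass an instance of a handler like this as the `websocket` argument, exactly as in the snippet above.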
73 changes: 27 additions & 46 deletions docs/docs/gpt-researcher/llms/llms.md
@@ -30,16 +30,13 @@ SMART_LLM="openai:gpt-4o"
```
### Custom OpenAI API Embedding
```bash
-# use a custom OpenAI API EMBEDDING provider
-EMBEDDING_PROVIDER="custom"

 # set the custom OpenAI API url
 OPENAI_BASE_URL="http://localhost:1234/v1"
 # set the custom OpenAI API key
 OPENAI_API_KEY="Your Key"

-# specify the custom OpenAI API embedding model
-OPENAI_EMBEDDING_MODEL="custom_model"
+EMBEDDING="custom:custom_model"
```

### Azure OpenAI
@@ -49,16 +46,12 @@ See also the documentation in the Langchain [Azure OpenAI](https://api.python.la
 On Azure OpenAI you will need to create deployments for each model you want to use. Please also specify the model names/deployment names in your `.env` file:

 ```bash
-EMBEDDING_PROVIDER=azure_openai
+EMBEDDING="azure_openai:text-embedding-ada-002"
 AZURE_OPENAI_API_KEY=[Your Key]
 AZURE_OPENAI_ENDPOINT=https://<your-endpoint>.openai.azure.com/
 OPENAI_API_VERSION=2024-05-01-preview
 FAST_LLM=azure_openai:gpt-4o-mini # note that the deployment name must be the same as the model name
 SMART_LLM=azure_openai:gpt-4o # note that the deployment name must be the same as the model name
-
-AZURE_EMBEDDING_MODEL=text-embedding-3-small # must be in the same region/resource as the models used
-
-
```


@@ -68,23 +61,10 @@ GPT Researcher supports both Ollama LLMs and embeddings. You can choose each or
 To use [Ollama](http://www.ollama.com) you can set the following environment variables

 ```bash
-# Use ollama for both, LLM and EMBEDDING provider
-# Ollama endpoint to use
 OLLAMA_BASE_URL=http://localhost:11434

-# Specify one of the LLM models supported by Ollama
-FAST_LLM=ollama:llama3
-# Specify one of the LLM models supported by Ollama
-SMART_LLM=ollama:llama3
-# The temperature to use, defaults to 0.55
-TEMPERATURE=0.55
-```
-
-**Optional** - You can also use ollama for embeddings
-```bash
-EMBEDDING_PROVIDER=ollama
-# Specify one of the embedding models supported by Ollama
-OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+EMBEDDING="ollama:nomic-embed-text"
+FAST_LLM="ollama:llama3"
+SMART_LLM="ollama:llama3"
```

## Groq
@@ -107,9 +87,9 @@ And finally, you will need to configure the GPT-Researcher Provider and Model va
 GROQ_API_KEY=[Your Key]

 # Set one of the LLM models supported by Groq
-FAST_LLM=groq:Mixtral-8x7b-32768
+FAST_LLM="groq:Mixtral-8x7b-32768"
 # Set one of the LLM models supported by Groq
-SMART_LLM=groq:Mixtral-8x7b-32768
+SMART_LLM="groq:Mixtral-8x7b-32768"
 # The temperature to use, defaults to 0.55
 TEMPERATURE=0.55
```
@@ -125,71 +105,72 @@ __NOTE:__ As of the writing of this Doc (May 2024), the available Language Model
[Anthropic](https://www.anthropic.com/) is an AI safety and research company, and is the creator of Claude. This page covers all integrations between Anthropic models and LangChain.
```bash
ANTHROPIC_API_KEY=[Your key]
-FAST_LLM=anthropic:claude-2.1
-SMART_LLM=anthropic:claude-3-opus-20240229
+FAST_LLM="anthropic:claude-2.1"
+SMART_LLM="anthropic:claude-3-opus-20240229"
```

## Mistral AI
Sign up for a [Mistral API key](https://console.mistral.ai/users/api-keys/).
Then update the corresponding env vars, for example:
```bash
-ANTHROPIC_API_KEY=[Your key]
-FAST_LLM=mistralai:open-mistral-7b
-SMART_LLM=mistralai:mistral-large-latest
+MISTRAL_API_KEY=[Your key]
+FAST_LLM="mistralai:open-mistral-7b"
+SMART_LLM="mistralai:mistral-large-latest"
```

## Together AI
[Together AI](https://www.together.ai/) offers an API to query [50+ leading open-source models](https://docs.together.ai/docs/inference-models) in a couple of lines of code.
Then update the corresponding env vars, for example:
```bash
TOGETHER_API_KEY=[Your key]
-FAST_LLM=together:meta-llama/Llama-3-8b-chat-hf
-SMART_LLM=together:meta-llama/Llama-3-70b-chat-hf
+FAST_LLM="together:meta-llama/Llama-3-8b-chat-hf"
+SMART_LLM="together:meta-llama/Llama-3-70b-chat-hf"
```

## HuggingFace
This integration requires a bit of extra work. Follow [this guide](https://python.langchain.com/v0.1/docs/integrations/chat/huggingface/) to learn more.
After you've followed the tutorial above, update the env vars:
```bash
HUGGINGFACE_API_KEY=[Your key]
-FAST_LLM=huggingface:HuggingFaceH4/zephyr-7b-beta
-SMART_LLM=huggingface:HuggingFaceH4/zephyr-7b-beta
+EMBEDDING="sentence-transformers/all-MiniLM-L6-v2"
+FAST_LLM="huggingface:HuggingFaceH4/zephyr-7b-beta"
+SMART_LLM="huggingface:HuggingFaceH4/zephyr-7b-beta"
```

## Google Gemini
Sign up [here](https://ai.google.dev/gemini-api/docs/api-key) to obtain a Google Gemini API key and update the following env vars:
```bash
GOOGLE_API_KEY=[Your key]
-FAST_LLM=google_genai:gemini-1.5-flash
-SMART_LLM=google_genai:gemini-1.5-pro
+FAST_LLM="google_genai:gemini-1.5-flash"
+SMART_LLM="google_genai:gemini-1.5-pro"
```

## Google VertexAI

```bash
-FAST_LLM=google_vertexai:gemini-1.5-flash-001
-SMART_LLM=google_vertexai:gemini-1.5-pro-001
+FAST_LLM="google_vertexai:gemini-1.5-flash-001"
+SMART_LLM="google_vertexai:gemini-1.5-pro-001"
```

## Cohere

```bash
COHERE_API_KEY=[Your key]
-FAST_LLM=cohere:command
-SMART_LLM=cohere:command-nightly
+FAST_LLM="cohere:command"
+SMART_LLM="cohere:command-nightly"
```

## Fireworks

```bash
FIREWORKS_API_KEY=[Your key]
-FAST_LLM=fireworks:accounts/fireworks/models/mixtral-8x7b-instruct
-SMART_LLM=fireworks:accounts/fireworks/models/mixtral-8x7b-instruct
+FAST_LLM="fireworks:accounts/fireworks/models/mixtral-8x7b-instruct"
+SMART_LLM="fireworks:accounts/fireworks/models/mixtral-8x7b-instruct"
```

## Bedrock

```bash
-FAST_LLM=bedrock:anthropic.claude-3-sonnet-20240229-v1:0
-SMART_LLM=bedrock:anthropic.claude-3-sonnet-20240229-v1:0
+FAST_LLM="bedrock:anthropic.claude-3-sonnet-20240229-v1:0"
+SMART_LLM="bedrock:anthropic.claude-3-sonnet-20240229-v1:0"
```
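Every provider section above follows the same pattern: an optional API key variable plus quoted `FAST_LLM`/`SMART_LLM` entries in `<provider>:<model>` form. As a rough illustration of that pattern (our own helper, not part of GPT Researcher), a `.env` block can be rendered programmatically:

```python
from typing import Optional

def render_env_block(key_var: Optional[str], fast_llm: str, smart_llm: str) -> str:
    """Render a provider .env block in the style used throughout this page.

    key_var is the provider's API key variable name (None when no key line
    is needed, as in the VertexAI and Bedrock examples above); the LLM
    values follow the '<provider>:<model>' convention.
    """
    lines = []
    if key_var:
        lines.append(f"{key_var}=[Your key]")
    lines.append(f'FAST_LLM="{fast_llm}"')
    lines.append(f'SMART_LLM="{smart_llm}"')
    return "\n".join(lines)

# Reproduce the Groq block from above
print(render_env_block("GROQ_API_KEY",
                       "groq:Mixtral-8x7b-32768",
                       "groq:Mixtral-8x7b-32768"))
```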
14 changes: 6 additions & 8 deletions docs/docs/gpt-researcher/llms/running-with-ollama.md
Original file line number Diff line number Diff line change
@@ -28,10 +28,9 @@ If you deploy ollama locally, a .env like so, should enable powering GPT-Researc
OPENAI_API_KEY="123"
OPENAI_API_BASE="http://127.0.0.1:11434/v1"
OLLAMA_BASE_URL="http://127.0.0.1:11434/"
-FAST_LLM=ollama:qwen2:1.5b
-SMART_LLM=ollama:qwen2:1.5b
-OLLAMA_EMBEDDING_MODEL=all-minilm:22m
-EMBEDDING_PROVIDER=ollama
+FAST_LLM="ollama:qwen2:1.5b"
+SMART_LLM="ollama:qwen2:1.5b"
+EMBEDDING="ollama:all-minilm:22m"
```

Replace `FAST_LLM` & `SMART_LLM` with the model you downloaded from the Elestio Web UI in the previous step.
@@ -95,10 +94,9 @@ Here's an example .env file that will enable powering GPT-Researcher with Elesti
OPENAI_API_KEY="123"
OPENAI_API_BASE="https://<your_custom_elestio_project>.vm.elestio.app:57987/v1"
OLLAMA_BASE_URL="https://<your_custom_elestio_project>.vm.elestio.app:57987/"
-FAST_LLM=openai:qwen2:1.5b
-SMART_LLM=openai:qwen2:1.5b
-OLLAMA_EMBEDDING_MODEL=all-minilm:22m
-EMBEDDING_PROVIDER=ollama
+FAST_LLM="openai:qwen2:1.5b"
+SMART_LLM="openai:qwen2:1.5b"
+EMBEDDING="ollama:all-minilm:22m"
```

#### Disable Elestio Authentication or Add Auth Headers