Commit 14d2493

Updated Ollama part of local deployment (infiniflow#1066)
### What problem does this PR solve?

infiniflow#720

### Type of change

- [x] Documentation Update
1 parent 20e69e0 commit 14d2493

6 files changed: +138 −55 lines changed

README.md (+2 −2)

@@ -19,7 +19,7 @@
  <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
  <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.7.0"></a>
  <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
- <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=2e6cc4" alt="license">
+ <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
  </a>
  </p>

@@ -316,7 +316,7 @@ To launch the service from source:

  - [Quickstart](https://ragflow.io/docs/dev/)
  - [User guide](https://ragflow.io/docs/dev/category/user-guides)
- - [Reference](https://ragflow.io/docs/dev/category/references)
+ - [References](https://ragflow.io/docs/dev/category/references)
  - [FAQ](https://ragflow.io/docs/dev/faq)

  ## 📜 Roadmap

README_ja.md (+2 −2)

@@ -20,7 +20,7 @@
  <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen"
  alt="docker pull infiniflow/ragflow:v0.7.0"></a>
  <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
- <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=2e6cc4" alt="license">
+ <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
  </a>
  </p>

@@ -263,7 +263,7 @@ $ bash ./entrypoint.sh

  - [Quickstart](https://ragflow.io/docs/dev/)
  - [User guide](https://ragflow.io/docs/dev/category/user-guides)
- - [Reference](https://ragflow.io/docs/dev/category/references)
+ - [References](https://ragflow.io/docs/dev/category/references)
  - [FAQ](https://ragflow.io/docs/dev/faq)

  ## 📜 ロードマップ

README_zh.md (+2 −2)

@@ -19,7 +19,7 @@
  <a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
  <img src="https://img.shields.io/badge/docker_pull-ragflow:v0.7.0-brightgreen" alt="docker pull infiniflow/ragflow:v0.7.0"></a>
  <a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
- <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=2e6cc4" alt="license">
+ <img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
  </a>
  </p>

@@ -283,7 +283,7 @@ $ systemctl start nginx

  - [Quickstart](https://ragflow.io/docs/dev/)
  - [User guide](https://ragflow.io/docs/dev/category/user-guides)
- - [Reference](https://ragflow.io/docs/dev/category/references)
+ - [References](https://ragflow.io/docs/dev/category/references)
  - [FAQ](https://ragflow.io/docs/dev/faq)

  ## 📜 路线图

docs/guides/deploy_local_llm.md (+116 −36)

@@ -5,71 +5,151 @@ slug: /deploy_local_llm

  # Deploy a local LLM

- RAGFlow supports deploying LLMs locally using Ollama or Xinference.
+ RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage, or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.

- ## Ollama
+ RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configuration. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models.

- One-click deployment of local LLMs, that is [Ollama](https://github.com/ollama/ollama).
+ :::tip NOTE
+ This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, check out the official site of Ollama or Xinference.
+ :::

- ### Install
+ ## Deploy a local model using Ollama

- - [Ollama on Linux](https://github.com/ollama/ollama/blob/main/docs/linux.md)
- - [Ollama Windows Preview](https://github.com/ollama/ollama/blob/main/docs/windows.md)
- - [Docker](https://hub.docker.com/r/ollama/ollama)
+ [Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models locally. It bundles model weights, configurations, and data into a single package defined by a Modelfile, and optimizes setup and configuration details, including GPU usage.

- ### Launch Ollama
+ :::note
+ - For information about downloading Ollama, see [here](https://github.com/ollama/ollama?tab=readme-ov-file#ollama).
+ - For information about configuring the Ollama server, see [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server).
+ - For a complete list of supported models and variants, see the [Ollama model library](https://ollama.com/library).
+ :::
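If you take the Docker route, a typical way to start the Ollama server in a container is shown below (a sketch following the `ollama/ollama` image documentation; adjust the volume name and port mapping to your environment):

```bash
# Run the Ollama server in the background, persist downloaded models in a
# named volume, and expose the default API port 11434 on the host.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```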
+
+ To deploy a local model, e.g., **Llama3**, using Ollama:
+
+ ### 1. Check firewall settings
+
+ Ensure that your host machine's firewall allows inbound connections on port 11434. For example:
+
+ ```bash
+ sudo ufw allow 11434/tcp
+ ```
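On distributions that ship firewalld instead of ufw, the equivalent rule would be the following sketch (assuming firewalld is the active firewall):

```bash
# Open TCP port 11434 permanently, then reload the firewall rules.
sudo firewall-cmd --permanent --add-port=11434/tcp
sudo firewall-cmd --reload
```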
+ ### 2. Ensure Ollama is accessible
+
+ Restart your system and use curl or your web browser to confirm that the service URL of your Ollama service, `http://localhost:11434`, is accessible:
+
+ ```bash
+ $ curl http://localhost:11434
+ Ollama is running
+ ```
+
+ ### 3. Run your local model
+
+ ```bash
+ ollama run llama3
+ ```
+ <details>
+ <summary>If your Ollama is installed through Docker, run the following instead:</summary>
+
+ ```bash
+ docker exec -it ollama ollama run llama3
+ ```
+ </details>
+
+ ### 4. Add Ollama
+
+ In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Ollama to RAGFlow:
+
+ ![add ollama](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)
+
+ ### 5. Complete basic Ollama settings
+
+ In the popup window, complete the basic settings for Ollama:
+
+ 1. Because **llama3** is a chat model, choose **chat** as the model type.
+ 2. Ensure that the model name you enter here *precisely* matches the name of the local model you are running with Ollama.
+ 3. Ensure that the base URL you enter is accessible to RAGFlow.
+ 4. OPTIONAL: Switch on the toggle under **Does it support Vision?** if your model includes an image-to-text capability.
+
+ :::caution NOTE
+ - If your Ollama and RAGFlow run on the same machine, use `http://localhost:11434` as the base URL.
+ - If your Ollama and RAGFlow run on the same machine and Ollama is in Docker, use `http://host.docker.internal:11434` as the base URL.
+ - If your Ollama runs on a different machine from RAGFlow, use `http://<IP_OF_OLLAMA_MACHINE>:11434` as the base URL.
+ :::
+
+ :::danger WARNING
+ If your Ollama runs on a different machine, you may also need to set the `OLLAMA_HOST` environment variable to `0.0.0.0` in **ollama.service** (note that this is *NOT* the base URL):

- Decide which LLM you want to deploy ([here's a list for supported LLM](https://ollama.com/library)), say, **mistral**:
  ```bash
- $ ollama run mistral
+ Environment="OLLAMA_HOST=0.0.0.0"
  ```
- Or,
+
+ See [this guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information.
+ :::
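On a systemd-based host, a typical way to apply this is sketched below (it assumes Ollama was installed as a systemd service named `ollama`, per the Ollama FAQ linked above):

```bash
# Open a drop-in override editor and add, under the [Service] section:
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl edit ollama.service

# Reload unit files and restart Ollama so the new environment takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```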
+
+ :::caution WARNING
+ Improper base URL settings will trigger the following error:
  ```bash
- $ docker exec -it ollama ollama run mistral
+ Max retries exceeded with url: /api/chat (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff98b81ff0>: Failed to establish a new connection: [Errno 111] Connection refused'))
  ```
+ :::
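When debugging this error, it can help to test the exact base URL you configured from wherever RAGFlow runs. A minimal sketch, assuming RAGFlow runs in a Docker container named `ragflow-server` (both the container name and the availability of curl inside it are assumptions):

```bash
# If this prints "Ollama is running", the base URL is reachable from RAGFlow.
docker exec -it ragflow-server curl http://host.docker.internal:11434
```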

- ### Use Ollama in RAGFlow
+ ### 6. Update System Model Settings

- - Go to 'Settings > Model Providers > Models to be added > Ollama'.
-
- ![](https://github.com/infiniflow/ragflow/assets/12318111/a9df198a-226d-4f30-b8d7-829f00256d46)
+ Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model:
+
+ *You should now be able to find **llama3** in the dropdown list under **Chat model**.*

- > Base URL: Enter the base URL where the Ollama service is accessible, like, `http://<your-ollama-endpoint-domain>:11434`.
+ > If your local model is an embedding model, you should find it under **Embedding model**.

- - Use Ollama Models.
+ ### 7. Update Chat Configuration

- ![](https://github.com/infiniflow/ragflow/assets/12318111/60ff384e-5013-41ff-a573-9a543d237fd3)
+ Update your chat model accordingly in **Chat Configuration**:

- ## Xinference
+ > If your local model is an embedding model, update it on the configuration page of your knowledge base.

- Xorbits Inference([Xinference](https://github.com/xorbitsai/inference)) empowers you to unleash the full potential of cutting-edge AI models.
+ ## Deploy a local model using Xinference

- ### Install
+ Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models.

- - [pip install "xinference[all]"](https://inference.readthedocs.io/en/latest/getting_started/installation.html)
- - [Docker](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html)
+ :::note
+ - For information about installing Xinference, see [here](https://inference.readthedocs.io/en/latest/getting_started/).
+ - For a complete list of supported models, see [Builtin Models](https://inference.readthedocs.io/en/latest/models/builtin/).
+ :::
+
+ To deploy a local model, e.g., **Mistral**, using Xinference:
+
+ ### 1. Start an Xinference instance

- To start a local instance of Xinference, run the following command:
  ```bash
  $ xinference-local --host 0.0.0.0 --port 9997
  ```
- ### Launch Xinference

- Decide which LLM you want to deploy ([here's a list for supported LLM](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**.
- Execute the following command to launch the model, remember to replace `${quantization}` with your chosen quantization method from the options listed above:
+ ### 2. Launch your local model
+
+ Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method:
  ```bash
  $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
  ```
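Once launched, you can sanity-check that the model is being served via Xinference's OpenAI-compatible API (a sketch, assuming the default host and port from step 1):

```bash
# List the models currently served by this Xinference instance.
curl http://localhost:9997/v1/models
```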
+ ### 3. Add Xinference
+
+ In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Xinference to RAGFlow:
+
+ ![add xinference](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)
+
+ ### 4. Complete basic Xinference settings
+
+ Enter an accessible base URL, such as `http://<your-xinference-endpoint-domain>:9997/v1`.
+
+ ### 5. Update System Model Settings

- ### Use Xinference in RAGFlow
+ Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model:
+
+ *You should now be able to find **mistral** in the dropdown list under **Chat model**.*

- - Go to 'Settings > Model Providers > Models to be added > Xinference'.
-
- ![](https://github.com/infiniflow/ragflow/assets/12318111/bcbf4d7a-ade6-44c7-ad5f-0a92c8a73789)
+ > If your local model is an embedding model, you should find it under **Embedding model**.

- > Base URL: Enter the base URL where the Xinference service is accessible, like, `http://<your-xinference-endpoint-domain>:9997/v1`.
+ ### 6. Update Chat Configuration

- - Use Xinference Models.
+ Update your chat model accordingly in **Chat Configuration**:

- ![](https://github.com/infiniflow/ragflow/assets/12318111/b01fcb6f-47c9-4777-82e0-f1e947ed615a)
- ![](https://github.com/infiniflow/ragflow/assets/12318111/1763dcd1-044f-438d-badd-9729f5b3a144)
+ > If your local model is an embedding model, update it on the configuration page of your knowledge base.

docs/quickstart.mdx (+12 −9)

@@ -18,10 +18,10 @@ This quick start guide describes a general process from:

  ## Prerequisites

- - CPU >= 4 cores
- - RAM >= 16 GB
- - Disk >= 50 GB
- - Docker >= 24.0.0 & Docker Compose >= v2.26.1
+ - CPU &ge; 4 cores
+ - RAM &ge; 16 GB
+ - Disk &ge; 50 GB
+ - Docker &ge; 24.0.0 & Docker Compose &ge; v2.26.1

  > If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).

@@ -30,11 +30,11 @@ This quick start guide describes a general process from:
  This section provides instructions on setting up the RAGFlow server on Linux. If you are on a different operating system, no worries. Most steps are alike.

  <details>
- <summary>1. Ensure <code>vm.max_map_count</code> >= 262144:</summary>
+ <summary>1. Ensure <code>vm.max_map_count</code> &ge; 262144:</summary>

  `vm.max_map_count` sets the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abnormal behaviors, and the system will throw out-of-memory errors when a process reaches the limit.

- RAGFlow v0.7.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning the Elasticsearch component.
+ RAGFlow v0.7.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning of the Elasticsearch component.
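To make this concrete, the usual sysctl workflow for raising the parameter on Linux looks like this (a sketch; the persistence step assumes your distribution reads /etc/sysctl.conf at boot):

```bash
# Check the current value.
sysctl vm.max_map_count

# Raise it for the running system; this change is lost on reboot.
sudo sysctl -w vm.max_map_count=262144

# Persist the setting across reboots.
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
```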

  <Tabs
  defaultValue="linux"

@@ -168,7 +168,9 @@ This section provides instructions on setting up the RAGFlow server on Linux. If

  5. In your web browser, enter the IP address of your server and log in to RAGFlow.

- > - With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations.
+ :::caution WARNING
+ With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number), as the default HTTP serving port `80` can be omitted when using the default configurations.
+ :::

  ## Configure LLMs

@@ -188,7 +190,7 @@ To add and configure an LLM:

  1. Click on your logo on the top right of the page **>** **Model Providers**:

- ![2 add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)
+ ![add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814)

  > Each RAGFlow account is able to use **text-embedding-v2**, an embedding model of Tongyi-Qianwen, for free. This is why you can see Tongyi-Qianwen in the **Added models** list. And you may need to update your Tongyi-Qianwen API key at a later point.

@@ -286,4 +288,5 @@ Conversations in RAGFlow are based on a particular knowledge base or multiple kn

  ![question1](https://github.com/infiniflow/ragflow/assets/93570324/bb72dd67-b35e-4b2a-87e9-4e4edbd6e677)

- ![question2](https://github.com/infiniflow/ragflow/assets/93570324/7cc585ae-88d0-4aa2-817d-0370b2ad7230)
+ ![question2](https://github.com/infiniflow/ragflow/assets/93570324/7cc585ae-88d0-4aa2-817d-0370b2ad7230)
+
docs/references/api.md (+4 −4)

@@ -109,10 +109,10 @@ This method retrieves the history of a specified conversation session.

  - `content_with_weight`: Content of the chunk.
  - `doc_name`: Name of the *hit* document.
  - `img_id`: The image ID of the chunk. It is an optional field only for PDF, PPTX, and images. Call ['GET' /document/get/\<id\>](#get-document-content) to retrieve the image.
- - positions: [page_number, [upleft corner(x, y)], [right bottom(x, y)]], the chunk position, only for PDF.
- - similarity: The hybrid similarity.
- - term_similarity: The keyword simimlarity.
- - vector_similarity: The embedding similarity.
+ - `positions`: [page_number, [upper-left corner (x, y)], [bottom-right corner (x, y)]], the position of the chunk, only for PDF.
+ - `similarity`: The hybrid similarity.
+ - `term_similarity`: The keyword similarity.
+ - `vector_similarity`: The embedding similarity.
  - `doc_aggs`:
    - `doc_id`: ID of the *hit* document. Call ['GET' /document/get/\<id\>](#get-document-content) to retrieve the document.
    - `doc_name`: Name of the *hit* document.
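For illustration, a request against the document-retrieval endpoint referenced above might look like the following sketch (the host placeholder and the authorization header are assumptions; check the surrounding API reference for the exact scheme):

```bash
# Fetch the content of a hit document by its doc_id; the Bearer token and
# host placeholder are hypothetical and must match your deployment.
curl -H "Authorization: Bearer $RAGFLOW_API_KEY" \
  "http://<ragflow-host>/document/get/<doc_id>"
```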
