From 767e58bbef82033b2810095dc957373eaea8edf0 Mon Sep 17 00:00:00 2001 From: Vissidarte-Herman Date: Wed, 5 Jun 2024 20:51:47 +0800 Subject: [PATCH 1/4] Updated Ollama part of local deployment --- README.md | 4 +- README_ja.md | 4 +- README_zh.md | 4 +- docs/guides/deploy_local_llm.md | 91 ++++++++++++++++++++++++--------- docs/quickstart.mdx | 19 ++++--- docs/references/api.md | 8 +-- 6 files changed, 88 insertions(+), 42 deletions(-) diff --git a/README.md b/README.md index 77ab1f229d..68e4a996d4 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ docker pull infiniflow/ragflow:v0.7.0 - license + license

@@ -315,7 +315,7 @@ To launch the service from source: - [Quickstart](https://ragflow.io/docs/dev/) - [User guide](https://ragflow.io/docs/dev/category/user-guides) -- [Reference](https://ragflow.io/docs/dev/category/references) +- [References](https://ragflow.io/docs/dev/category/references) - [FAQ](https://ragflow.io/docs/dev/faq) ## πŸ“œ Roadmap diff --git a/README_ja.md b/README_ja.md index c87f74b679..1cef6cf59d 100644 --- a/README_ja.md +++ b/README_ja.md @@ -20,7 +20,7 @@ docker pull infiniflow/ragflow:v0.7.0 - license + license

@@ -262,7 +262,7 @@ $ bash ./entrypoint.sh - [Quickstart](https://ragflow.io/docs/dev/) - [User guide](https://ragflow.io/docs/dev/category/user-guides) -- [Reference](https://ragflow.io/docs/dev/category/references) +- [References](https://ragflow.io/docs/dev/category/references) - [FAQ](https://ragflow.io/docs/dev/faq) ## πŸ“œ γƒ­γƒΌγƒ‰γƒžγƒƒγƒ— diff --git a/README_zh.md b/README_zh.md index 4ea2df1da7..f62dd0437b 100644 --- a/README_zh.md +++ b/README_zh.md @@ -19,7 +19,7 @@ docker pull infiniflow/ragflow:v0.7.0 - license + license

@@ -282,7 +282,7 @@ $ systemctl start nginx - [Quickstart](https://ragflow.io/docs/dev/) - [User guide](https://ragflow.io/docs/dev/category/user-guides) -- [Reference](https://ragflow.io/docs/dev/category/references) +- [References](https://ragflow.io/docs/dev/category/references) - [FAQ](https://ragflow.io/docs/dev/faq) ## πŸ“œ θ·―ηΊΏε›Ύ diff --git a/docs/guides/deploy_local_llm.md b/docs/guides/deploy_local_llm.md index f61437b607..192332f905 100644 --- a/docs/guides/deploy_local_llm.md +++ b/docs/guides/deploy_local_llm.md @@ -5,42 +5,85 @@ slug: /deploy_local_llm # Deploy a local LLM -RAGFlow supports deploying LLMs locally using Ollama or Xinference. +RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models. -## Ollama +RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. RAGFlow v0.7.0 supports running two types of local models: chat models and embedding models. -One-click deployment of local LLMs, that is [Ollama](https://github.com/ollama/ollama). +:::tip NOTE +This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, you may need to check out the official site of Ollama or Xinference. +::: -### Install +## Deploy a local model using Ollama -- [Ollama on Linux](https://github.com/ollama/ollama/blob/main/docs/linux.md) -- [Ollama Windows Preview](https://github.com/ollama/ollama/blob/main/docs/windows.md) -- [Docker](https://hub.docker.com/r/ollama/ollama) +[Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models that you deployed locally. 
It bundles model weights, configurations, and data into a single package, defined by a Modelfile, and optimizes setup and configurations, including GPU usage. -### Launch Ollama +:::note +- For information about downloading Ollama, see [here](https://github.com/ollama/ollama?tab=readme-ov-file#ollama). +- For information about configuring Ollama server, see [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server). +- For a complete list of supported models and variants, see the [Ollama model library](https://ollama.com/library). +::: + +To deploy a local model, e.g., **7b-chat-v1.5-q4_0**, using Ollama: + +1. Ensure that the service URL of Ollama is accessible. +2. Run your local model: + + ```bash + ollama run qwen:7b-chat-v1.5-q4_0 + ``` +
+ If your Ollama is installed through Docker, run the following instead: + + ```bash + docker exec -it ollama ollama run qwen:7b-chat-v1.5-q4_0 + ``` +
+ +3. In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Ollama to RAGFlow: + + ![add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814) + +4. In the popup window, complete basic settings for Ollama: + + - In this case, **qwen:7b-chat-v1.5-q4_0** is a chat model, so we choose **chat** as the model type. + - Ensure that the model name you enter here *precisely* matches the name of the local model you are running with Ollama. + - Ensure that the base URL you enter is accessible to RAGFlow. + - OPTIONAL: Switch on the toggle under **Does it support Vision?**, if your model includes an image-to-text model. + +![ollama settings](https://github.com/infiniflow/ragflow/assets/93570324/0ba3942e-27ba-457c-a26f-8ebe9edf0e52) + +:::caution NOTE +- If your Ollama and RAGFlow run on the same machine, use `http://localhost:11434` as base URL. +- If your Ollama and RAGFlow run on the same machine and Ollama is in Docker, use `http://host.docker.internal:11434` as base URL. +- If your Ollama runs on a different machine from RAGFlow, use `http://` as base URL. +::: + +:::danger WARNING +If your Ollama runs on a different machine, you may also need to update the system environments in **ollama.service**: -Decide which LLM you want to deploy ([here's a list for supported LLM](https://ollama.com/library)), say, **mistral**: -```bash -$ ollama run mistral -``` -Or, ```bash -$ docker exec -it ollama ollama run mistral +Environment="OLLAMA_HOST=0.0.0.0" +Environment="OLLAMA_MODELS=/APP/MODELS/OLLAMA" ``` -### Use Ollama in RAGFlow +See [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information. +::: -- Go to 'Settings > Model Providers > Models to be added > Ollama'. - -![](https://github.com/infiniflow/ragflow/assets/12318111/a9df198a-226d-4f30-b8d7-829f00256d46) +5. 
Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model: + + *You should now be able to find **7b-chat-v1.5-q4_0** from the dropdown list under **Chat model**.* + + > If your local model is an embedding model, you should find your local model under **Embedding model**. + +![system model settings](https://github.com/infiniflow/ragflow/assets/93570324/c627fb16-785b-4b84-a77f-4dec604570ed) -> Base URL: Enter the base URL where the Ollama service is accessible, like, `http://:11434`. +6. In this case, update your chat model in **Chat Configuration**: -- Use Ollama Models. +![chat config](https://github.com/infiniflow/ragflow/assets/93570324/7cec4026-a509-47a3-82ec-5f8e1f059442) -![](https://github.com/infiniflow/ragflow/assets/12318111/60ff384e-5013-41ff-a573-9a543d237fd3) + > If your local model is an embedding model, update it on the configruation page of your knowledge base. -## Xinference +## Deploy a local model using Xinference Xorbits Inference([Xinference](https://github.com/xorbitsai/inference)) empowers you to unleash the full potential of cutting-edge AI models. @@ -55,8 +98,8 @@ $ xinference-local --host 0.0.0.0 --port 9997 ``` ### Launch Xinference -Decide which LLM you want to deploy ([here's a list for supported LLM](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**. -Execute the following command to launch the model, remember to replace `${quantization}` with your chosen quantization method from the options listed above: +Decide which LLM to deploy ([here's a list for supported LLM](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**. 
+Execute the following command to launch the model, ensuring that you replace `${quantization}` with your chosen quantization method from the options listed above: ```bash $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization} ``` diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx index 3c25ffba8a..1b603efc55 100644 --- a/docs/quickstart.mdx +++ b/docs/quickstart.mdx @@ -18,10 +18,10 @@ This quick start guide describes a general process from: ## Prerequisites -- CPU >= 4 cores -- RAM >= 16 GB -- Disk >= 50 GB -- Docker >= 24.0.0 & Docker Compose >= v2.26.1 +- CPU ≥ 4 cores +- RAM ≥ 16 GB +- Disk ≥ 50 GB +- Docker ≥ 24.0.0 & Docker Compose ≥ v2.26.1 > If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/). @@ -30,7 +30,7 @@ This quick start guide describes a general process from: This section provides instructions on setting up the RAGFlow server on Linux. If you are on a different operating system, no worries. Most steps are alike.
- 1. Ensure vm.max_map_count >= 262144: + 1. Ensure vm.max_map_count ≥ 262144: `vm.max_map_count`. This value sets the the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abmornal behaviors, and the system will throw out-of-memory errors when a process reaches the limitation. @@ -168,7 +168,9 @@ This section provides instructions on setting up the RAGFlow server on Linux. If 5. In your web browser, enter the IP address of your server and log in to RAGFlow. - > - With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations. +:::caution WARNING +With default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**sans** port number) as the default HTTP serving port `80` can be omitted when using the default configurations. +::: ## Configure LLMs @@ -188,7 +190,7 @@ To add and configure an LLM: 1. Click on your logo on the top right of the page **>** **Model Providers**: - ![2 add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814) + ![add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814) > Each RAGFlow account is able to use **text-embedding-v2** for free, a embedding model of Tongyi-Qianwen. This is why you can see Tongyi-Qianwen in the **Added models** list. And you may need to update your Tongyi-Qianwen API key at a later point. 
@@ -286,4 +288,5 @@ Conversations in RAGFlow are based on a particular knowledge base or multiple kn ![question1](https://github.com/infiniflow/ragflow/assets/93570324/bb72dd67-b35e-4b2a-87e9-4e4edbd6e677) - ![question2](https://github.com/infiniflow/ragflow/assets/93570324/7cc585ae-88d0-4aa2-817d-0370b2ad7230) + ![question2](https://github.com/infiniflow/ragflow/assets/93570324/7cc585ae-88d0-4aa2-817d-0370b2ad7230) + diff --git a/docs/references/api.md b/docs/references/api.md index ec01421287..ab73b594ce 100644 --- a/docs/references/api.md +++ b/docs/references/api.md @@ -109,10 +109,10 @@ This method retrieves the history of a specified conversation session. - `content_with_weight`: Content of the chunk. - `doc_name`: Name of the *hit* document. - `img_id`: The image ID of the chunk. It is an optional field only for PDF, PPTX, and images. Call ['GET' /document/get/\](#get-document-content) to retrieve the image. - - positions: [page_number, [upleft corner(x, y)], [right bottom(x, y)]], the chunk position, only for PDF. - - similarity: The hybrid similarity. - - term_similarity: The keyword simimlarity. - - vector_similarity: The embedding similarity. + - `positions`: [page_number, [upper-left corner (x, y)], [bottom-right corner (x, y)]], the chunk position; only for PDF. + - `similarity`: The hybrid similarity. + - `term_similarity`: The keyword similarity. + - `vector_similarity`: The embedding similarity. - `doc_aggs`: - `doc_id`: ID of the *hit* document. Call ['GET' /document/get/\](#get-document-content) to retrieve the document. - `doc_name`: Name of the *hit* document. 
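The quickstart's `vm.max_map_count` prerequisite touched on in the patch above can be checked and raised with a short shell sketch (an illustration only, assuming a Linux host with sudo privileges; the persistence step appends to `/etc/sysctl.conf`):

```shell
# Check the current limit; RAGFlow's quickstart calls for at least 262144.
current=$(sysctl -n vm.max_map_count)
echo "vm.max_map_count is ${current}"
if [ "${current}" -lt 262144 ]; then
  # Raise it for the running system...
  sudo sysctl -w vm.max_map_count=262144
  # ...and persist the setting across reboots.
  echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
fi
```

Without this, Elasticsearch may refuse to start once RAGFlow's containers come up, since the default of 65530 is below its minimum.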
From b376ff094dcf5b61bda8e34e1d0f63b174de3185 Mon Sep 17 00:00:00 2001 From: Vissidarte-Herman Date: Thu, 6 Jun 2024 16:58:17 +0800 Subject: [PATCH 2/4] Updated Xinference part of local deployment --- docs/guides/deploy_local_llm.md | 120 +++++++++++++++++++++----------- docs/quickstart.mdx | 2 +- 2 files changed, 80 insertions(+), 42 deletions(-) diff --git a/docs/guides/deploy_local_llm.md b/docs/guides/deploy_local_llm.md index 192332f905..bfa7f38cb6 100644 --- a/docs/guides/deploy_local_llm.md +++ b/docs/guides/deploy_local_llm.md @@ -7,7 +7,7 @@ slug: /deploy_local_llm RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models. -RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. RAGFlow v0.7.0 supports running two types of local models: chat models and embedding models. +RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models. :::tip NOTE This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, you may need to check out the official site of Ollama or Xinference. @@ -23,39 +23,56 @@ This user guide does not intend to cover much of the installation or configurati - For a complete list of supported models and variants, see the [Ollama model library](https://ollama.com/library). ::: -To deploy a local model, e.g., **7b-chat-v1.5-q4_0**, using Ollama: +To deploy a local model, e.g., **Llama3**, using Ollama: -1. Ensure that the service URL of Ollama is accessible. -2. 
Run your local model: - ```bash - ollama run qwen:7b-chat-v1.5-q4_0 - ``` +### 1. Check firewall settings + +Ensure that your host machine's firewall allows inbound connections on port 11434. For example: + +```bash +sudo ufw allow 11434/tcp +``` +### 2. Ensure Ollama is accessible + +Restart your system and use curl or your web browser to check whether your Ollama service at `http://localhost:11434` is accessible. A running Ollama server responds with: + +```bash +Ollama is running +``` + +### 3. Run your local model + +```bash +ollama run llama3 +```
If your Ollama is installed through Docker, run the following instead: ```bash - docker exec -it ollama ollama run qwen:7b-chat-v1.5-q4_0 + docker exec -it ollama ollama run llama3 ```
-3. In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Ollama to RAGFlow: +### 4. Add Ollama + +In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Ollama to RAGFlow: - ![add llm](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814) +![add ollama](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814) -4. In the popup window, complete basic settings for Ollama: - - In this case, **qwen:7b-chat-v1.5-q4_0** is a chat model, so we choose **chat** as the model type. - - Ensure that the model name you enter here *precisely* matches the name of the local model you are running with Ollama. - - Ensure that the base URL you enter is accessible to RAGFlow. - - OPTIONAL: Switch on the toggle under **Does it support Vision?**, if your model includes an image-to-text model. +### 5. Complete basic Ollama settings -![ollama settings](https://github.com/infiniflow/ragflow/assets/93570324/0ba3942e-27ba-457c-a26f-8ebe9edf0e52) +In the popup window, complete basic settings for Ollama: + +1. Because **llama3** is a chat model, choose **chat** as the model type. +2. Ensure that the model name you enter here *precisely* matches the name of the local model you are running with Ollama. +3. Ensure that the base URL you enter is accessible to RAGFlow. +4. OPTIONAL: Switch on the toggle under **Does it support Vision?** if your model includes an image-to-text model. :::caution NOTE - If your Ollama and RAGFlow run on the same machine, use `http://localhost:11434` as base URL. - If your Ollama and RAGFlow run on the same machine and Ollama is in Docker, use `http://host.docker.internal:11434` as base URL. -- If your Ollama runs on a different machine from RAGFlow, use `http://` as base URL. +- If your Ollama runs on a different machine from RAGFlow, use `http://:11434` as base URL. 
::: :::danger WARNING @@ -69,50 +86,71 @@ Environment="OLLAMA_MODELS=/APP/MODELS/OLLAMA" See [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information. ::: -5. Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model: - - *You should now be able to find **7b-chat-v1.5-q4_0** from the dropdown list under **Chat model**.* +:::caution WARNING +Improper base URL settings will trigger the following error: +```bash +Max retries exceeded with url: /api/chat (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) +``` +::: + +### 6. Update System Model Settings - > If your local model is an embedding model, you should find your local model under **Embedding model**. +Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model: + +*You should now be able to find **llama3** from the dropdown list under **Chat model**.* -![system model settings](https://github.com/infiniflow/ragflow/assets/93570324/c627fb16-785b-4b84-a77f-4dec604570ed) +> If your local model is an embedding model, you should find your local model under **Embedding model**. -6. In this case, update your chat model in **Chat Configuration**: +### 7. Update Chat Configuration -![chat config](https://github.com/infiniflow/ragflow/assets/93570324/7cec4026-a509-47a3-82ec-5f8e1f059442) +Update your chat model accordingly in **Chat Configuration**: - > If your local model is an embedding model, update it on the configruation page of your knowledge base. +> If your local model is an embedding model, update it on the configuration page of your knowledge base. ## Deploy a local model using Xinference -Xorbits Inference([Xinference](https://github.com/xorbitsai/inference)) empowers you to unleash the full potential of cutting-edge AI models. 
+Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models. + +:::note +- For information about installing Xinference, see [here](https://inference.readthedocs.io/en/latest/getting_started/). +- For a complete list of supported models, see the [Builtin Models](https://inference.readthedocs.io/en/latest/models/builtin/). +::: -### Install +To deploy a local model, e.g., **Llama3**, using Xinference: -- [pip install "xinference[all]"](https://inference.readthedocs.io/en/latest/getting_started/installation.html) -- [Docker](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html) +### 1. Start an Xinference instance -To start a local instance of Xinference, run the following command: ```bash $ xinference-local --host 0.0.0.0 --port 9997 ``` -### Launch Xinference -Decide which LLM to deploy ([here's a list for supported LLM](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**. -Execute the following command to launch the model, ensuring that you replace `${quantization}` with your chosen quantization method from the options listed above: +### 2. Launch your local model + +Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method: + ```bash $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization} ``` +### 3. Add Xinference + +In RAGFlow, click on your logo on the top right of the page **>** **Model Providers** and add Xinference to RAGFlow: + +![add xinference](https://github.com/infiniflow/ragflow/assets/93570324/10635088-028b-4b3d-add9-5c5a6e626814) + +### 4. Complete basic Xinference settings -### Use Xinference in RAGFlow +Enter an accessible base URL, such as `http://:9997/v1`. + +### 5. 
Update System Model Settings + +Click on your logo **>** **Model Providers** **>** **System Model Settings** to update your model: + +*You should now be able to find **mistral** from the dropdown list under **Chat model**.* -- Go to 'Settings > Model Providers > Models to be added > Xinference'. - -![](https://github.com/infiniflow/ragflow/assets/12318111/bcbf4d7a-ade6-44c7-ad5f-0a92c8a73789) +> If your local model is an embedding model, you should find your local model under **Embedding model**. -> Base URL: Enter the base URL where the Xinference service is accessible, like, `http://:9997/v1`. +### 6. Update Chat Configuration -- Use Xinference Models. +Update your chat model accordingly in **Chat Configuration**: -![](https://github.com/infiniflow/ragflow/assets/12318111/b01fcb6f-47c9-4777-82e0-f1e947ed615a) -![](https://github.com/infiniflow/ragflow/assets/12318111/1763dcd1-044f-438d-badd-9729f5b3a144) \ No newline at end of file +> If your local model is an embedding model, update it on the configuration page of your knowledge base. \ No newline at end of file diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx index 1b603efc55..3ab8966796 100644 --- a/docs/quickstart.mdx +++ b/docs/quickstart.mdx @@ -34,7 +34,7 @@ This section provides instructions on setting up the RAGFlow server on Linux. If `vm.max_map_count`. This value sets the the maximum number of memory map areas a process may have. Its default value is 65530. While most applications require fewer than a thousand maps, reducing this value can result in abmornal behaviors, and the system will throw out-of-memory errors when a process reaches the limitation. - RAGFlow v0.7.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning the Elasticsearch component. + RAGFlow v0.7.0 uses Elasticsearch for multiple recall. Setting the value of `vm.max_map_count` correctly is crucial to the proper functioning of the Elasticsearch component. 
Date: Thu, 6 Jun 2024 18:58:27 +0800 Subject: [PATCH 3/4] minor --- docs/guides/deploy_local_llm.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/guides/deploy_local_llm.md b/docs/guides/deploy_local_llm.md index bfa7f38cb6..1cc33880fd 100644 --- a/docs/guides/deploy_local_llm.md +++ b/docs/guides/deploy_local_llm.md @@ -76,11 +76,10 @@ In the popup window, complete basic settings for Ollama: ::: :::danger WARNING -If your Ollama runs on a different machine, you may also need to update the system environments in **ollama.service**: +If your Ollama runs on a different machine, you may also need to set the `OLLAMA_HOST` in **ollama.service**: ```bash Environment="OLLAMA_HOST=0.0.0.0" -Environment="OLLAMA_MODELS=/APP/MODELS/OLLAMA" ``` See [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information. From e6ca0d63cf967dbabf0fe57dc2576d0189428e00 Mon Sep 17 00:00:00 2001 From: Vissidarte-Herman Date: Thu, 6 Jun 2024 18:59:54 +0800 Subject: [PATCH 4/4] minor --- docs/guides/deploy_local_llm.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guides/deploy_local_llm.md b/docs/guides/deploy_local_llm.md index 1cc33880fd..17b4b0b63d 100644 --- a/docs/guides/deploy_local_llm.md +++ b/docs/guides/deploy_local_llm.md @@ -76,13 +76,13 @@ In the popup window, complete basic settings for Ollama: ::: :::danger WARNING -If your Ollama runs on a different machine, you may also need to set the `OLLAMA_HOST` in **ollama.service**: +If your Ollama runs on a different machine, you may also need to set the `OLLAMA_HOST` environment variable to `0.0.0.0` in **ollama.service** (Note that this is *NOT* the base URL): ```bash Environment="OLLAMA_HOST=0.0.0.0" ``` -See [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information. 
+See [this guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server) for more information. ::: :::caution WARNING
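The `OLLAMA_HOST` fix in the final patch above can be sanity-checked from the machine running RAGFlow with a quick probe (a sketch only; the IP address below is a hypothetical placeholder for your own Ollama host):

```shell
# Probe a remote Ollama server from the RAGFlow host. The address is a
# hypothetical example; substitute your Ollama machine's IP.
OLLAMA_BASE_URL="http://192.168.0.10:11434"
if curl -fsS --max-time 5 "${OLLAMA_BASE_URL}"; then
  # The docs above note that a healthy server replies "Ollama is running".
  echo " <- server reachable"
else
  echo "unreachable: check OLLAMA_HOST=0.0.0.0 and the firewall rule for port 11434"
fi
```

If the probe fails while `curl http://localhost:11434` succeeds on the Ollama machine itself, the server is most likely still bound to the loopback interface rather than `0.0.0.0`.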