Please improve error messages and getting started documentation #1416

chris-hatton · 2023-12-10T11:13:49Z

Is your feature request related to a problem? Please describe.
Yes; I can't get your sample working. I'm using a cublas build with an Nvidia GPU; I've followed the setup carefully, and I don't see any log errors related to initialising the GPU; instead I always see failed to load model/EOF/bad magic. I've tried many models including luna-ai-llama2 from your sample, and have created the four files specified.

Describe the solution you'd like
Please Improve the error messages surfaced by Local.AI, and your documentation.
It looks like errors are heavily obfuscated at the moment due to the internal architecture of Local.AI You seen to be separating front and back end as two separate services, and the back-end does a poor job of surfacing error messages to the front end.

Describe alternatives you've considered
I have no alternatives, except to give up & go home.

Additional context
I'm not a complete dummy; I've had llama.cpp working on Metal and CPU before, but Local.AI's documentation and error messages leave a lot to be desired; I feel like I'm flying completely blind. Sorry to say your 'Getting started' documentation is not very well written and fails to establish vital facts for beginners such as:

What's the relationship between the naming of the model and the ID shown
Are the four files really vital? If so why doesn't Local.AI stop as soon as they're missing, or make a very clear error log about this
What's f16 mode? Do I have to enable it when working with a GPU?
What's the difference between using CUDA 11 or 12 build?

The text was updated successfully, but these errors were encountered:

lunamidori5 · 2023-12-10T15:34:27Z

Oops, I did not mean to link it like that...

lunamidori5 · 2023-12-10T15:40:17Z

I'm not a complete dummy; I've had llama.cpp working on Metal and CPU before, but Local.AI's documentation and error messages leave a lot to be desired; I feel like I'm flying completely blind. Sorry to say your 'Getting started' documentation is not very well written and fails to establish vital facts for beginners such as:

What's the relationship between the naming of the model and the ID shown

Are the four files really vital? If so why doesn't Local.AI stop as soon as they're missing, or make a very clear error log about this

What's f16 mode? Do I have to enable it when working with a GPU?

What's the difference between using CUDA 11 or 12 build?

@chris-hatton Deeply sorry for this my friend! I have added a update to the How Tos (not the Getting Started) pages to better clear some of this up. If you could review my PR and let me know if there are changes ill happy make them

QNA:

What's the relationship between the naming of the model and the ID shown

Theres not, the name is just the name you send with your OpenAI request to Localai, so it can be what ever you want it to be

Are the four files really vital? If so why doesn't Local.AI stop as soon as they're missing, or make a very clear error log about this

No but the output of the model will be really jank and not well formatted without them, can you run the model raw with no configs but again, it will not do as well as if you setup the model using the 5 files needed.

What's f16 mode? Do I have to enable it when working with a GPU?

Yes for GPUs on other OS like Windows and Linux you will need to set this to true (It gives a speed up to the model when ran on GPUs)

What's the difference between using CUDA 11 or 12 build?

On other hardware some GPU only support CUDA 11 or CUDA 12, Where as with M based Macs they don't want to be using docker (see the getting started page).

I am not one of the devs but just someone that keeps the How To pages updated. So input is super welcomed!

localai-bot · 2023-12-12T12:08:46Z

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful 😸 and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

I apologize for the confusion and frustration you are experiencing. We will work on improving our documentation and error messages to make them more clear and helpful. In the meantime, here is some additional information that may help:

The naming of the model file is not related to the ID shown in the logs. The id is a unique identifier for the model within the LocalAI system.
The four files (llama.cpp, llama.h, prompt_cache.py, and prompt_cache.db) are vital for the LocalAI system to function correctly. If they are missing or incorrect, the system may not be able to load the model or perform certain operations.
F16 mode is a feature of LLama that allows for better performance on some GPUs. It is enabled by default in the llama.cpp file and can be disabled if needed.
The CUDA build version does not affect the functionality of LocalAI, but it may impact the performance of the system. We recommend using the latest CUDA version available for your hardware to ensure the best performance.

Sources:

mudler · 2024-01-01T10:17:23Z

I think this wasn't meant to be closed - there is a lot to do in this area

mudler · 2024-01-01T10:18:45Z

I'm trying to re-work docs around the new features that should also ease out deployments. I'm currently crafting good examples as well but will take a bit to test them all.

lunamidori5 · 2024-01-02T03:28:30Z

I think I linked it wrong with the PR I did, sorry about that @mudler

apstrom · 2024-01-02T17:02:32Z

I would like to rejoin this call for additional documentation.

I would love to use LocalAI as a central repository for LLM operation on my network. I cannot, however, get models running or link OpenAI calls to LocalAI. The documentation in this regard does not provide enough detail for me to understand the concepts behind commands (which is what I need to then properly use commands in scenarios for which there isn't a tutorial).

More to this point--and I say this with respect, because I know how difficult writing these kinds of materials can be--the materials that do exist do not provide sufficient guidance for a non-technical user (like myself--I am a lawyer) to get the API running and to connected to the API.

An example of this issue: I have three models with associated YAML files in the model directory. These files are based on the how-to examples. I simply want to get the LocalAI running and connect it to Ollama-WebUI. This test should only require me to connect LocalAI to Ollama-WebUI via the OpenAI configuration in Ollama's UI (key and IP address). I cannot, however, make this connection or view the LLMs that should be in the model directory. I cannot begin troubleshooting this issue because I do not know if the model files are being identified as valid models by LocalAI; the documentation only allows me to check whether a model can be loaded, not whether LocalAI will list the models in the model directory when called to make such a list.

Some apps require an OpenAI key for security. The documentation does not mention this key. Can the key be omitted? Or does a key need to be provided? If a key needs to be provided, what is LocalAI expecting?

Another example: what if the application to connect to the API is not running on the same machine? Does the IP address then need to point to the LocalAI machine's instance on the LAN? If running both apps in Docker (thus bridging the apps to the localhost's IP address, but using internal IPs as bridges), does http://localhost:8080 allow the docker app making calls to LocalAI to connect?

lunamidori5 · 2024-01-17T03:47:41Z

@chris-hatton / @apstrom the how tos have been updated and would love your review on easy of install of new models. make note, this the new model installer it is really easy and self updating to the best known models of each size

Here is a updated link to the how tos - https://io.midori-ai.xyz/howtos/

apstrom · 2024-01-17T17:09:06Z

@lunamidori5
As promised.

The presentation of information is clearer.

What's missing from the documentation is a page that provides all of the possible YAML settings (i.e. a blank template YAML with every possible setting in the document, but commented out). Having this document as a reference will be useful for more advanced applications.

Similarly, a general description of the way in which LocalAI functions on the backend would be helpful. That description will allow users to more easily diagnose errors. In a similar vein, a page that describes the operational differences between embedding models and inference models will be useful to users that are new to AI. This description can also include suggested settings.

Embedding models need to be understood as parts of much larger AI operations. In my case, for example, I need to use embeddings to process massive amounts of legal texts. This use case differs from a chat use case or a single document query case. Do embedding models need any specific config. settings to allow this kind of use? A page that helps describe specific use cases and suggested settings will allow users to get their applications running much more quickly.

Finally, the matter of Huggingface models. I am growing to dislike Huggingface's pytorch models because they do not appear to run natively on LocalAI. Some sort of conversion is required to get the models running. If LocalAI can manage this conversion, great: a page that describes the process would be very helpful. If not, then a page that describes the kinds of model filetypes that LocalAI can run would be useful.

Note that I am not asking for a page like the model compatibility page in the old documentation. That page is helpful and should be updated / maintained. I am asking for a page that deals with specific file types or requirements from file types that LocalAI will run out-of-the-box (so-to-speak).

lunamidori5 · 2024-01-17T17:20:17Z

@apstrom
Thank you!

Blank Yaml is already on the site with everything you can use for a GGUF based model. (more on that in a moment) Linked here - https://localai.io/advanced/

I do think that page needs some love but im 90% sure Mud is on that!

For huggingface models, GGUF models are fully supported. I am really not a fan of huggingfaces APIs and lacking of good docs (tell me how to save_pretrained() cpu please?) and thats why I want my docs to be really good! So ill get to work on updating them!

As for embedding model, the one used is great, its able to deal with over 10gbs in under 10s, so im not sure if it is the app your using to send the requests in, I know that anythingllm can be a bit picky on settings, so check the app your using. If it is still not working Ill open you a support chat and see if we can fix that. (Again Ill update the docs to be a bit more clear! Thank you!)

lunamidori5 · 2024-01-17T17:22:42Z

I as a docs volunteer I only know so much about the code, but ill look into as best as I can!

heluca · 2024-12-30T19:02:12Z

Great project though getting started is frustrating. I have some suggestions and questions for the getting started documentation. I hope it's helpful. All the best in the New Year!

There is an Overview and a Getting Started. The Overview jumps into how to get started in the second paragraph, and the Getting Started starts with an overview.

Overview:
Remove 'Start LocalAI' and keep this in get started.
This section also mentions optional GPU Acceleration is available and links the the build section, which implies that the entire project needs to be built from source to use a GPU. This isn't true, but I've read blog posts that state that LocalAI is ideal for people that want to use a CPU and Ollama is for people that want to use a GPU.

Getting Started:
The text jumps into a security consideration about the API key, but doesn't actually explain that it is wide open by default and that API_KEY is configured by an environment variable that is set in '/etc/localai.env'.
Then it spends a second on Bash, then Mac, then AIO Images (which turn out to be Docker images) but this is adds to the confusion as the bash installer has an option for deploying with Docker. Both the AIO Images and Bash Installer refer to manual options that have nothing to do with one another.

Split getting started and provide enough into to actually get started:

Installing Locally on Linux on Bash
GS on MacOS with Homebrew
Using Docker Containers including Kubernetes

Ok, so I want to install locally on a Linux machine with an Nvidia GPU.

Installing Locally on Linux on Bash.

I download the script and skim through it. It looks like I just run it, there is logic in the script to handle running as root and with sudo. However, I get errors that I should run it as root:

./install.sh 
WARNING: you should run this program as super-user.
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
WARNING: you should run this program as super-user.
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
  Downloading local-ai...
##################################################################################################### 100.0%

To sudo or not to sudo?

But then it asks for my password for sudo:

  Installing local-ai to /usr/local/bin...
[sudo] password for robert: 
  Adding current user to local-ai group...
  Creating local-ai systemd service...
  Enabling and starting local-ai service...
  NVIDIA GPU installed.

I can say that sudo leaves a trail of permissions errors, but there is no explanation either way, just the warnings. So sudo it is.

Installing local-ai to /usr/local/bin...
  Adding current user to local-ai group...
  Creating local-ai systemd service...
  Enabling and starting local-ai service...
  NVIDIA GPU installed.

Um... ok. Now what?

The Quickstart should explain that LocalAI is configured with environment variables, and they are set in the /etc/localai/.env file for local installs.

ADDRESS=0.0.0.0:8080 
# Mention the importance of using setting an API_KEY or changing the address to 127.0.0.1 for local only access.  

API_KEY=
# Are there any requirements or will any string do?  If I set it, I see 401 errors in debug, but the web interface just appears to be waiting.  The web interface has an OpenAI key, but I'm not using OpenAI.  I still had a 401 error after setting it, but  I don't need it so I erased it and it worked ok.

THREADS=32
# CPU threads.  Ok, but I want to use my GPU.  I have a 16 core CPU, any guidance on how many threads?

MODELS_PATH=/usr/share/local-ai/models
# This folder is going to get huge, and even though I am a member of the localai group, I don't even get read access to these files and root is the owner. Not a big deal, the web interface handles model install and deletion, but none of this is obvious at this point. Also, the models are configured by editing their YAML files in this folder which again need to be done as root.  

CUDA=true
# Again, I want to use my GPU.  Do I need this, is true default?  

CUDA_DEVICE_POOL_GPU_OVERRIDE=1
# What does this do?

CUDA_VISIBLE_DEVICES=0
# 0 is not none!  I have 1 CUDA device, so the first one is numbered 0.  If I have four, it would be 0,1,2,3.  Is there an all option?  A none option?  Dunno.

Using an Nvidia GPU

If using Nvidia, now is a good time to check if CUDA is installed. Run nvidia-smi to check that CUDA is installed and what version.

Launch with debug

Now, stop the service:
systemctl stop local-ai

And run it in the command line with debug enabled to see what's going on and see if there are any problems because the web browser won't.
sudo su
local-ai --debug

Once it is up and running, it will show a helpful:
systemctl stop local-ai
'''
INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
'''

Web Interface

Now, go to Models. Suggest some models from your table: https://localai.io/model-compatibility/

Then explain how to configure a the suggested model YAML files to use Nvidia CUDA and AMD GPUs.

Keep an eye on the CLI debug for 401, 500, file access issues and other problems.

Hello-World-Traveler · 2025-02-09T22:42:17Z

ADDRESS=0.0.0.0:8080
Mention the importance of using setting an API_KEY or changing the address to 127.0.0.1 for local only access.

API_KEY=
Are there any requirements or will any string do? If I set it, I see 401 errors in debug, but the web interface just appears to be waiting. The web interface has an OpenAI key, but I'm not using OpenAI. I still had a 401 error after setting it, but I don't need it so I erased it and it worked ok.

THREADS=32
CPU threads. Ok, but I want to use my GPU. I have a 16 core CPU, any guidance on how many threads?

MODELS_PATH=/usr/share/local-ai/models
This folder is going to get huge, and even though I am a member of the localai group, I don't even get read access to these files and root is the owner. Not a big deal, the web interface handles model install and deletion, but none of this is obvious at this point. Also, the models are configured by editing their YAML files in this folder which again need to be done as root.

CUDA=true
#Again, I want to use my GPU. Do I need this, is true default?

CUDA_DEVICE_POOL_GPU_OVERRIDE=1
What does this do?

CUDA_VISIBLE_DEVICES=0
0 is not none! I have 1 CUDA device, so the first one is numbered 0. If I have four, it would be 0,1,2,3. Is there an all option? A none option? Dunno.

These are good questions. I downloaded the AIO from quick start. 44GB image (localai/localai:latest-aio-gpu-nvidia-cuda-12) to find it's downloading models, doesn't matter if I remove the mapped models folder. I know it says "LocalAI will automatically download all the required models" so what is the point of a 44GB image?

The getting started page does need to be more clear. Could be spilt into Getting Started (basic) for those that just want to test it a little and Getting Started (adv)?

I think threads is often more commonly associated with CPUs then GPUs? and CUDA_VISIBLE_DEVICES=0 is the GPU card as on linux card 0 is the main GPU?

chris-hatton added the enhancement New feature or request label Dec 10, 2023

chris-hatton assigned mudler Dec 10, 2023

chris-hatton mentioned this issue Dec 10, 2023

rpc error: code = Unknown desc = unimplemented #800

Open

mudler mentioned this issue Dec 10, 2023

[EPIC] v2 UX refactoring #1373

Open

lunamidori5 mentioned this issue Dec 10, 2023

How To Updates / Model Used Switched / Removed "docker-compose" (RIP) #1417

Merged

1 task

lunamidori5 self-assigned this Dec 10, 2023

lunamidori5 closed this as completed Dec 13, 2023

mudler reopened this Jan 1, 2024

This was referenced Jan 2, 2024

feat: embedded model configurations, add popular model examples, refactoring #1532

Merged

feat: share models by url #1522

Merged

mudler mentioned this issue Jan 13, 2024

docs/examples: enhancements #1572

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please improve error messages and getting started documentation #1416

Please improve error messages and getting started documentation #1416

chris-hatton commented Dec 10, 2023 •

edited

Loading

lunamidori5 commented Dec 10, 2023

lunamidori5 commented Dec 10, 2023 •

edited

Loading

localai-bot commented Dec 12, 2023

mudler commented Jan 1, 2024

mudler commented Jan 1, 2024

lunamidori5 commented Jan 2, 2024

apstrom commented Jan 2, 2024

lunamidori5 commented Jan 17, 2024

apstrom commented Jan 17, 2024

lunamidori5 commented Jan 17, 2024

lunamidori5 commented Jan 17, 2024

heluca commented Dec 30, 2024 •

edited

Loading

Hello-World-Traveler commented Feb 9, 2025

Please improve error messages and getting started documentation #1416

Please improve error messages and getting started documentation #1416

Comments

chris-hatton commented Dec 10, 2023 • edited Loading

lunamidori5 commented Dec 10, 2023

lunamidori5 commented Dec 10, 2023 • edited Loading

localai-bot commented Dec 12, 2023

⚠️⚠️⚠️⚠️⚠️

⚠️⚠️⚠️⚠️⚠️

mudler commented Jan 1, 2024

mudler commented Jan 1, 2024

lunamidori5 commented Jan 2, 2024

apstrom commented Jan 2, 2024

lunamidori5 commented Jan 17, 2024

apstrom commented Jan 17, 2024

lunamidori5 commented Jan 17, 2024

lunamidori5 commented Jan 17, 2024

heluca commented Dec 30, 2024 • edited Loading

Split getting started and provide enough into to actually get started:

Installing Locally on Linux on Bash.

To sudo or not to sudo?

Using an Nvidia GPU

Launch with debug

Web Interface

Hello-World-Traveler commented Feb 9, 2025

chris-hatton commented Dec 10, 2023 •

edited

Loading

lunamidori5 commented Dec 10, 2023 •

edited

Loading

heluca commented Dec 30, 2024 •

edited

Loading