
Please improve error messages and getting started documentation #1416

Open
chris-hatton opened this issue Dec 10, 2023 · 11 comments
Labels: enhancement (New feature or request)

@chris-hatton

chris-hatton commented Dec 10, 2023

Is your feature request related to a problem? Please describe.
Yes; I can't get your sample working. I'm using a cuBLAS build with an NVIDIA GPU; I've followed the setup carefully, and I don't see any log errors related to initialising the GPU; instead I always see "failed to load model" / EOF / "bad magic". I've tried many models, including luna-ai-llama2 from your sample, and have created the four files specified.

Describe the solution you'd like
Please improve the error messages surfaced by LocalAI, and your documentation.
It looks like errors are heavily obfuscated at the moment by the internal architecture of LocalAI. You seem to be separating the front end and back end into two separate services, and the back end does a poor job of surfacing error messages to the front end.

Describe alternatives you've considered
I have no alternatives, except to give up & go home.

Additional context
I'm not a complete dummy; I've had llama.cpp working on Metal and CPU before, but LocalAI's documentation and error messages leave a lot to be desired; I feel like I'm flying completely blind. Sorry to say, your 'Getting started' documentation is not very well written and fails to establish vital facts for beginners, such as:

  • What's the relationship between the naming of the model and the ID shown?
  • Are the four files really vital? If so, why doesn't LocalAI stop as soon as they're missing, or log a very clear error about this?
  • What's f16 mode? Do I have to enable it when working with a GPU?
  • What's the difference between using the CUDA 11 or CUDA 12 build?
@lunamidori5
Collaborator

Oops, I did not mean to link it like that...

@lunamidori5
Collaborator

lunamidori5 commented Dec 10, 2023

> I'm not a complete dummy; I've had llama.cpp working on Metal and CPU before, but LocalAI's documentation and error messages leave a lot to be desired; I feel like I'm flying completely blind. Sorry to say, your 'Getting started' documentation is not very well written and fails to establish vital facts for beginners, such as:
>
>   • What's the relationship between the naming of the model and the ID shown?
>   • Are the four files really vital? If so, why doesn't LocalAI stop as soon as they're missing, or log a very clear error about this?
>   • What's f16 mode? Do I have to enable it when working with a GPU?
>   • What's the difference between using the CUDA 11 or CUDA 12 build?

@chris-hatton Deeply sorry for this, my friend! I have added an update to the How Tos (not the Getting Started) pages to better clear some of this up. If you could review my PR and let me know if there are changes, I'll happily make them.

Q&A:

> • What's the relationship between the naming of the model and the ID shown?

There isn't one. The name is just the name you send with your OpenAI request to LocalAI, so it can be whatever you want it to be.

> • Are the four files really vital? If so, why doesn't LocalAI stop as soon as they're missing, or log a very clear error about this?

No, but the output of the model will be really janky and not well formatted without them. You can run the model raw with no configs, but again, it will not do as well as if you set up the model using the five files needed.

> • What's f16 mode? Do I have to enable it when working with a GPU?

Yes. For GPUs on other OSes like Windows and Linux you will need to set this to true (it gives the model a speed-up when run on a GPU).

> • What's the difference between using the CUDA 11 or CUDA 12 build?

On other hardware, some GPUs only support CUDA 11 and some only CUDA 12, whereas M-series Macs shouldn't be using Docker at all (see the Getting Started page).

I am not one of the devs, just someone who keeps the How To pages updated, so input is super welcome!
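
(For concreteness, a minimal sketch of what a model YAML and a matching request can look like. The field names follow my reading of the LocalAI model config docs and may differ between versions; the file name and gpu_layers value are placeholders for your own setup.)

```bash
# Minimal sketch of a model config; adjust file names and values to your setup.
cat > models/luna-ai-llama2.yaml <<'EOF'
name: luna-ai-llama2                   # the "model" you send in the OpenAI request; it does not have to match the file name
parameters:
  model: luna-ai-llama2.Q4_K_M.gguf    # the actual model file in the models directory (file name assumed)
context_size: 2048
f16: true                              # recommended when offloading to a GPU
gpu_layers: 35                         # layers to offload to the GPU; depends on your VRAM (value illustrative)
EOF

# The "model" field of the request must match the "name:" above, not the GGUF file name.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "luna-ai-llama2", "messages": [{"role": "user", "content": "Hello"}]}'
```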

@lunamidori5 lunamidori5 self-assigned this Dec 10, 2023
@localai-bot
Contributor

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful 😸 and I can provide, generally speaking, good tips or places to look in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

I apologize for the confusion and frustration you are experiencing. We will work on improving our documentation and error messages to make them more clear and helpful. In the meantime, here is some additional information that may help:

  • The naming of the model file is not related to the ID shown in the logs. The id is a unique identifier for the model within the LocalAI system.
  • The four files (llama.cpp, llama.h, prompt_cache.py, and prompt_cache.db) are vital for the LocalAI system to function correctly. If they are missing or incorrect, the system may not be able to load the model or perform certain operations.
  • F16 mode is a feature of LLama that allows for better performance on some GPUs. It is enabled by default in the llama.cpp file and can be disabled if needed.
  • The CUDA build version does not affect the functionality of LocalAI, but it may impact the performance of the system. We recommend using the latest CUDA version available for your hardware to ensure the best performance.


@mudler
Owner

mudler commented Jan 1, 2024

I think this wasn't meant to be closed - there is a lot to do in this area

@mudler mudler reopened this Jan 1, 2024
@mudler
Owner

mudler commented Jan 1, 2024

I'm trying to re-work the docs around the new features, which should also ease deployments. I'm currently crafting good examples as well, but it will take a bit to test them all.

@lunamidori5
Collaborator

I think I linked it wrong with the PR I did, sorry about that @mudler

@apstrom

apstrom commented Jan 2, 2024

I would like to rejoin this call for additional documentation.

I would love to use LocalAI as a central repository for LLM operation on my network. I cannot, however, get models running or link OpenAI calls to LocalAI. The documentation in this regard does not provide enough detail for me to understand the concepts behind commands (which is what I need to then properly use commands in scenarios for which there isn't a tutorial).

More to this point--and I say this with respect, because I know how difficult writing these kinds of materials can be--the materials that do exist do not provide sufficient guidance for a non-technical user (like myself--I am a lawyer) to get the API running and to connect to the API.

An example of this issue: I have three models with associated YAML files in the model directory. These files are based on the how-to examples. I simply want to get LocalAI running and connect it to Ollama-WebUI. This test should only require me to connect LocalAI to Ollama-WebUI via the OpenAI configuration in Ollama's UI (key and IP address). I cannot, however, make this connection or view the LLMs that should be in the model directory. I cannot begin troubleshooting this issue because I do not know if the model files are being identified as valid models by LocalAI; the documentation only allows me to check whether a model can be loaded, not whether LocalAI will list the models in the model directory when called to make such a list.
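
(One quick sanity check that may help here, assuming the default host and port: LocalAI exposes an OpenAI-style model listing endpoint, so you can ask it directly which models it has picked up from the models directory.)

```bash
# Lists the models LocalAI has registered from the models directory (default port assumed).
curl http://localhost:8080/v1/models
```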

Some apps require an OpenAI key for security. The documentation does not mention this key. Can the key be omitted? Or does a key need to be provided? If a key needs to be provided, what is LocalAI expecting?

Another example: what if the application to connect to the API is not running on the same machine? Does the IP address then need to point to the LocalAI machine's instance on the LAN? If running both apps in Docker (thus bridging the apps to the localhost's IP address, but using internal IPs as bridges), does http://localhost:8080 allow the docker app making calls to LocalAI to connect?
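
(A sketch of how the cross-machine / cross-container case usually looks. The IP address and the "local-ai" service name below are illustrative, not taken from this project's docs. Inside another container, "localhost" refers to that container itself, so the base URL has to point at the machine or service actually running LocalAI.)

```bash
# From the machine or container running the client app, point the OpenAI base URL at LocalAI:
curl http://192.168.1.50:8080/v1/models          # LAN IP of the LocalAI host (address illustrative)

# If both apps run in the same user-defined Docker network / compose project, use the service name instead:
curl http://local-ai:8080/v1/models              # "local-ai" = the LocalAI container/service name (name assumed)

# If LocalAI was started with an API key configured, send it as a standard OpenAI-style Bearer token;
# if no key was configured, the Authorization header can simply be omitted.
curl http://192.168.1.50:8080/v1/models -H "Authorization: Bearer $YOUR_KEY"
```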

@lunamidori5
Collaborator

@chris-hatton / @apstrom the How Tos have been updated and I would love your review on the ease of installing new models. Note: the new model installer is really easy and self-updating to the best known models of each size.

Here is an updated link to the How Tos - https://io.midori-ai.xyz/howtos/

@apstrom

apstrom commented Jan 17, 2024

@lunamidori5
As promised.

The presentation of information is clearer.

What's missing from the documentation is a page that provides all of the possible YAML settings (i.e. a blank template YAML with every possible setting in the document, but commented out). Having this document as a reference will be useful for more advanced applications.

Similarly, a general description of the way in which LocalAI functions on the backend would be helpful. That description will allow users to more easily diagnose errors. In a similar vein, a page that describes the operational differences between embedding models and inference models will be useful to users that are new to AI. This description can also include suggested settings.

Embedding models need to be understood as parts of much larger AI operations. In my case, for example, I need to use embeddings to process massive amounts of legal texts. This use case differs from a chat use case or a single document query case. Do embedding models need any specific config. settings to allow this kind of use? A page that helps describe specific use cases and suggested settings will allow users to get their applications running much more quickly.

Finally, the matter of Huggingface models. I am growing to dislike Huggingface's pytorch models because they do not appear to run natively on LocalAI. Some sort of conversion is required to get the models running. If LocalAI can manage this conversion, great: a page that describes the process would be very helpful. If not, then a page that describes the kinds of model filetypes that LocalAI can run would be useful.

Note that I am not asking for a page like the model compatibility page in the old documentation. That page is helpful and should be updated / maintained. I am asking for a page that deals with specific file types or requirements from file types that LocalAI will run out-of-the-box (so-to-speak).

@lunamidori5
Collaborator

@apstrom
Thank you!

A blank YAML is already on the site with everything you can use for a GGUF-based model (more on that in a moment). Linked here - https://localai.io/advanced/

I do think that page needs some love, but I'm 90% sure Mud is on that!

For Hugging Face models, GGUF models are fully supported. I am really not a fan of Hugging Face's APIs and lack of good docs (tell me how to save_pretrained() to CPU, please?) and that's why I want my docs to be really good! So I'll get to work on updating them!
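
(On the PyTorch-to-GGUF point above, a rough sketch of the usual conversion path via llama.cpp's own tooling; the script and binary names vary between llama.cpp versions, so treat these as illustrative rather than exact.)

```bash
# Convert a Hugging Face PyTorch model to GGUF, then quantize it (llama.cpp tooling; names vary by version).
python convert.py /path/to/hf-model --outfile model-f16.gguf
./quantize model-f16.gguf model.Q4_K_M.gguf Q4_K_M
# Drop model.Q4_K_M.gguf into LocalAI's models directory and reference it from a model YAML.
```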

As for the embedding model, the one used is great; it's able to deal with over 10 GB in under 10 s, so I'm not sure if it is the app you're using to send the requests. I know that AnythingLLM can be a bit picky on settings, so check the app you're using. If it is still not working, I'll open you a support chat and see if we can fix that. (Again, I'll update the docs to be a bit more clear! Thank you!)
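
(If it helps to rule the client app out, a minimal sketch for hitting the embeddings endpoint directly; the model name must match the "name:" of whatever embeddings model YAML is configured, so the one below is only a placeholder.)

```bash
# Test the embeddings backend directly, bypassing the client app (model name is a placeholder).
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-ada-002", "input": "A clause from a legal text to embed"}'
```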

@lunamidori5
Collaborator

As a docs volunteer I only know so much about the code, but I'll look into it as best as I can!
