---
myst:
  html_meta:
    title: "AutoRAG - Run local model in AutoRAG"
    description: "Learn how to run local model in AutoRAG"
    keywords: "AutoRAG,RAG,RAG model,RAG LLM,embedding model,local model"
---

Configure LLM & Embedding models

## Index

- [Configure the LLM model](#configure-the-llm-model)
- [Configure the Embedding model](#configure-the-embedding-model)
- [Use vllm](#use-vllm)

## Configure the LLM model

### Modules that use LLM model

Most modules that use an LLM model can take the `llm` parameter to specify the LLM model.

The following modules can use the generator module, which includes `llama_index_llm`.

### Supporting LLM Models

We support most of the LLMs that LlamaIndex supports. You can use different types of LLM interfaces by configuring the `llm` parameter:

| LLM Model Type | `llm` parameter | Description |
|----------------|-----------------|-------------|
| OpenAI | `openai` | For OpenAI models (GPT-3.5, GPT-4) |
| OpenAILike | `openailike` | For models with OpenAI-compatible APIs (e.g., Mistral, Claude) |
| Ollama | `ollama` | For locally running Ollama models |
| Bedrock | `bedrock` | For AWS Bedrock models |

For example, if you want to use an OpenAILike model, you can set the `llm` parameter to `openailike`.

```yaml
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: llama_index_llm
            llm: openailike
            model: mistralai/Mistral-7B-Instruct-v0.2
            api_base: your_api_base
            api_key: your_api_key
```

In the example above, you can see the `model` parameter. Parameters like this are passed through to the LlamaIndex LLM initialization. The most frequently used parameters are `model`, `max_tokens`, and `temperature`. Please check what you can set for the model at the LlamaIndex LLM documentation.
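Similarly, a locally running Ollama model can be selected with `llm: ollama`. The sketch below is an illustrative assumption, not a verified recipe: the model name `llama3` and the `request_timeout` value are examples, and any extra parameters are forwarded to the LlamaIndex Ollama integration.

```yaml
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: llama_index_llm
            llm: ollama
            model: llama3             # any model already pulled into your local Ollama server
            request_timeout: 120.0    # LlamaIndex Ollama option; increase for slow hardware
```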

### Using HuggingFace Models

There are two main ways to use HuggingFace models:

1. Through OpenAILike Interface (Recommended for hosted API endpoints):

```yaml
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: llama_index_llm
            llm: openailike
            model: mistralai/Mistral-7B-Instruct-v0.2
            api_base: your_api_base
            api_key: your_api_key
```
2. Through Direct HuggingFace Integration (For local deployment):

```yaml
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: llama_index_llm
            llm: huggingface
            model_name: mistralai/Mistral-7B-Instruct-v0.2
            device_map: "auto"
            model_kwargs:
              torch_dtype: "float16"
```

### Common Parameters

The most frequently used parameters for LLM configuration are:

- `model`: The model identifier or name
- `max_tokens`: Maximum number of tokens in the response
- `temperature`: Controls randomness in the output (0.0 to 1.0)
- `api_base`: API endpoint URL (for hosted models)
- `api_key`: Authentication key (if required)

For a complete list of available parameters, please refer to the LlamaIndex LLM documentation.
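As a small sketch, these parameters are simply placed alongside `llm` in a module entry; the values below are illustrative only:

```yaml
- module_type: llama_index_llm
  llm: openai
  model: gpt-4
  max_tokens: 512
  temperature: 0.2
```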

### Add more LLM models

You can add more LLM models to AutoRAG by simply adding a new key and value to `autorag.generator_models`. For example, if you want to add the `MockLLM` model for testing, execute the following code.

LlamaIndex v0.10.0 was a major update: LLM integrations must now be installed as separate packages. So, before adding your model, you should find and install the right package for it. You can find the package [here](https://pretty-sodium-5e0.notion.site/ce81b247649a44e4b6b35dfb24af28a6?v=53b3c2ced7bb4c9996b81b83c9f01139).
```python
import autorag
from llama_index.core.llms.mock import MockLLM

autorag.generator_models['mockllm'] = MockLLM
```

Then you can use `mockllm` in the config YAML file.
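For example, a minimal sketch of how the newly registered key could appear in a config, following the same pattern as the other generator examples:

```yaml
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: llama_index_llm
            llm: mockllm
```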

When you add a new LLM model, you should add the class itself, not an instance. Plus, it must follow the LlamaIndex LLM interface.

## Configure the Embedding model

### Modules that use Embedding model

Modules that use an embedding model can take the `embedding_model` parameter to specify the embedding model.

### Supporting Embedding models

By default, we support OpenAI embedding models and some local models. To change the embedding model, you can set the `embedding_model` parameter to one of the following values:

| Embedding Model Type | `embedding_model` parameter |
|----------------------|-----------------------------|
| Default openai embedding (text-embedding-ada-002) | `openai` |
| openai large embedding (text-embedding-3-large) | `openai_embed_3_large` |
| openai small embedding (text-embedding-3-small) | `openai_embed_3_small` |
| BAAI/bge-small-en-v1.5 | `huggingface_baai_bge_small` |
| cointegrated/rubert-tiny2 | `huggingface_cointegrated_rubert_tiny2` |
| sentence-transformers/all-mpnet-base-v2 | `huggingface_all_mpnet_base_v2` |
| BAAI/bge-m3 | `huggingface_bge_m3` |

For example, if you want to use the OpenAI text embedding large model, you can set the `embedding_model` parameter to `openai_embed_3_large` when configuring the vectordb.

```yaml
vectordb:
  - name: chroma_openai
    db_type: chroma
    client_type: persistent
    embedding_model: openai_embed_3_large
    collection_name: openai_embed_3_large
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: retrieval
        modules:
          - module_type: vectordb
            vectordb: chroma_openai
```

### Add your embedding models

You can add more embedding models to AutoRAG by simply adding a new key and value to `autorag.embedding_models`. For example, if you want to add the [KoSimCSE](https://huggingface.co/BM-K/KoSimCSE-roberta-multitask) model for Korean embedding, execute the following code.

```python
import autorag
from autorag import LazyInit
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

autorag.embedding_models['kosimcse'] = LazyInit(HuggingFaceEmbedding, model_name="BM-K/KoSimCSE-roberta-multitask")
```

Then you can use `kosimcse` in the config YAML file.
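For instance, a sketch of a vectordb entry using the new key, following the chroma example above; the vectordb name and collection name below are illustrative:

```yaml
vectordb:
  - name: chroma_kosimcse
    db_type: chroma
    client_type: persistent
    embedding_model: kosimcse
    collection_name: kosimcse
```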

When you add a new embedding model, you should use the `LazyInit` class from autorag. Additional parameters have to be passed as keyword arguments in the `LazyInit` initialization.

## Use vllm

You can use vllm to run a local LLM. For more information, please check out the vllm generator module docs.
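As a rough sketch only, the vllm generator is configured as its own `module_type` rather than through `llama_index_llm`. The parameter names below (`llm` for the model name, plus sampling options) are assumptions modeled on the other generator examples in this page, so please confirm them against the vllm generator module docs.

```yaml
nodes:
  - node_line_name: node_line_1
    nodes:
      - node_type: generator
        modules:
          - module_type: vllm
            llm: mistralai/Mistral-7B-Instruct-v0.2  # assumed: HuggingFace model id served by vllm
            temperature: 0.7                         # illustrative sampling parameters
            max_tokens: 512
```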