Problems integrating an LLM deployed with vLLM and an embedding model deployed with Xinference into the code #29
Comments
Hello, it looks like there are some problems with the language model you're using, as it is not able to provide an answer. What service are you using?
I think I've hit a similar problem in my project. Specifically, I use Xinference to deploy the qwen2.5-32b-instruct LLM and the bge-1.5-large embedding model on two NVIDIA A100 80G GPUs. I checked the bug report and found that Xinference seems to hit a bug in create_chat_completion. I don't know if it shares the same root cause as this issue, but it looks quite similar; hope you can help with it.

Update: my problem has been solved. I found that the "response_format" field is commented out in the Xinference source code.
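In case it helps others hitting this: a minimal sketch for probing whether an OpenAI-compatible server honors `response_format` (the base URL, port, and model name below are placeholders for my Xinference deployment; adjust them to yours):

```python
# Probe an OpenAI-compatible endpoint for response_format support.
# base_url, api_key, and model are placeholders; 9997 is Xinference's default port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="token-12")

try:
    completion = client.chat.completions.create(
        model="qwen2.5-32b-instruct",
        messages=[{"role": "user", "content": 'Reply with the JSON object {"ok": true}.'}],
        # If the server rejects or silently ignores this field, structured
        # output requested by a downstream library will fail the same way.
        response_format={"type": "json_object"},
    )
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"response_format not supported: {e}")
```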
```python
from typing import List

from fast_graphrag import GraphRAG
from fast_graphrag._llm import OpenAIEmbeddingService, OpenAILLMService

DOMAIN = "Novel Analysis"
QUERIES: List[str] = [
    "Who are the main characters in this novel?",
    "What locations are featured in the story?",
    "What are the relationships between the characters?",
]
ENTITY_TYPES: List[str] = ["Character", "Location", "Event"]

# Set the working directory
working_dir = "./examples/ignore/hp"

# Initialize GraphRAG (the unsupported timeout parameter has been removed)
grag = GraphRAG(
    working_dir=working_dir,
    domain=DOMAIN,
    example_queries="\n".join(QUERIES),
    entity_types=ENTITY_TYPES,
    config=GraphRAG.Config(
        llm_service=OpenAILLMService(
            model="XXXXX/Qwen2-7B-Instruct-GPTQ-Int4",
            base_url="http://XXXX:XXXX/v1/",
            api_key="token-12",
        ),
        embedding_service=OpenAIEmbeddingService(
            model="bge-base-zh",
            base_url="http://XXXXX:XXXX/v1/",
            api_key="token-12",
            embedding_dim=512,
        ),
    ),
)

# Insert the source text
try:
    with open("./book.txt", "r", encoding="utf-8") as f:
        story_content = f.read()
    grag.insert(story_content)
except Exception as e:
    print(f"An error occurred while inserting text: {e}")

# Example query
try:
    response = grag.query("Who are the main characters in this novel?").response
    print(response)
except Exception as e:
    print(f"An error occurred during the query: {e}")
```
Error output:

```
Extracting data:   0%|          | 0/1 [00:00<?, ?it/s]
Error during information extraction from document: Request timed out.
Extracting data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [27:13<00:00, 1633.24s/it]
Error during query: Request timed out.
An error occurred during the query: Request timed out.
```
The reported error is a request timeout, but I don't think that's the actual problem; I keep wondering whether the models I deployed simply can't be used from the code.
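To help rule that out, here is a minimal sketch that calls both endpoints directly with the openai client, independently of fast_graphrag (it reuses the placeholder URLs and model names from the config above; replace them with real values):

```python
# Sanity-check both servers directly, bypassing fast_graphrag.
# The XXXX placeholders mirror the config above; fill in real hosts/ports.
from openai import OpenAI

llm = OpenAI(base_url="http://XXXX:XXXX/v1/", api_key="token-12")
emb = OpenAI(base_url="http://XXXXX:XXXX/v1/", api_key="token-12")

# 1. Chat completion against the vLLM server, with a short explicit timeout
#    so a hang surfaces immediately instead of as a late "Request timed out".
chat = llm.with_options(timeout=30).chat.completions.create(
    model="XXXXX/Qwen2-7B-Instruct-GPTQ-Int4",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)

# 2. Embedding against the Xinference server; verify the returned dimension
#    matches the embedding_dim passed to OpenAIEmbeddingService.
vec = emb.embeddings.create(model="bge-base-zh", input=["test"]).data[0].embedding
print(len(vec))
```

If both calls succeed, the deployments are reachable from code and the timeout is more likely caused by slow generation during extraction. Also note that, as far as I know, bge-base-zh returns 768-dimensional vectors, so embedding_dim=512 in the config above may be a mismatch worth checking.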