
Deployed the LLM with vLLM and the embedding model with Xinference; integrating them in code fails #29

Open
Joker-sad opened this issue Nov 22, 2024 · 3 comments

Comments

@Joker-sad

from fast_graphrag import GraphRAG
from fast_graphrag._llm import OpenAIEmbeddingService, OpenAILLMService

from typing import List

DOMAIN = "Novel Analysis"
QUERIES: List[str] = [
    "Who are the main characters in this novel?",
    "What locations are featured in the story?",
    "What are the relationships between the characters?",
]
ENTITY_TYPES: List[str] = ["Character", "Location", "Event"]

# Set the working directory

working_dir = "./examples/ignore/hp"

# Initialize GraphRAG (the unsupported timeout parameter has been removed)

grag = GraphRAG(
    working_dir=working_dir,
    domain=DOMAIN,
    example_queries="\n".join(QUERIES),
    entity_types=ENTITY_TYPES,
    config=GraphRAG.Config(
        llm_service=OpenAILLMService(
            model="XXXXX/Qwen2-7B-Instruct-GPTQ-Int4",
            base_url="http://XXXX:XXXX/v1/",
            api_key="token-12",
        ),
        embedding_service=OpenAIEmbeddingService(
            model="bge-base-zh",
            base_url="http://XXXXX:XXXX/v1/",
            api_key="token-12",
            embedding_dim=512,
        ),
    ),
)

try:
    with open("./book.txt", "r", encoding="utf-8") as f:
        story_content = f.read()
    grag.insert(story_content)
except Exception as e:
    print(f"Error while inserting text: {e}")

# Example query

try:
    response = grag.query("Who are the main characters in this novel?").response
    print(response)
except Exception as e:
    print(f"Error while querying: {e}")

Error output:
Extracting data:   0%|          | 0/1 [00:00<?, ?it/s]
Error during information extraction from document: Request timed out.
Extracting data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [27:13<00:00, 1633.24s/it]
Error during query: Request timed out.
Error while querying: Request timed out.

The error says the request takes too long, but I don't think that's the real problem. I keep suspecting that the model I deployed simply can't be used from code like this.
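A quick way to narrow this down is to call both endpoints directly with the plain openai client, bypassing fast-graphrag entirely: if these raw calls also hang, the problem is in the deployment or the network rather than in the integration. This is only a diagnostic sketch; the base URLs, model names, and API keys below are the placeholders from the snippet above.

from openai import OpenAI

# Sketch: probe the vLLM chat endpoint directly (placeholder URL/model from above).
llm_client = OpenAI(base_url="http://XXXX:XXXX/v1/", api_key="token-12")
chat = llm_client.chat.completions.create(
    model="XXXXX/Qwen2-7B-Instruct-GPTQ-Int4",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    timeout=30,  # fail fast instead of waiting out the default timeout
)
print(chat.choices[0].message.content)

# Sketch: probe the Xinference embedding endpoint directly.
emb_client = OpenAI(base_url="http://XXXXX:XXXX/v1/", api_key="token-12")
emb = emb_client.embeddings.create(model="bge-base-zh", input="hello")
print(len(emb.data[0].embedding))  # compare against the embedding_dim you configured

One more thing that may be worth checking: bge-base-zh normally produces 768-dimensional vectors, so the embedding_dim=512 in the config above may not match what the endpoint actually returns.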

@liukidar
Contributor

Hello, it looks like there are some problems with the language model you're using, as it is not able to provide an answer. What service are you using?

@Joker-sad
Author

Hello, the model I'm using is qwen2.5-chat, deployed with vLLM.

@WooyoohL

WooyoohL commented Dec 17, 2024

I think I've hit a similar problem in my project. Specifically, I use Xinference to deploy the qwen2.5-32b-instruct LLM and the bge-1.5-large embedding model on two NVIDIA A100 80G GPUs. I checked the bug report and found this:
Source code:
config=GraphRAG.Config(
    llm_service=OpenAILLMService(
        model="qwen2.5_32b_instruct",
        base_url="http://127.0.0.1:9997/v1",
        api_key="EMPTY",
        mode=instructor.Mode.JSON,
    ), ······
Error:
2024-12-17 09:28:29,724 xinference.api.restful_api 184858 ERROR Handling request http://127.0.0.1:9997/v1/chat/completions
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CreateChatCompletion
response_format
extra fields not permitted (type=value_error.extra)

It seems that Xinference hits a bug in create_chat_completion:
body = CreateChatCompletion.parse_obj(raw_body)
and raises a validation error. This may be because OpenAILLMService posts an extra field, "response_format", in the request body.
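For illustration, this failure mode can be reproduced with a minimal pydantic v1 model that forbids extra fields, which is (judging from the traceback) how Xinference's CreateChatCompletion behaves; the stub class below is a hypothetical stand-in, not Xinference's actual schema:

from pydantic.v1 import BaseModel, ValidationError

# Hypothetical stand-in for Xinference's CreateChatCompletion request schema.
class CreateChatCompletionStub(BaseModel):
    model: str
    messages: list

    class Config:
        extra = "forbid"  # unknown fields in the request body are rejected

try:
    # Simulates a request body carrying the response_format field.
    CreateChatCompletionStub.parse_obj({
        "model": "qwen2.5_32b_instruct",
        "messages": [],
        "response_format": {"type": "json_object"},
    })
except ValidationError as e:
    print(e)  # "... extra fields not permitted (type=value_error.extra)"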

I don't know whether it's the same problem as this issue, but it looks quite similar; I hope you can help with it.

Update: My problem has been solved. I found that the field "response_format" is commented out in the Xinference source code.
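If patching the Xinference source isn't an option, another avenue (untested, and assuming fast-graphrag forwards the mode argument to instructor as in the snippet above) is to switch instructor to a mode that conveys the JSON requirement through the prompt rather than the response_format field:

import instructor
from fast_graphrag._llm import OpenAILLMService

# Sketch: Mode.MD_JSON asks the model for JSON via the prompt instead of
# setting response_format, so the rejected field never reaches Xinference.
llm_service = OpenAILLMService(
    model="qwen2.5_32b_instruct",
    base_url="http://127.0.0.1:9997/v1",
    api_key="EMPTY",
    mode=instructor.Mode.MD_JSON,
)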
