Problems integrating an LLM deployed with vLLM and an embedding model deployed with Xinference into the code #29
Comments
Hello, it looks like there are some problems with the language model you're using, as it is not able to provide an answer. What service are you using?
I think I've hit a similar problem in my project. Specifically, I use Xinference to deploy the qwen2.5-32b-instruct LLM and the bge-1.5-large embedding model on two NVIDIA A100 80G GPUs. I checked the bug report and found that Xinference seems to hit a bug in create_chat_completion. I don't know if it shares the same root cause as this issue, but it looks quite similar; hope you can help with it.

Update: my problem has been solved. I found that the "response_format" field is commented out in the Xinference source code.
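In case it helps others hitting this: a minimal sketch for probing whether an OpenAI-compatible server honors `response_format` (the base URL, port, and model name below are placeholders for my Xinference deployment; adjust them to yours):

```python
# Probe an OpenAI-compatible endpoint for response_format support.
# base_url, api_key, and model are placeholders; 9997 is Xinference's default port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="token-12")

try:
    completion = client.chat.completions.create(
        model="qwen2.5-32b-instruct",
        messages=[{"role": "user", "content": 'Reply with the JSON object {"ok": true}.'}],
        # If the server rejects or silently ignores this field, structured
        # output requested by a downstream library will fail the same way.
        response_format={"type": "json_object"},
    )
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"response_format not supported: {e}")
```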
```python
from typing import List

from fast_graphrag import GraphRAG
from fast_graphrag._llm import OpenAIEmbeddingService, OpenAILLMService

DOMAIN = "Novel Analysis"
QUERIES: List[str] = [
    "Who are the main characters in this novel?",
    "What locations are featured in the story?",
    "What are the relationships between the characters?",
]
ENTITY_TYPES: List[str] = ["Character", "Location", "Event"]

# Set the working directory
working_dir = "./examples/ignore/hp"

# Initialize GraphRAG (the unsupported timeout parameter has been removed)
grag = GraphRAG(
    working_dir=working_dir,
    domain=DOMAIN,
    example_queries="\n".join(QUERIES),
    entity_types=ENTITY_TYPES,
    config=GraphRAG.Config(
        llm_service=OpenAILLMService(
            model="XXXXX/Qwen2-7B-Instruct-GPTQ-Int4",
            base_url="http://XXXX:XXXX/v1/",
            api_key="token-12",
        ),
        embedding_service=OpenAIEmbeddingService(
            model="bge-base-zh",
            base_url="http://XXXXX:XXXX/v1/",
            api_key="token-12",
            embedding_dim=512,
        ),
    ),
)

# Insert the source text
try:
    with open("./book.txt", "r", encoding="utf-8") as f:
        story_content = f.read()
    grag.insert(story_content)
except Exception as e:
    print(f"An error occurred while inserting text: {e}")

# Example query
try:
    response = grag.query("Who are the main characters in this novel?").response
    print(response)
except Exception as e:
    print(f"An error occurred during the query: {e}")
```
Error output:

```
Extracting data:   0%|          | 0/1 [00:00<?, ?it/s]
Error during information extraction from document: Request timed out.
Extracting data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [27:13<00:00, 1633.24s/it]
Error during query: Request timed out.
An error occurred during the query: Request timed out.
```
The reported error is a request timeout, but I don't think that's the actual problem; I keep wondering whether the models I deployed simply can't be used from the code.
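To help rule that out, here is a minimal sketch that calls both endpoints directly with the openai client, independently of fast_graphrag (it reuses the placeholder URLs and model names from the config above; replace them with real values):

```python
# Sanity-check both servers directly, bypassing fast_graphrag.
# The XXXX placeholders mirror the config above; fill in real hosts/ports.
from openai import OpenAI

llm = OpenAI(base_url="http://XXXX:XXXX/v1/", api_key="token-12")
emb = OpenAI(base_url="http://XXXXX:XXXX/v1/", api_key="token-12")

# 1. Chat completion against the vLLM server, with a short explicit timeout
#    so a hang surfaces immediately instead of as a late "Request timed out".
chat = llm.with_options(timeout=30).chat.completions.create(
    model="XXXXX/Qwen2-7B-Instruct-GPTQ-Int4",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)

# 2. Embedding against the Xinference server; verify the returned dimension
#    matches the embedding_dim passed to OpenAIEmbeddingService.
vec = emb.embeddings.create(model="bge-base-zh", input=["test"]).data[0].embedding
print(len(vec))
```

If both calls succeed, the deployments are reachable from code and the timeout is more likely caused by slow generation during extraction. Also note that, as far as I know, bge-base-zh returns 768-dimensional vectors, so embedding_dim=512 in the config above may be a mismatch worth checking.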