I use a knowledge base to parse documents into QA data. Why can't I use it #11802

Closed
5 tasks done
ouguofeng opened this issue Dec 18, 2024 · 1 comment

ouguofeng commented Dec 18, 2024

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.13.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I use qwen-plus as the LLM, text-embedding-v3 as the embedding model, and gte-rerank as the rerank model.

Failing request: /console/api/datasets/indexing-estimate

Parameter: {
    "info_list": {
        "data_source_type": "upload_file",
        "file_info_list": {
            "file_ids": [
                "b5447ef1-f812-44e4-a7e0-eb4ce421c965"
            ]
        }
    },
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {},
        "mode": "automatic"
    },
    "doc_form": "qa_model",
    "doc_language": "Chinese",
    "dataset_id": "fb498a02-a7d1-446e-bd5d-c4b041ff5599"
}

Result: {
    "code": "indexing_estimate_error",
    "message": "[tongyi] Error: 3 validation errors for AssistantPromptMessage\ncontent.str\n  Input should be a valid string [type=string_type, input_value=[{'text': 'Q1: 贵公司...2020年06月08日。'}], input_type=list]\n    For further information visit https://errors.pydantic.dev/2.9/v/string_type\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.type\n  Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.9/v/missing\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.data\n  Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.9/v/missing",
    "status": 500
}
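
The three validation errors in the message can be read as one failed "string" check plus two missing fields on the first list item. The sketch below mimics that check with the standard library only. It is not Dify's or pydantic's code: `content_errors` is a hypothetical helper, and the schema (a string, or a list of items each carrying `type` and `data`) is inferred from the error text above.

```python
from typing import Any, List


def content_errors(content: Any) -> List[str]:
    """Hypothetical check: content must be a str, or a list of items
    that each have 'type' and 'data' keys (schema inferred from the error)."""
    if isinstance(content, str):
        return []
    if not isinstance(content, (list, tuple)):
        return ["content.str: not a string", "content.list: not a sequence"]
    # Collect one error per missing field per item, like the report above.
    item_errors = [
        f"content.list.{index}.{field}: field required"
        for index, item in enumerate(content)
        for field in ("type", "data")
        if not isinstance(item, dict) or field not in item
    ]
    if not item_errors:
        return []  # a well-formed list satisfies the second alternative
    # Neither alternative matched, so the string failure is reported too.
    return ["content.str: not a string"] + item_errors


# Same shape as the input_value in the error message (text shortened).
errors = content_errors([{"text": "Q1: ..."}])
for line in errors:
    print(line)
```

With a list holding a single `{'text': ...}` dictionary, the helper reports one string-type failure and two missing fields, which matches the count of errors in the response.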

✔️ Expected Behavior

QA parsing should work so that I can use it to parse my document. Currently it fails with the error above.

❌ Actual Behavior

No response

dosubot bot added the 🐞 bug label on Dec 18, 2024

dosubot bot commented Dec 18, 2024

I found a similar unsolved discussion that might be relevant to your issue: Error: Response output is missing or does not contain embeddings. [1]. This discussion might provide some insights or potential solutions related to validation errors and embedding issues.

Regarding your specific issue, the error message indicates validation errors on the AssistantPromptMessage model. The content field must be either a string or a list of PromptMessageContent objects, but the model response arrives as a list holding a dictionary with only a text key, so it is not a string and the item lacks the required type and data fields. Ensure that the content field is formatted as a string or as a list of PromptMessageContent objects, which can include TextPromptMessageContent, ImagePromptMessageContent, or AudioPromptMessageContent [2].

Additionally, check that all required fields are provided and correctly formatted in your input data to avoid missing field errors.
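
One possible workaround, sketched under assumptions: flatten the provider's list-of-dictionaries response into a plain string before it is used as message content. `normalize_content` is a hypothetical helper, not part of Dify or the tongyi plugin, and it assumes each item is either a string or a dictionary with a `text` key, as in the error message.

```python
from typing import Any


def normalize_content(content: Any) -> str:
    """Hypothetical helper: turn provider-style content into a plain string."""
    if isinstance(content, str):
        return content
    if isinstance(content, (list, tuple)):
        parts = []
        for item in content:
            if isinstance(item, str):
                parts.append(item)
            elif isinstance(item, dict) and "text" in item:
                parts.append(str(item["text"]))
            else:
                # Refuse to guess at shapes the error message did not show.
                raise TypeError(f"unsupported content item: {item!r}")
        return "".join(parts)
    raise TypeError(f"unsupported content type: {type(content).__name__}")


# Shape taken from the error message (text shortened for illustration).
raw = [{"text": "Q1: ...\nA1: ..."}]
print(normalize_content(raw))
```

Where such a conversion belongs (in the model provider, or before the AssistantPromptMessage is built) depends on the Dify version and is not established by this issue.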

dosubot bot added the stale label on Jan 20, 2025
dosubot bot closed this as not planned on Feb 4, 2025
dosubot bot removed the stale label on Feb 4, 2025