I use a knowledge base to parse documents into QA data. Why can't I use it? #11802

ouguofeng · 2024-12-18T10:15:13Z

Self Checks

This is only for bug reports; if you would like to ask a question, please head to Discussions.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Dify version

0.13.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I am using qwen-plus as the LLM, text-embedding-v3 as the embedding model, and gte-rerank as the rerank model.

Request URL: /console/api/datasets/indexing-estimate

Request body: {
    "info_list": {
        "data_source_type": "upload_file",
        "file_info_list": {
            "file_ids": [
                "b5447ef1-f812-44e4-a7e0-eb4ce421c965"
            ]
        }
    },
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {},
        "mode": "automatic"
    },
    "doc_form": "qa_model",
    "doc_language": "Chinese",
    "dataset_id": "fb498a02-a7d1-446e-bd5d-c4b041ff5599"
}

Response: {
    "code": "indexing_estimate_error",
    "message": "[tongyi] Error: 3 validation errors for AssistantPromptMessage\ncontent.str\n  Input should be a valid string [type=string_type, input_value=[{'text': 'Q1: 贵公司...2020年06月08日。'}], input_type=list]\n    For further information visit https://errors.pydantic.dev/2.9/v/string_type\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.type\n  Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.9/v/missing\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.data\n  Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.9/v/missing",
    "status": 500
}
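To see why the response reports exactly three errors, here is a simplified, stdlib-only sketch of the check implied by the message. It is not Dify's actual model code: content_errors and REQUIRED_PART_FIELDS are hypothetical names, and the rule (content must be a string, or a list of parts that each carry type and data) is inferred from the error text alone. Given the list-of-{'text': ...} shape from this issue, it reports one error for the string branch and two missing fields for the list branch.

```python
from typing import Any

# Hypothetical, simplified stand-in for the check on AssistantPromptMessage.content,
# inferred from the error text: content must be a str, OR a list whose items each
# carry the "type" and "data" fields of a PromptMessageContent.
REQUIRED_PART_FIELDS = ("type", "data")


def content_errors(content: Any) -> list[str]:
    """Return validation errors for a message content value (empty list = valid)."""
    if isinstance(content, str):
        return []  # the string branch accepts any str

    # Otherwise, collect errors for the list-of-parts branch.
    list_errors: list[str] = []
    if not isinstance(content, list):
        list_errors.append("content: input should be a valid list")
    else:
        for index, part in enumerate(content):
            if not isinstance(part, dict):
                list_errors.append(f"content.{index}: input should be a valid dict")
                continue
            for field in REQUIRED_PART_FIELDS:
                if field not in part:
                    list_errors.append(f"content.{index}.{field}: field required")

    if not list_errors:
        return []  # the list branch passed, so the value is valid
    # Both branches failed: report the string-branch error plus the list-branch ones.
    return ["content: input should be a valid string", *list_errors]


if __name__ == "__main__":
    # Shape seen in the issue: parts that only have a "text" key.
    for line in content_errors([{"text": "Q1: ..."}]):
        print(line)
```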

✔️ Expected Behavior

I want to use QA parsing to split my document into question-answer pairs, but it does not work.

❌ Actual Behavior

No response


dosubot · 2024-12-18T10:18:06Z

I found a similar unsolved discussion that might be relevant to your issue: "Error: Response output is missing or does not contain embeddings" ^[1]. It may provide some insights or potential solutions related to validation errors and embedding issues.

Regarding your specific issue, the error message reports three validation errors on the AssistantPromptMessage model. Its content field must be either a string or a list of PromptMessageContent objects, such as TextPromptMessageContent, ImagePromptMessageContent, or AudioPromptMessageContent ^[2]. Here it received a list of dictionaries that contain only a text key, so validation fails on both branches: the value is not a string, and each list item is missing the required type and data fields. Ensure that the content field is passed either as a plain string or as a list of properly constructed PromptMessageContent objects.

Additionally, check that all required fields are provided and correctly formatted in your input data to avoid missing field errors.
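Assuming the tongyi provider is passing the DashScope reply through as a list of {'text': ...} parts (which is what the input_value in the error shows), one possible workaround is to flatten that list into a plain string before building the AssistantPromptMessage. This is a hypothetical sketch: flatten_text_content is not part of Dify or the DashScope SDK, and joining parts with an empty separator is an assumption.

```python
from typing import Any


def flatten_text_content(content: Any) -> str:
    """Collapse message content into one string (hypothetical helper).

    - str is returned unchanged
    - None becomes ""
    - a list may contain str parts or {"text": str} dicts; their text is joined
      with no separator (an assumption about how parts should be combined)
    - anything else raises TypeError
    """
    if isinstance(content, str):
        return content
    if content is None:
        return ""
    if not isinstance(content, list):
        raise TypeError(f"unsupported content type: {type(content).__name__}")

    pieces: list[str] = []
    for part in content:
        if isinstance(part, str):
            pieces.append(part)
        elif isinstance(part, dict) and isinstance(part.get("text"), str):
            pieces.append(part["text"])
        else:
            raise TypeError(f"unsupported content part: {part!r}")
    return "".join(pieces)
```

For example, flatten_text_content([{'text': 'Q1: ...'}]) returns 'Q1: ...', a plain string that satisfies the string branch of the content field.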


dosubot added the 🐞 bug label on Dec 18, 2024

crazywoola added the ⚙️ feat:model-runtime label on Dec 18, 2024

dosubot added the stale label on Jan 20, 2025

dosubot closed this as not planned on Feb 4, 2025

dosubot removed the stale label on Feb 4, 2025
