
In the output, the order of 'ta' is sometimes reversed as 'at'. #8015

Closed
5 tasks done
wsbao opened this issue Sep 5, 2024 · 4 comments · Fixed by #8791
Labels
🐞 bug Something isn't working

Comments

@wsbao
Contributor

wsbao commented Sep 5, 2024

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, or they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.7.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  • Use an open-source model such as Qwen2-70B-Instruct
  • Run the agent framework in ReAct mode
    Then, in the output, the order of 'ta' is sometimes reversed as 'at'.
    [screenshots: reversed letter order in the streamed output]

✔️ Expected Behavior

Correct order of the output alphabets, especially for 'a' and 't'.

❌ Actual Behavior

The order of 'ta' is sometimes reversed as 'at'.
[screenshots: reversed letter order in the output]


dosubot bot commented Sep 5, 2024

Hello @wsbao! I'm here to help you with any bugs, questions, or contributions you have for the Dify project. Let's work together to solve this issue!

There are no specific tests or validation checks documented in the repository for ensuring the correct order of characters in generated text [1]. Additionally, there are no known issues or bugs related to character ordering in the text generation module documented in the repository [1].

The issue might be related to the model or the way the agent framework in ReAct mode processes and generates responses. The CotAgentRunner class in the api/core/agent/cot_agent_runner.py file handles the response generation in ReAct mode [1]. You might want to investigate this part of the code to understand if there are any potential issues with character ordering during the response generation process.

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added the 🐞 bug Something isn't working label Sep 5, 2024
@wsbao
Contributor Author

wsbao commented Sep 17, 2024

@dosu, I created a YAML DSL for this App to reproduce this bug. I used LM Studio to host Qwen2-7B-Instruct-q2_k.gguf locally with CPU only.

app:
  description: ''
  icon: speaking_head_in_silhouette
  icon_background: '#E0EAFF'
  mode: agent-chat
  name: Gossip
  use_icon_as_answer_icon: false
kind: app
model_config:
  agent_mode:
    enabled: true
    max_iteration: 5
    prompt: null
    strategy: react
    tools: []
  annotation_reply:
    enabled: false
  chat_prompt_config: {}
  completion_prompt_config: {}
  dataset_configs:
    datasets:
      datasets: []
    retrieval_model: multiple
  dataset_query_variable: ''
  external_data_tools: []
  file_upload:
    image:
      detail: high
      enabled: false
      number_limits: 3
      transfer_methods:
      - remote_url
      - local_file
  model:
    completion_params:
      max_tokens: 4096
      stop: []
      temperature: 0
    mode: chat
    name: Qwen2-7B-Instruct-q2_k.gguf
    provider: openai_api_compatible
  more_like_this:
    enabled: false
  opening_statement: ''
  pre_prompt: ''
  prompt_type: simple
  retriever_resource:
    enabled: true
  sensitive_word_avoidance:
    configs: []
    enabled: false
    type: ''
  speech_to_text:
    enabled: false
  suggested_questions: []
  suggested_questions_after_answer:
    enabled: false
  text_to_speech:
    enabled: false
    language: ''
    voice: ''
  user_input_form: []
version: 0.1.2

When we ask "What are ISO standard and stable diffusion?", the "rather" in the front-end output is garbled into "athn".

[screenshot: front-end output showing "athn"]

However, if we use the same JSON payload

{
    "model": "Qwen2-7B-Instruct-q2_k.gguf",
    // "stream": true,
    "temperature": 0,
    "max_tokens": 4096,
    "messages": [
        {
            "role": "system",
            "content": "Respond to the human as helpfully and accurately as possible. \n\n\n\nYou have access to the following tools:\n\n[]\n\nUse a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).\nValid \"action\" values: \"Final Answer\" or \n\nProvide only ONE action per $JSON_BLOB, as shown:\n\n```\n{\n  \"action\": $TOOL_NAME,\n  \"action_input\": $ACTION_INPUT\n}\n```\n\nFollow this format:\n\nQuestion: input question to answer\nThought: consider previous and subsequent steps\nAction:\n```\n$JSON_BLOB\n```\nObservation: action result\n... (repeat Thought/Action/Observation N times)\nThought: I know what to respond\nAction:\n```\n{\n  \"action\": \"Final Answer\",\n  \"action_input\": \"Final response to human\"\n}\n```\n\nBegin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.\n"
        },
        {
            "role": "user",
            "content": "What are ISO standard and stable diffusion?"
        }
    ],
    "stop": [
        "Observation"
    ]
}

to query the model API directly, the result is correct:

{
    "id": "chatcmpl-***h",
    "object": "chat.completion",
    "created": 17***01,
    "model": "Qwen2-7B-Instruct/Repository/qwen2-7b-instruct-q2_k.gguf",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Thought: The user wants information about ISO standards and stable diffusion processes separately rather than asking for their relationship or comparison.\n\nAction:\n```\n{\n  \"action\": \"Final Answer\",\n  \"action_input\": {\n    \"ISO Standards\": \"ISO standards refer to International Organization for Standardization guidelines that provide specifications for products, services, systems, processes, etc., ensuring quality assurance.\",\n    \"Stable Diffusion Processes\": \"Stable diffusion refers to the process where substances move from areas of higher concentration to lower concentration until equilibrium is reached.\"\n}\n}\n```\n\n"
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 242,
        "completion_tokens": 112,
        "total_tokens": 354
    },
    "system_fingerprint": "Qwen2-7B-Instruct/***/qwen2-7b-instruct-q2_k.gguf"
}
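
For reference, the payload above can be replayed against the local OpenAI-compatible server from Python. This is only a sketch: the endpoint URL is LM Studio's default, and payload.json is assumed to hold the JSON blob above (with the commented-out "stream" line removed, since comments are not valid JSON):

import json
import urllib.request

# payload.json holds the request body shown above (comment line removed).
with open("payload.json", "rb") as f:
    body = f.read()

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",  # LM Studio's default endpoint (assumed)
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["choices"][0]["message"]["content"])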

@wsbao
Contributor Author

wsbao commented Sep 25, 2024

@Yeuoly @JohnJyong @GarfieldDai

I think I figured out the root cause of this bug. It arises from api/core/agent/output_parser/cot_output_parser.py, which is called by api/core/agent/cot_agent_runner.py at line 123. Basically, api/core/agent/output_parser/cot_output_parser.py transforms the LLM output by:

  • removing "action:" and "thought:" markers;
  • parsing the JSON enclosed in triple backticks.

In the implementation, the variables action_cache and thought_cache store partially matched prefixes of "action:" and "thought:" from the stream. Once a complete "action:" or "thought:" is matched, it is erased from the stream; otherwise, the content in action_cache and thought_cache must be put back into the stream. This bug is caused by the wrong order in which the current letter and the historical cache (i.e., action_cache and thought_cache) are put back into the stream. Here is a concrete example: suppose "than" appears in the output stream from the LLM. The parser stores "th" in thought_cache, and the following "a", although a prefix of "action", cannot start a standalone "action:" because its predecessor is the letter "h". By the execution order of the code, "a" is yielded to the stream before thought_cache is released back, which results in the erroneous display of "athn" instead of "than" at the front end (see the screenshot in the comment above).
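
A hypothetical, stripped-down model of this faulty ordering (it tracks only the "thought:" prefix cache, and the helper name buggy_stream is made up for illustration) reproduces the reordering:

def buggy_stream(text):
    # Cache characters that extend the "thought:" prefix.
    thought_str, cache, idx = "thought:", "", 0
    for ch in text:
        if ch.lower() == thought_str[idx]:
            cache += ch
            idx += 1
            if idx == len(thought_str):  # full "thought:" matched; drop it
                cache, idx = "", 0
            continue
        # BUG: the current character is yielded before the cached prefix
        # is released back into the stream.
        yield ch
        if cache:
            yield cache
            cache, idx = "", 0

print("".join(buggy_stream("than")))  # prints "athn" instead of "than"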

To rectify this undesired behavior, all historically cached content should be released back into the stream before the current letter stored in delta is yielded. However, after many tests, I noticed another flaw in the code that can lead to the same erroneous output: last_character does not always track the predecessor of the current letter correctly. Consider another concrete example: suppose "th" and "an" (the splits of "than") are streamed sequentially from the LLM. "th" is again stored in thought_cache, but this time "a", being the first character of a new chunk, sees an empty last_character (NOT "h") and is stored in action_cache. When "n" is then processed, it makes action_cache and thought_cache release their stored content back into the stream. Since the code always yields action_cache before thought_cache, "athn" is again erroneously displayed.
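
Here is a similar hypothetical model of this second flaw (buggy_chunked_stream is a made-up name): the predecessor is recomputed from the current chunk only, so the first character of every chunk sees an empty last character, and the caches are flushed action-first:

def buggy_chunked_stream(chunks):
    action_str, thought_str = "action:", "thought:"
    a_cache, a_idx = "", 0
    t_cache, t_idx = "", 0
    for chunk in chunks:
        for i, ch in enumerate(chunk):
            # BUG: the predecessor is taken from the current chunk only,
            # so it is "" for the first character of every chunk.
            last = chunk[i - 1] if i > 0 else ""
            if ch.lower() == action_str[a_idx] and (a_idx > 0 or last in {"\n", " ", ""}):
                a_cache += ch
                a_idx += 1
                if a_idx == len(action_str):  # full "action:" matched; drop it
                    a_cache, a_idx = "", 0
                continue
            if ch.lower() == thought_str[t_idx] and (t_idx > 0 or last in {"\n", " ", ""}):
                t_cache += ch
                t_idx += 1
                if t_idx == len(thought_str):  # full "thought:" matched; drop it
                    t_cache, t_idx = "", 0
                continue
            # Caches are flushed action-first, then thought, then the character.
            if a_cache:
                yield a_cache
                a_cache, a_idx = "", 0
            if t_cache:
                yield t_cache
                t_cache, t_idx = "", 0
            yield ch

print("".join(buggy_chunked_stream(["th", "an"])))  # prints "athn"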

With these observations, I suggest two improvements for api/core/agent/output_parser/cot_output_parser.py:

  • Whenever the current letter in delta is about to be yielded, release all cached content beforehand;
  • Keep the correct predecessor of the current letter in last_character: update last_character with delta whenever a character is yielded or appended to a *_cache variable, so the value persists across streamed chunks.

Below is my proposed revision of api/core/agent/output_parser/cot_output_parser.py:

import json
import re
from collections.abc import Generator
from typing import Union

from core.agent.entities import AgentScratchpadUnit
from core.model_runtime.entities.llm_entities import LLMResultChunk


class CotAgentOutputParser:
    @classmethod
    def handle_react_stream_output(
        cls, llm_response: Generator[LLMResultChunk, None, None], usage_dict: dict
    ) -> Generator[Union[str, AgentScratchpadUnit.Action], None, None]:
        def parse_action(json_str):
            try:
                action = json.loads(json_str)
                action_name = None
                action_input = None

                # cohere always returns a list
                if isinstance(action, list) and len(action) == 1:
                    action = action[0]

                for key, value in action.items():
                    if "input" in key.lower():
                        action_input = value
                    else:
                        action_name = value

                if action_name is not None and action_input is not None:
                    return AgentScratchpadUnit.Action(
                        action_name=action_name,
                        action_input=action_input,
                    )
                else:
                    return json_str or ""
            except Exception:
                return json_str or ""

        def extra_json_from_code_block(code_block) -> Generator[Union[dict, str], None, None]:
            code_blocks = re.findall(r"```(.*?)```", code_block, re.DOTALL)
            if not code_blocks:
                return
            for block in code_blocks:
                json_text = re.sub(r"^[a-zA-Z]+\n", "", block.strip(), flags=re.MULTILINE)
                yield parse_action(json_text)

        code_block_cache = ""
        code_block_delimiter_count = 0
        in_code_block = False
        json_cache = ""
        json_quote_count = 0
        in_json = False
        got_json = False

        action_cache = ""
        action_str = "action:"
        action_idx = 0

        thought_cache = ""
        thought_str = "thought:"
        thought_idx = 0
        
        # Fix 2: last_character persists across streamed chunks instead of being
        # recomputed from the current chunk alone.
        last_character = ""

        for response in llm_response:
            if response.delta.usage:
                usage_dict["usage"] = response.delta.usage
            response = response.delta.message.content
            if not isinstance(response, str):
                continue

            # stream
            index = 0
            while index < len(response):
                steps = 1
                delta = response[index : index + steps]
                # last_character = response[index - 1] if index > 0 else ""
                yield_delta = False  # Fix 1: flush caches before yielding delta

                if delta == "`":
                    last_character = delta
                    code_block_cache += delta
                    code_block_delimiter_count += 1
                else:
                    if not in_code_block:
                        if code_block_delimiter_count > 0:
                            last_character = delta
                            yield code_block_cache
                        code_block_cache = ""
                    else:
                        last_character = delta
                        code_block_cache += delta
                    code_block_delimiter_count = 0

                if not in_code_block and not in_json:
                    if delta.lower() == action_str[action_idx] and action_idx == 0:
                        if last_character not in {"\n", " ", ""}:
                            yield_delta = True
                            # index += steps
                            # yield delta
                            # continue
                        else:
                            last_character = delta
                            action_cache += delta
                            action_idx += 1
                            if action_idx == len(action_str):
                                action_cache = ""
                                action_idx = 0
                            index += steps
                            continue
                    elif delta.lower() == action_str[action_idx] and action_idx > 0:
                        last_character = delta
                        action_cache += delta
                        action_idx += 1
                        if action_idx == len(action_str):
                            action_cache = ""
                            action_idx = 0
                        index += steps
                        continue
                    else:
                        if action_cache:
                            last_character = delta
                            yield action_cache
                            action_cache = ""
                            action_idx = 0

                    if delta.lower() == thought_str[thought_idx] and thought_idx == 0:
                        if last_character not in {"\n", " ", ""}:
                            yield_delta = True
                            # index += steps
                            # yield delta
                            # continue
                        else:
                            last_character = delta
                            thought_cache += delta
                            thought_idx += 1
                            if thought_idx == len(thought_str):
                                thought_cache = ""
                                thought_idx = 0
                            index += steps
                            continue
                    elif delta.lower() == thought_str[thought_idx] and thought_idx > 0:
                        last_character = delta
                        thought_cache += delta
                        thought_idx += 1
                        if thought_idx == len(thought_str):
                            thought_cache = ""
                            thought_idx = 0
                        index += steps
                        continue
                    else:
                        if thought_cache:
                            last_character = delta
                            yield thought_cache
                            thought_cache = ""
                            thought_idx = 0
                    
                    if yield_delta:
                        index += steps
                        last_character = delta
                        yield delta
                        continue

                if code_block_delimiter_count == 3:
                    if in_code_block:
                        last_character = delta
                        yield from extra_json_from_code_block(code_block_cache)
                        code_block_cache = ""

                    in_code_block = not in_code_block
                    code_block_delimiter_count = 0

                if not in_code_block:
                    # handle single json
                    if delta == "{":
                        json_quote_count += 1
                        in_json = True
                        last_character = delta
                        json_cache += delta
                    elif delta == "}":
                        last_character = delta
                        json_cache += delta
                        if json_quote_count > 0:
                            json_quote_count -= 1
                            if json_quote_count == 0:
                                in_json = False
                                got_json = True
                                index += steps
                                continue
                    else:
                        if in_json:
                            last_character = delta
                            json_cache += delta

                    if got_json:
                        got_json = False
                        last_character = delta
                        yield parse_action(json_cache)
                        json_cache = ""
                        json_quote_count = 0
                        in_json = False

                if not in_code_block and not in_json:
                    last_character = delta
                    yield delta.replace("`", "")

                index += steps

        if code_block_cache:
            yield code_block_cache

        if json_cache:
            yield parse_action(json_cache)
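
As a quick sanity check, the revised parser can be exercised with a minimal fake stream. This is only a sketch: it assumes it runs inside Dify's api package, and make_chunk is a made-up stub standing in for LLMResultChunk, of which the parser reads only .delta.usage and .delta.message.content:

from types import SimpleNamespace

from core.agent.output_parser.cot_output_parser import CotAgentOutputParser

def make_chunk(text):
    # Stand-in for LLMResultChunk: only .delta.usage and .delta.message.content are read.
    return SimpleNamespace(delta=SimpleNamespace(usage=None, message=SimpleNamespace(content=text)))

usage_dict = {}
chunks = (make_chunk(piece) for piece in ["th", "an"])
pieces = CotAgentOutputParser.handle_react_stream_output(chunks, usage_dict)
output = "".join(p for p in pieces if isinstance(p, str))
assert output == "than", output  # the buggy parser produced "athn"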


dosubot bot commented Sep 25, 2024

The issue you're experiencing with character ordering in the output seems to be specific to the ReAct mode in the Dify platform when using the Qwen2-7B-Instruct model. Here are a few steps you can take to investigate and potentially resolve this issue:

  1. Check the ReAct Mode Implementation: The ReAct mode processes and generates responses by leveraging the ReAct inference framework. This framework allows the agent to complete tasks through reasoning, step decomposition, tool invocation, and process iteration [1]. Ensure that the implementation of the ReAct mode in your YAML configuration is correct and aligns with the expected behavior.

  2. Review the CotAgentRunner Class: The CotAgentRunner class in the api/core/agent/cot_agent_runner.py file is responsible for running the Chain-of-Thought (CoT) agent application. It manages the agent's state, processes inputs, invokes tools, and handles the generation of responses from a language model [2]. Investigate this part of the code to see if there are any potential issues with character ordering during the response generation process.

  3. Model Configuration: Ensure that the model configuration in your YAML DSL is correctly set up. The max_tokens parameter is set to 4096, which is within the acceptable range for the Qwen2-7B-Instruct model. However, you might want to experiment with different values for temperature and other parameters to see if they affect the output.

  4. Compare API and Front-End Outputs: As you mentioned, the JSON payload query to the model API returns the correct result, while the front-end output has the character ordering issue. This discrepancy suggests that the issue might be related to how the front-end processes and displays the response. Investigate the front-end code to ensure that it correctly handles and displays the response from the model.

  5. Debugging and Logging: Add debugging and logging statements in the relevant parts of the code to trace the response generation process. This can help identify where the character ordering issue occurs.

By following these steps, you should be able to narrow down the cause of the character ordering issue and implement a fix. If the problem persists, consider reaching out to the Dify community for further assistance.

To continue talking to Dosu, mention @dosu.

wsbao added a commit to wsbao/dify that referenced this issue Sep 26, 2024
…anggenius#8015, removed unused code, and installed ruff and dotenv-linter to run dev/reformat again.
ZhouhaoJiang added a commit that referenced this issue Oct 17, 2024
* main: (35 commits)
  fix #9409 (#9433)
  update dataset clean rule (#9426)
  add clean 7 days datasets (#9424)
  fix: resolve overlap issue with API Extension selector and modal (#9407)
  refactor: update the default values of top-k parameter in vdb to be consistent (#9367)
  fix: incorrect webapp image displayed (#9401)
  Fix/economical knowledge retrieval (#9396)
  feat: add timezone conversion for time tool (#9393)
  fix: Deprecated gemma2-9b model in Fireworks AI Provider (#9373)
  feat: storybook (#9324)
  fix: use gpt-4o-mini for validating credentials (#9387)
  feat: Enable baiduvector intergration test (#9369)
  fix: remove the stream option of zhipu and gemini (#9319)
  fix: add missing vikingdb param in docker .env.example (#9334)
  feat: add minimax abab6.5t support (#9365)
  fix: (#9336 followup) skip poetry preperation in style workflow when no change in api folder (#9362)
  feat: add glm-4-flashx, deprecated chatglm_turbo (#9357)
  fix: Azure OpenAI o1 max_completion_token and get_num_token_from_messages error (#9326)
  fix: In the output, the order of 'ta' is sometimes reversed as 'at'. #8015 (#8791)
  refactor: Add an enumeration type and use the factory pattern to obtain the corresponding class (#9356)
  ...
ZhouhaoJiang added a commit that referenced this issue Oct 17, 2024
* feat/new-login: (30 commits)
  feat: add init login type
  fix: In the output, the order of 'ta' is sometimes reversed as 'at'. #8015 (#8791)
  ...