Skip to content

[BUG] LLM Fails to Process Binary Image Data from Tool Output #800

@bitkira

Description

@bitkira

Checks

  • I have updated to the lastest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.7.0

Python Version

3.12.9

Operating System

macOS 15.4

Installation Method

pip

Steps to Reproduce

1.Define a tool that reads an image file and returns its content in a dictionary, including the binary data and metadata.

`from strands import tool
import os
from PIL import Image
import base64

Assuming the framework can handle complex dictionary return types

@tool
def Image_Reader(image_path: str) -> dict:
"""
Reads an image file from a given path and returns it in a structured format.

Args:
    image_path (str): The path to the image file. Can be absolute or user-relative (with ~/).
"""
try:
    expanded_path = os.path.expanduser(image_path)

    if not os.path.exists(expanded_path):
        raise FileNotFoundError(f"File not found at path: {expanded_path}")

    with open(expanded_path, "rb") as file:
        file_bytes = file.read()

    with Image.open(expanded_path) as img:
        image_format = img.format.lower()
        if image_format not in ["png", "jpeg", "jpg", "gif", "webp"]:
            image_format = "png" # Default format

    return {
        "status": "success",
        "content": [
            {
                "image": {
                    "format": image_format,
                    "source": {"bytes": file_bytes}
                }
            }
        ]
    }

except Exception as e:
    raise e`

2.Have the LLM call this Image_Reader tool with a valid path to an image.
3.The tool executes successfully and returns the dictionary containing the image bytes.
4.The LLM receives this tool output but fails to "see" or process the image content within it.

Expected Behavior

The LLM should be able to read the image data from the tool's return value, understand its content, and answer subsequent questions about the image.

Actual Behavior

The LLM is unable to interpret the image provided in the tool's output. It responds as if no image data was supplied.

Image

Additional Context

I have verified that if the same image's binary data is formatted and placed directly within a message, the LLM can read and understand the image perfectly.

Image

Possible Solution

Perhaps returning raw binary data within a dictionary is not the intended way for an LLM to autonomously process images via tools. Is there a different or recommended method for a tool to return an image that the LLM can natively read and analyze? Any guidance on best practices for this would be greatly appreciated.

Related Issues

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions