-
Notifications
You must be signed in to change notification settings - Fork 449
Description
Checks
- I have updated to the lastest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
1.7.0
Python Version
3.12.9
Operating System
macOS 15.4
Installation Method
pip
Steps to Reproduce
1.Define a tool that reads an image file and returns its content in a dictionary, including the binary data and metadata.
`from strands import tool
import os
from PIL import Image
import base64
Assuming the framework can handle complex dictionary return types
@tool
def Image_Reader(image_path: str) -> dict:
"""
Reads an image file from a given path and returns it in a structured format.
Args:
image_path (str): The path to the image file. Can be absolute or user-relative (with ~/).
"""
try:
expanded_path = os.path.expanduser(image_path)
if not os.path.exists(expanded_path):
raise FileNotFoundError(f"File not found at path: {expanded_path}")
with open(expanded_path, "rb") as file:
file_bytes = file.read()
with Image.open(expanded_path) as img:
image_format = img.format.lower()
if image_format not in ["png", "jpeg", "jpg", "gif", "webp"]:
image_format = "png" # Default format
return {
"status": "success",
"content": [
{
"image": {
"format": image_format,
"source": {"bytes": file_bytes}
}
}
]
}
except Exception as e:
raise e`
2.Have the LLM call this Image_Reader tool with a valid path to an image.
3.The tool executes successfully and returns the dictionary containing the image bytes.
4.The LLM receives this tool output but fails to "see" or process the image content within it.
Expected Behavior
The LLM should be able to read the image data from the tool's return value, understand its content, and answer subsequent questions about the image.
Actual Behavior
The LLM is unable to interpret the image provided in the tool's output. It responds as if no image data was supplied.
Additional Context
I have verified that if the same image's binary data is formatted and placed directly within a message, the LLM can read and understand the image perfectly.
Possible Solution
Perhaps returning raw binary data within a dictionary is not the intended way for an LLM to autonomously process images via tools. Is there a different or recommended method for a tool to return an image that the LLM can natively read and analyze? Any guidance on best practices for this would be greatly appreciated.
Related Issues
No response