47 changes: 30 additions & 17 deletions AGENTS.md
@@ -7,18 +7,20 @@ Prioritize critical thinking, thorough verification, and evidence-driven changes
You are a guardian of this codebase. Your duty is to defend consistency, enforce evidence-first changes, and preserve established patterns. Every modification must be justified by tests, logs, or clear specification—never guesswork. Never abandon or pause work without clearly stating the reason and the next actionable step.

Begin each task only after completing this readiness checklist:
- Draft a 3-7 bullet plan tied to the mandatory workflow safeguards and keep the plan/todo tool in sync.
- When the work needs more than a single straightforward action, draft a 3-7 bullet plan tied to the mandatory workflow safeguards and keep the plan/todo tool in sync; skip the plan step for one-off commands.
- Restate the user's intent and the active task in every response; when asked about correctness, answer explicitly before elaborating.
- Prime yourself with all available context—read, trace, and analyze until additional context produces diminishing returns.
- Prime yourself with all available context—read, trace, and analyze until additional context produces diminishing returns, and do not proceed unless you can explain every change in your own words.
- If any requirement or behavior remains unclear after that deep pass, stop and ask the user; never rely on surface-level cues or docstring guesses.
- Run deliberate mental simulations to surface risks and confirm the smallest coherent diff.
- Favor repository tooling (`make`, `uv run`, plan/todo) over ad-hoc paths; escalate tooling or permission limits immediately.
- Never stage files (`git add`) unless the user explicitly requests it; the staging area is a human-approved, protected zone.
- Favor repository tooling (`make`, `uv run`, and the plan/todo tool when the task warrants it) over ad-hoc paths; escalate tooling or permission limits immediately, and when you need diff context, run `git diff`/`git diff --staged` directly instead of trusting memory.
- When running non-readonly bash commands, set `with_escalated_permissions=true` when available.
- Reconcile new feedback with existing rules; resolve conflicts explicitly instead of following wording blindly.
- Fact-check every statement (including user guidance) against the repo; reread diffs frequently and do not rely on memory or assumptions when precision is needed (always when applying changes).
- Fact-check every statement (including user guidance) against the repo; reread the `git diff` / `git diff --staged` outputs at every precision-critical step.

## 🔴 TESTS DEFINE TRUTH
## 🔴 TESTS & DOCS DEFINE TRUTH

Default to test-driven development. Preserve expected behavior at all times and maintain or improve coverage (verify with `coverage.xml`). Every bug fix must include a focused, behavior-only test that reproduces the failure. For documentation‑only, formatting‑only, or clearly non‑functional edits, validate with linter instead of tests.
Default to test-driven development. Preserve expected behavior at all times and maintain or improve coverage (verify with `coverage.xml`). Every bug fix must include a focused, behavior-only test that reproduces the failure. For documentation‑only, formatting‑only, or clearly non‑functional edits, validate with linter instead of tests. Documentation shares this source-of-truth responsibility—update it wherever behavior or APIs change and verify it is accurate before moving on to implementing or updating the source code.

## 🛡️ GUARDIANSHIP OF THE CODEBASE (HIGHEST PRIORITY)

@@ -41,15 +43,16 @@ These requirements apply to every file in the repository. Bullets prefixed with
- No duplicate information or code: within reason, keep the content dry and prefer using references instead of duplicating any idea or functionality.
- Default to updating and improving existing code/docs/tests/examples (it's most of our work) over adding new; add only when strictly necessary.
- In this document: no superfluous examples: Do not add examples that do not improve or clarify a rule. Omit examples when rules are self‑explanatory.
- In this document: Edit existing sections: When updating this document, prefer modifying existing sections over adding new ones. Add new sections only when strictly necessary to remove ambiguity.
- In this document: Edit existing sections after reading this file end-to-end so you catch and delete duplication; add new sections only when strictly necessary to remove ambiguity.
- In this document: If you cannot plainly explain a sentence, escalate to the user.
- Naming: Functions are verb phrases; values are noun phrases. Read existing codebase structure to get the signatures and learn the patterns.
- Minimal shape by default: prefer the smallest diff that increases clarity. Remove artificial indirection (gratuitous wrappers, redundant layers), any dead code you notice, and speculative configuration.
- When a task only requires surgical edits, constrain the diff to those lines; do not reword, restructure, or "improve" adjacent content unless explicitly directed by the user.
- Single clear path: avoid multi-path behavior where outcomes are identical; flatten unnecessary branching. Do not add optional fallbacks without explicit specification.

### Writing Style
- User-facing responses should be expressive Markdown within safety/compliance rules.
- Avoid unclear or unexplainable phrases. If you cannot plainly explain a sentence, either remove it or ask for clarification.
### Writing Style (User Responses Only)
- When replying to the user, open with a short setup, then use scannable bullet or numbered lists for multi-point updates.
- Ask concise clarifying questions as soon as any requirement is ambiguous so the user can correct course fast.

## 🔴 SAFETY PROTOCOLS

@@ -58,12 +61,15 @@ These requirements apply to every file in the repository. Bullets prefixed with
#### Step 0: Build Full Codebase Structure and Comprehensive Change Review
`make prime`

- This meta-command covers structure discovery and git status/diffs; avoid duplicating sub-command listings elsewhere to preserve context.
- Run this before reading or modifying files—no exceptions.
- Latest Diff First (non‑negotiable): Before starting any task, read the current staged and unstaged diffs and reconcile your plan to them. Do not proceed until you have incorporated the latest diff.
- Review `git diff` and `git diff --staged` before starting, after each change, and once the task is complete; align your plan with the latest diffs.
- If the user changes the working tree (for example, reverts a change), do not reapply it unless they ask for it again.
- Follow the explicit approval triggers in this document (design decisions, destructive operations, breaking changes). Do not invent extra approval gates that stall progress.
- Run `make prime` first, every time; it already covers structure discovery plus staged and unstaged diffs, so don't rewrite its sub-commands elsewhere.
- Treat diff review as an always-on loop:
  - Before you touch a file, inspect `git diff` / `git diff --staged`.
  - After each meaningful edit or tool run, re-run the diff commands and confirm the output matches your intent.
  - Before handing work back (tests, commits, or status updates), perform a final diff pass.
  - If the diff changes in a way you did not expect, stop and reconcile before proceeding.
- Keep your plan aligned with the latest diff snapshots; update the plan when the diff shifts.
- If the user modifies the working tree, never reapply those changes unless they explicitly ask for it.
- Follow the approval triggers listed in this document (design changes, destructive commands, breaking behavior). Do not add improvised gates that slow progress.

#### Step 1: Proactive Analysis
- Search for similar patterns; identify required related changes globally.
@@ -76,7 +82,7 @@ These requirements apply to every file in the repository. Bullets prefixed with
- Edit incrementally: make small, focused changes, validating each with tests before continuing.
- After changes affecting data flow or order, search codebase-wide for related concepts and eliminate obsolete patterns.
- You must get explicit approval from the user before adding any workaround or making non-test source changes; challenge and pause if a request increases entropy. Keep any diffs minimal (avoid excessive changes).
- Optimize your trajectory: choose the shortest viable path (pick your tools) and minimize context pollution; avoid unnecessary commands, files, and chatter.
- Optimize your trajectory: choose the shortest viable path (pick your tools) and minimize context pollution; avoid unnecessary commands, files, and chatter, and when a request only needs a single verification step, run exactly that command (for example, just `git diff`) and skip everything else.

#### Step 2: Comprehensive Validation
# Run only the relevant tests first (specific file/test)
@@ -115,6 +121,7 @@ After each tool call or code edit, validate the result in 1-2 lines and proceed

### Example Runs
- Run non-interactive examples from /examples directory. Never run examples/interactive/* as they require user input.
- MANDATORY: Run 100% of code you touch. If you modify an example, run it. If you modify a module, run its tests.

### Test Guidelines (Canonical)
- **Shared rules:**
@@ -155,6 +162,12 @@ Agency Swarm is a multi-agent orchestration framework built on the OpenAI Agents
### Documentation Rules
- All documentation writing and updates MUST follow `docs/mintlify.cursorrules` for formatting, components, links, and page metadata.
- Reference the exact code files relevant to the documented behavior so maintainers know where to look.
- Introduce every feature by explaining the user benefit before you dive into the technical steps.
- Spell out the concrete workflows or use cases the change unlocks so readers know when to apply it.
- Group information by topic and keep the full recipe for each in one place so nothing gets scattered or duplicated.
- Pull important notes or rules into dedicated callouts (e.g. <Note>) so they don't get lost in a paragraph.
- Avoid filler or repetition so every sentence advances understanding.
- Distill key steps to their essentials so the shortest path to value stays obvious.
- Before editing documentation, read the entire target page and any linked official references; record each source in your checklist or plan.
- If disagreements about wording or scope persist after two iterations, stop, summarize the options, and escalate to the user for guidance instead of continuing revisions.

206 changes: 206 additions & 0 deletions docs/core-framework/tools/custom-tools/multimodal-outputs.mdx
@@ -0,0 +1,206 @@
---
title: "Multimodal Tool Outputs"
description: "Return images and files from your tools."
icon: "image"
---

## What This Feature Unlocks

Returning images and files from tools lets agents run real feedback loops on modalities beyond text.

For example, instead of dumping all the data into an agent and hoping for the best, you can generate a visualization or analyze PDF reports and let the agent provide insights based on that output, just like a real data analyst.

This conserves your context window and unlocks autonomous agentic workflows for many new use cases.

## New Use Cases

<CardGroup cols={2}>
<Card title="Software Development" icon="code">
Agents can check websites autonomously and iterate until all elements are properly positioned, enabling them to tackle complex projects without manual screenshot feedback.
</Card>
<Card title="Brand Asset Generation" icon="paintbrush">
Provide brand guidelines, logos, and messaging, then let agents iterate on image and video generation (including Sora 2) until outputs fully match your expectations.
</Card>
<Card title="Screen-Aware Assistance" icon="eye">
Build agents that help visually impaired individuals navigate websites, or create customer support agents that see the user's current webpage for better assistance.
</Card>
<Card title="Data Analytics" icon="chart-area">
Generate visual graphs and analyze PDF reports, then let agents provide insights based on these outputs without overloading the context window.
</Card>
</CardGroup>

## Output Formats

### Images (PNG, JPG)

To return an image from a tool, you can either:

1. Use the `ToolOutputImage` class.
2. Return a dict with the `type` set to `"image"` and either `image_url` (URL or data URL) or `file_id`.
3. Use our convenience `tool_output_image_from_path` function.

```python
from agency_swarm import BaseTool, ToolOutputImage, ToolOutputImageDict
from agency_swarm.tools.utils import tool_output_image_from_path
from pydantic import Field


class FetchGalleryImage(BaseTool):
    """Return a static gallery image."""

    detail: str = Field(default="auto", description="Level of detail")

    def run(self) -> ToolOutputImage:
        # Option 1: build the output with the ToolOutputImage class.
        return ToolOutputImage(
            image_url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg",
            detail=self.detail,
        )


class FetchGalleryImageDict(BaseTool):
    """Dict variant of the same image output."""

    detail: str = Field(default="auto", description="Level of detail")

    def run(self) -> ToolOutputImageDict:
        # Option 2: return a plain dict with `type` set to "image".
        return {
            "type": "image",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg",
            "detail": self.detail,
        }


class FetchLocalImage(BaseTool):
    """Load an image from disk using the helper."""

    path: str = Field(default="examples/data/landscape_scene.png", description="Image to publish")

    def run(self) -> ToolOutputImage:
        # Option 3: let the convenience helper read the file from disk.
        return tool_output_image_from_path(self.path, detail="auto")
```
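The dict form above accepts a data URL in `image_url`. As a minimal sketch of how such a data URL can be assembled from raw bytes, the snippet below uses only the standard library. The helper name `image_bytes_to_output_dict` is hypothetical and not part of Agency Swarm; it only mirrors the dict shape shown in the example above.

```python
import base64


def image_bytes_to_output_dict(
    data: bytes, mime_type: str = "image/png", detail: str = "auto"
) -> dict:
    """Hypothetical helper: wrap raw image bytes in the image dict shape."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image",
        # Data URL layout: data:<mime type>;base64,<payload>
        "image_url": f"data:{mime_type};base64,{encoded}",
        "detail": detail,
    }


# The 8-byte PNG signature stands in for real image bytes here.
png_signature = b"\x89PNG\r\n\x1a\n"
output = image_bytes_to_output_dict(png_signature)
print(output["image_url"])
```

A tool's `run` method could return a dict built this way when the image exists only in memory, for example a freshly rendered chart.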

### Files (PDF)

Similarly, to return a file from a tool:

```python
from agency_swarm import BaseTool, ToolOutputFileContent
from agency_swarm.tools.utils import tool_output_file_from_path, tool_output_file_from_url
from pydantic import Field


class FetchReferenceReport(BaseTool):
    """Return a reference PDF hosted remotely."""

    source_url: str = Field(
        default="https://raw.githubusercontent.com/VRSEN/agency-swarm/main/examples/data/sample_report.pdf",
        description="Remote file to share",
    )

    def run(self) -> ToolOutputFileContent:
        # Build the output directly from a URL.
        return ToolOutputFileContent(file_url=self.source_url)


class FetchLocalReport(BaseTool):
    """Return a report stored on disk."""

    path: str = Field(default="examples/data/sample_report.pdf", description="Local file path")

    def run(self) -> ToolOutputFileContent:
        # Let the helper read the local file.
        return tool_output_file_from_path(self.path)


class FetchRemoteReport(BaseTool):
    """Return a remote file using the helper."""

    archive_url: str = Field(default="https://example.com/document.pdf", description="File to expose")

    def run(self) -> ToolOutputFileContent:
        # Let the helper wrap the remote URL.
        return tool_output_file_from_url(self.archive_url)
```

<Note>
When you choose `file_data`, include `filename` to hint a download name; URL-based outputs rely on the remote server metadata instead.
</Note>

<Warning>
`tool_output_file_from_path` only supports PDF files.
</Warning>
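Because the helper is PDF-only, it may be worth validating the path before calling it. The following is a minimal sketch using only the standard library. The guard `require_pdf` is a hypothetical name, not an Agency Swarm function, and how `tool_output_file_from_path` itself reacts to a non-PDF path is not covered here.

```python
from pathlib import Path


def require_pdf(path: str) -> Path:
    """Hypothetical guard: fail early unless `path` is an existing .pdf file."""
    candidate = Path(path)
    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {candidate.name}")
    if not candidate.is_file():
        raise FileNotFoundError(f"No such file: {candidate}")
    return candidate


# Inside a tool's run method, the guard would sit in front of the helper:
#     return tool_output_file_from_path(str(require_pdf(self.path)))
```

Raising a descriptive error from the tool gives the agent something concrete to react to, instead of a failure deeper in the call stack.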


### Combining Multiple Outputs

Return multiple outputs by returning a list from `run`.

```python
from agency_swarm import BaseTool, ToolOutputFileContent, ToolOutputImage, ToolOutputText


class PrepareShowcase(BaseTool):
    """Return rich media and a short description."""

    teaser_a: str = "https://example.com/teaser-a.png"
    teaser_b: str = "https://example.com/teaser-b.png"
    report_id: str = "file-report-123"

    def run(self) -> list:
        # Each list item is delivered to the agent as a separate output.
        return [
            ToolOutputImage(image_url=self.teaser_a),
            ToolOutputImage(image_url=self.teaser_b),
            ToolOutputText(text="Gallery updated: Teaser A and Teaser B now live."),
            ToolOutputFileContent(file_id=self.report_id),
        ]
```
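When a tool returns several images, a text item that names them in order helps the agent refer to each one. The sketch below builds such a list from plain dicts. The image dict follows the shape documented in the Images section, while the `{"type": "text", "text": ...}` shape is an assumption modeled on `ToolOutputText` and should be checked against the library. The helper `build_named_gallery` is hypothetical.

```python
def build_named_gallery(images: dict[str, str]) -> list[dict]:
    """Hypothetical helper: one image dict per entry plus a text item naming them."""
    outputs: list[dict] = [
        {"type": "image", "image_url": url, "detail": "auto"}
        for url in images.values()
    ]
    # Assumed text dict shape; list the names in the same order as the images.
    names = ", ".join(images)
    outputs.append({"type": "text", "text": f"Images in order: {names}"})
    return outputs


gallery = build_named_gallery(
    {
        "teaser-a": "https://example.com/teaser-a.png",
        "teaser-b": "https://example.com/teaser-b.png",
    }
)
for item in gallery:
    print(item["type"])
```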

## Complete Example (Chart generation tool)

Here's a complete example using `BaseTool`:

```python
import base64
import io

import matplotlib.pyplot as plt
from agency_swarm import Agent, BaseTool, ToolOutputImage
from pydantic import Field


class GenerateChartTool(BaseTool):
    """Generate a bar chart from data."""

    data: list[float] = Field(..., description="Data points for the chart")
    labels: list[str] = Field(..., description="Labels for each data point")

    def run(self) -> ToolOutputImage:
        """Generate and return the chart as a base64-encoded image."""
        # Create the chart
        fig, ax = plt.subplots()
        ax.bar(self.labels, self.data)

        # Render the figure to an in-memory PNG and convert it to base64
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
        buf.seek(0)
        image_base64 = base64.b64encode(buf.read()).decode("utf-8")
        plt.close(fig)  # Release the figure so repeated calls do not leak memory

        # Return in multimodal format as a data URL
        return ToolOutputImage(image_url=f"data:image/png;base64,{image_base64}")


# Create an agent with the tool
agent = Agent(
    name="DataViz",
    instructions="You generate charts and visualizations for data analysis.",
    tools=[GenerateChartTool],
)
```

<Note>
`function_tool` decorators and `BaseTool` classes both support multimodal outputs in the exact same way.
</Note>

```python
from agency_swarm import ToolOutputImage, function_tool


@function_tool
def fetch_gallery_image() -> ToolOutputImage:
    return ToolOutputImage(
        image_url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg",
        detail="auto",
    )
```

## Tips & Best Practices

- Base64-encoded images can be large. Use file references for large content.
- Compress screenshots and other visuals before returning them to cut token usage without sacrificing clarity.
- Include the image names in your textual response whenever you return more than one image so the agent can reference them unambiguously.
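
To make the first tip concrete: standard padded base64 produces 4 output characters for every 3 input bytes, so an inline image grows by roughly a third. The sketch below estimates the encoded size before deciding whether to inline an image or use a file reference. The function names and the `max_chars` budget are hypothetical values chosen for illustration, not library settings.

```python
import base64
import math


def base64_length(raw_size: int) -> int:
    """Character count of padded base64 for `raw_size` input bytes."""
    return 4 * math.ceil(raw_size / 3)


def should_use_file_reference(raw_size: int, max_chars: int = 200_000) -> bool:
    """Hypothetical policy: prefer a file reference once the inline payload exceeds a budget."""
    return base64_length(raw_size) > max_chars


raw = bytes(300_000)  # stand-in for a 300 kB screenshot
print(base64_length(len(raw)), len(base64.b64encode(raw)))
print(should_use_file_reference(len(raw)))
```

For the 300,000-byte payload above, the estimate and the real encoded length are both 400,000 characters, which is over the illustrative budget.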

## Real Examples

- [TBD: Include repo from YouTube video]
- [`examples/multimodal_outputs.py`](https://github.com/VRSEN/agency-swarm/blob/main/examples/multimodal_outputs.py)
1 change: 1 addition & 0 deletions docs/docs.json
@@ -58,6 +58,7 @@
"pages": [
"core-framework/tools/custom-tools/step-by-step-guide",
"core-framework/tools/custom-tools/pydantic-is-all-you-need",
"core-framework/tools/custom-tools/multimodal-outputs",
"core-framework/tools/custom-tools/best-practices",
"core-framework/tools/custom-tools/configuration"
]
Binary file added examples/data/daily_revenue.png
Binary file added examples/data/daily_revenue_report.pdf
Binary file not shown.