Contributor

@tonyxwz tonyxwz commented Jul 25, 2025

I find that a very common pattern in my day job is sharing a large output from one tool (an API response, a dataset, etc.) with another tool. The pattern can be implemented by:

  1. passing a reference to the data, rather than the data itself, into the agent's context window
  2. adding a prompt that tells the LLM about the output reference.

The documentation has no example illustrating this usage, so I added one. I hope it will be useful to other users.

The format of the variable reference in the example is inspired by IPython: Out[1].
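
The two steps above can be sketched in plain Python, without pydantic-ai. This is only an illustration under assumptions: `OutputStore`, `load_rows`, and `count_label` are hypothetical names, and the synthetic rows stand in for a real dataset. The tool hands back a short `Out[n]` reference for the model to read, while the large value stays in the store.

```python
from typing import Any


class OutputStore:
    """Hypothetical store that keeps large tool results out of the prompt."""

    def __init__(self) -> None:
        self.values: dict[str, Any] = {}

    def store(self, value: Any) -> str:
        # Number references IPython-style: Out[1], Out[2], ...
        ref = f'Out[{len(self.values) + 1}]'
        self.values[ref] = value
        return ref


def load_rows(store: OutputStore) -> str:
    """Tool-like function: produce a large result, return only a short message."""
    rows = [{'text': f'review {i}', 'label': i % 2} for i in range(10_000)]
    ref = store.store(rows)
    # Only this short string would reach the model's context window.
    return f'Loaded the dataset as `{ref}`.'


def count_label(store: OutputStore, ref: str, label: int) -> int:
    """Tool-like function: resolve the reference and work on the real data."""
    rows = store.values[ref]
    return sum(1 for row in rows if row['label'] == label)


store = OutputStore()
message = load_rows(store)
print(message)  # Loaded the dataset as `Out[1]`.
print(count_label(store, 'Out[1]', label=0))  # 5000
```

The second function never sees the rows through the prompt; it receives only the reference string and looks the data up itself.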

I am more familiar with using Anthropic models on AWS, so I used Claude 3.7 in my local testing, but the example should work with OpenAI models too.

Checks

  • run make
  • preview mkdocs locally

Running the Example

examples/pydantic_ai_examples/data_analyst.py:50 load_dataset
    path: 'cornell-movie-review-data/rotten_tomatoes' (str) len=41
    split: 'train' (str) len=5
examples/pydantic_ai_examples/data_analyst.py:64 load_dataset
    ref: 'Out[1]' (str) len=6
examples/pydantic_ai_examples/data_analyst.py:97 display
    name: 'Out[1]' (str) len=6
    dataset.head(): <DataFrame({
        'text': <Series({
            0: (
                'the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even'
                ' greater than arnold schwarzenegger , jean-claud van damme or steven segal .'
            ),
            1: (
                'the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column o'
                "f words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . "
                "tolkien's middle-earth ."
            ),
            2: 'effective but too-tepid biopic',
            3: 'if you sometimes like to go to the movies to have fun , wasabi is a good place to start .',
            4: (
                "emerges as something rare , an issue movie that's so honest and keenly observed that it doesn't feel "
                'like one .'
            ),
        })>,
        'label': <Series({
            0: 1,
            1: 1,
            2: 1,
            3: 1,
            4: 1,
        })>,
    })> (DataFrame) len=5
examples/pydantic_ai_examples/data_analyst.py:85 run_duckdb
    sql: 'SELECT label, COUNT(*) as count FROM dataset GROUP BY label ORDER BY label' (str) len=74
    dataset: 'Out[1]' (str) len=6
examples/pydantic_ai_examples/data_analyst.py:97 display
    name: 'Out[2]' (str) len=6
    dataset.head(): <DataFrame({
        'label': <Series({
            0: 0,
            1: 1,
        })>,
        'count': <Series({
            0: 4265,
            1: 4265,
        })>,
    })> (DataFrame) len=2
examples/pydantic_ai_examples/data_analyst.py:107 <module>
    result.output: (
        'Based on the results, there are **4,265 negative comments** in the Cornell Movie Review Dataset (Rotten Tomat'
        'oes). The dataset appears to be balanced, with an equal number of negative and positive reviews in the traini'
        'ng split.'
    ) (str) len=227
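
The log above shows the query tool receiving `Out[1]` and its result being displayed as `Out[2]`. A rough sketch of that chaining follows, with two loudly stated substitutions: the standard-library `sqlite3` module stands in for DuckDB, and a four-row toy table stands in for the dataset. `outputs` and `run_sql` are hypothetical names, not the example's actual code.

```python
import sqlite3

# Hypothetical store mapping references such as 'Out[1]' to rows of (text, label).
outputs: dict[str, list[tuple]] = {
    'Out[1]': [('great film', 1), ('dull plot', 0), ('loved it', 1), ('too long', 0)],
}


def run_sql(sql: str, dataset: str) -> str:
    """Run `sql` against the rows behind `dataset` and return a new reference."""
    conn = sqlite3.connect(':memory:')
    try:
        # Expose the referenced rows under the table name `dataset`,
        # matching the table name used in the logged query.
        conn.execute('CREATE TABLE dataset (text TEXT, label INTEGER)')
        conn.executemany('INSERT INTO dataset VALUES (?, ?)', outputs[dataset])
        result = conn.execute(sql).fetchall()
    finally:
        conn.close()
    # Store the query result as the next reference instead of returning the rows.
    ref = f'Out[{len(outputs) + 1}]'
    outputs[ref] = result
    return ref


ref = run_sql(
    'SELECT label, COUNT(*) as count FROM dataset GROUP BY label ORDER BY label',
    dataset='Out[1]',
)
print(ref, outputs[ref])  # Out[2] [(0, 2), (1, 2)]
```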

Contributor

hyperlint-ai bot commented Jul 25, 2025

PR Change Summary

Added an example demonstrating how to use RunContext for passing data among tools, particularly in data analytics workflows.

  • Introduced a new example in the documentation for using RunContext to reference tool outputs.
  • Updated the tools documentation for clarity on tool registration.
  • Provided a detailed example of analyzing a dataset using RunContext.

Modified Files

  • docs/tools.md

Added Files

  • docs/examples/data-analyst.md


@tonyxwz tonyxwz force-pushed the examples-data-analyst branch 4 times, most recently from 2533e75 to 004db97 Compare July 25, 2025 21:18
@DouweM DouweM self-assigned this Jul 29, 2025
Collaborator

@DouweM DouweM left a comment

@tonyxwz Thanks Tony! Just a few suggestions

@tonyxwz tonyxwz marked this pull request as draft July 29, 2025 17:52
@tonyxwz tonyxwz force-pushed the examples-data-analyst branch from f1ddfda to c65c67a Compare July 29, 2025 18:26
@tonyxwz tonyxwz marked this pull request as ready for review July 29, 2025 18:36
Contributor Author

@tonyxwz tonyxwz left a comment

Pushed a new revision.

@tonyxwz tonyxwz requested a review from DouweM July 29, 2025 18:44

Demonstrates:

- [agent dependencies](../dependencies.md)
Contributor Author

https://github.com/pydantic/pydantic-ai/actions/runs/16605150201/job/46974951928#step:9:46

Run uv run --no-project --with httpx .github/set_docs_pr_preview_url.py
Installed 8 packages in 5ms
Docs preview URL: https://91de4635-pydantic-ai-previews.pydantic.workers.dev/
Pull request number not set
Error: Process completed with exit code 1.

https://91de4635-pydantic-ai-previews.pydantic.workers.dev/

dataset: reference string to the DataFrame
sql: the query to be executed using DuckDB
"""
data = ctx.deps.output[dataset]
Collaborator

We should use get here right?

Contributor Author

Thanks, changed to using get.
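
One possible reading of this suggestion, sketched in plain Python: a `get` helper on the deps object checks the reference before use, so a mistyped reference from the model produces a descriptive error rather than a bare `KeyError`. The `get` body, the `Deps` class, and `UnknownReferenceError` are assumptions for illustration, not the code merged in this PR. Inside a pydantic-ai tool, raising `ModelRetry` at that point may be a natural alternative, since it asks the model to try again.

```python
from dataclasses import dataclass
from typing import Any


class UnknownReferenceError(LookupError):
    """Hypothetical error for a reference that no tool has produced."""


@dataclass
class Deps:
    # Maps references such as 'Out[1]' to stored tool outputs.
    output: dict[str, Any]

    def get(self, ref: str) -> Any:
        """Return the stored value, or fail with a message the model could act on."""
        if ref not in self.output:
            known = ', '.join(sorted(self.output)) or 'none'
            raise UnknownReferenceError(
                f'{ref} is not a valid reference. Known references: {known}.'
            )
        return self.output[ref]


deps = Deps(output={'Out[1]': [1, 2, 3]})
print(deps.get('Out[1]'))  # [1, 2, 3]

try:
    deps.get('Out[7]')
except UnknownReferenceError as exc:
    print(exc)  # Out[7] is not a valid reference. Known references: Out[1].
```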



if __name__ == '__main__':
deps = AnalystAgentDeps(output={})
Collaborator

We can make this a bit cleaner by giving the output field a default value of {}, and then passing deps=AnalystAgentDeps() below

Contributor Author

Done, added field and default_factory in the class
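
For context, a sketch of why `default_factory` is the usual way to express this default. It assumes `AnalystAgentDeps` is a dataclass and uses `Any` as a placeholder value type; neither detail is shown in this thread. A bare `= {}` default is rejected by `dataclasses` with a `ValueError`, whereas `field(default_factory=dict)` gives every instance its own dictionary.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnalystAgentDeps:
    # `output: dict[str, Any] = {}` would raise ValueError when the class is
    # defined, because dataclasses refuse mutable defaults such as dict.
    # default_factory builds a fresh dict for each instance instead.
    output: dict[str, Any] = field(default_factory=dict)


first = AnalystAgentDeps()
second = AnalystAgentDeps()
first.output['Out[1]'] = 'some large result'

print(first.output)   # {'Out[1]': 'some large result'}
print(second.output)  # {}
```

With the default in place, the call site can pass `deps=AnalystAgentDeps()` directly, as the review suggests.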


@analyst_agent.tool
def display(ctx: RunContext[AnalystAgentDeps], name: str) -> str:
"""Display at most 5 rows of the dataframe ."""
Collaborator

Suggested change
"""Display at most 5 rows of the dataframe ."""
"""Display at most 5 rows of the dataframe."""

Contributor Author

fixed.

Collaborator

@DouweM DouweM left a comment

@tonyxwz Thanks Tony, this'll be a nice example -- just a few final suggestions and I'll merge

user_prompt='Count how many negative comments are there in the dataset `cornell-movie-review-data/rotten_tomatoes`',
deps=deps,
)
debug(result.output)
Collaborator

Let's just use print here. Don't forget to update the output in the doc as well.

Contributor Author

done

@tonyxwz tonyxwz force-pushed the examples-data-analyst branch from 19486f9 to 5ee12e0 Compare July 31, 2025 08:31
@tonyxwz tonyxwz force-pushed the examples-data-analyst branch from 9a83451 to ae9943e Compare July 31, 2025 08:48
@DouweM DouweM merged commit 69e90bc into pydantic:main Aug 1, 2025
28 checks passed
Collaborator

DouweM commented Aug 1, 2025

@tonyxwz Thanks Tony, nice addition!

@tonyxwz tonyxwz deleted the examples-data-analyst branch August 4, 2025 08:38
ethanabrooks added a commit to reflectionai/pydantic-ai that referenced this pull request Aug 20, 2025
* Add `priority` `service_tier` to `OpenAIModelSettings` and respect it in `OpenAIResponsesModel` (pydantic#2368)

* Add an example of using RunContext to pass data among tools (pydantic#2316)

Co-authored-by: Douwe Maan <[email protected]>

* Rename gemini-2.5-flash-lite-preview-06-17 to gemini-2.5-flash-lite as it's out of preview (pydantic#2387)

* Fix toggleable toolset example so toolset state is not shared across agent runs (pydantic#2396)

* Support custom thinking tags specified on the model profile (pydantic#2364)

Co-authored-by: jescudero <[email protected]>
Co-authored-by: Douwe Maan <[email protected]>

* Add convenience functions to handle AG-UI requests with request-specific deps (pydantic#2397)

* docs: add missing optional packages in `install.md` (pydantic#2412)

* Include default values in tool arguments JSON schema (pydantic#2418)

* Fix "test_download_item_no_content_type test fails on macOS" (pydantic#2404)

* Allow string format, pattern and others in OpenAI strict JSON mode (pydantic#2420)

* Let more `BaseModel`s use OpenAI strict JSON mode by defaulting to `additionalProperties=False` (pydantic#2419)

* BREAKING CHANGE: Change type of 'source' field on EvaluationResult (pydantic#2388)

Co-authored-by: Douwe Maan <[email protected]>

* Fix ImageUrl, VideoUrl, AudioUrl and DocumentUrl not being serializable (pydantic#2422)

* BREAKING CHANGE: Support printing reasons in the console output for pydantic-evals (pydantic#2163)

* Document performance implications of async vs sync tools (pydantic#2298)

Co-authored-by: Douwe Maan <[email protected]>

* Mention that tools become toolset internally (pydantic#2395)

Co-authored-by: Douwe Maan <[email protected]>

* Fix tests for Logfire>=3.22.0 (pydantic#2346)

* tests: speed up the test suite (pydantic#2414)

* google: add more information about schema on union (pydantic#2426)

* typo in output docs (pydantic#2427)

* Deprecate `GeminiModel` in favor of `GoogleModel` (pydantic#2416)

* Use `httpx` on `GoogleProvider` (pydantic#2438)

* Remove older deprecated models and add new model of Anthropic (pydantic#2435)

* Remove `next()` method from `Graph` (pydantic#2440)

* BREAKING CHANGE: Remove `data` from `FinalResult` (pydantic#2443)

* BREAKING CHANGE: Remove `get_data` and `validate_structured_result` from `StreamedRunResult` (pydantic#2445)

* docs: add `griffe_warnings_deprecated` (pydantic#2444)

* BREAKING CHANGE: Remove `format_as_xml` module (pydantic#2446)

* BREAKING CHANGE: Remove `result_type` parameter and similar from `Agent` (pydantic#2441)

* Deprecate `GoogleGLAProvider` and `GoogleVertexProvider` (pydantic#2450)

* BREAKING CHANGE: drop 4 months old deprecation warnings (pydantic#2451)

* Automatically use OpenAI strict mode for strict-compatible native output types (pydantic#2447)

* Make `InlineDefsJsonSchemaTransformer` public (pydantic#2455)

* Send `ThinkingPart`s back to Anthropic used through Bedrock (pydantic#2454)

* Bump boto3 to support `AWS_BEARER_TOKEN_BEDROCK` API key env var (pydantic#2456)

* Add new Heroku models (pydantic#2459)

* Add `builtin_tools` to `Agent` (pydantic#2102)

Co-authored-by: Marcelo Trylesinski <[email protected]>
Co-authored-by: Douwe Maan <[email protected]>

* Bump mcp-run-python (pydantic#2470)

* Remove fail_under from top-level coverage config so <100% html-coverage step doesn't end CI run (pydantic#2475)

* Add AbstractAgent, WrapperAgent, Agent.event_stream_handler, Toolset.id, Agent.override(tools=...) in preparation for Temporal (pydantic#2458)

* Let toolsets be built dynamically based on run context (pydantic#2366)

Co-authored-by: Douwe Maan <[email protected]>

* Add ToolsetFunc to API docs (fix CI) (pydantic#2486)

* tests: change time of evals example (pydantic#2501)

* ci: remove html and xml reports (pydantic#2491)

* fix: Add gpt-5 models to reasoning model detection for temperature parameter handling (pydantic#2483)

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Marcelo Trylesinski <[email protected]>

* History processor replaces message history (pydantic#2324)

Co-authored-by: Marcelo Trylesinski <[email protected]>

* ci: split test suite (pydantic#2436)

Co-authored-by: Douwe Maan <[email protected]>

* ci: use the right install command (pydantic#2506)

* Update config.yaml (pydantic#2514)

* Skip testing flaky evals example (pydantic#2518)

* Fix error when parsing usage details for video without audio track in Google models (pydantic#2507)

* Make OpenAIResponsesModelSettings.openai_builtin_tools work again (pydantic#2520)

* Let Agent be run in a Temporal workflow by moving model requests, tool calls, and MCP to Temporal activities (pydantic#2225)

* Install only dev in CI (pydantic#2523)

* Improve CLAUDE.md (pydantic#2524)

* Add best practices regarding to coverage to CLAUDE.md (pydantic#2527)

* Add support for `"openai-responses"` model inference string (pydantic#2528)

Co-authored-by: Claude <[email protected]>

* docs: Confident AI (pydantic#2529)

* chore: mention what to do with the documentation when deprecating a class (pydantic#2530)

* chore: drop hyperlint (pydantic#2531)

* ci: improve matrix readability (pydantic#2532)

* Add pip to dev deps for PyCharm (pydantic#2533)

Co-authored-by: Marcelo Trylesinski <[email protected]>

* Add genai-prices to dev deps and a basic test (pydantic#2537)

* Add `--durations=100` to all pytest calls in CI (pydantic#2534)

* Cleanup snapshot in test_evaluate_async_logfire (pydantic#2538)

* Make some minor tweaks to the temporal docs (pydantic#2522)

Co-authored-by: Douwe Maan <[email protected]>

* Add new OpenAI GPT-5 models (pydantic#2503)

* Fix `FallbackModel` to respect each model's model settings (pydantic#2540)

* Add support for OpenAI verbosity parameter in Responses API (pydantic#2493)

Co-authored-by: Claude <[email protected]>
Co-authored-by: Douwe Maan <[email protected]>

* Add `UsageLimits.count_tokens_before_request` using Gemini `count_tokens` API (pydantic#2137)

Co-authored-by: Douwe Maan <[email protected]>

* chore: Fix uv.lock (pydantic#2546)

* Stop calling MCP server `get_tools` ahead of `agent run` span (pydantic#2545)

* Disable instrumentation by default in tests (pydantic#2535)

Co-authored-by: Marcelo Trylesinski <[email protected]>

* Only wrap necessary parts of type aliases in forward annotations (pydantic#2548)

* Remove anthropic-beta default header set in `AnthropicModel` (pydantic#2544)

Co-authored-by: Marcelo Trylesinski <[email protected]>

* docs: Clarify why AG-UI example links are on localhost (pydantic#2549)

* chore: Fix path to agent class in CLAUDE.md (pydantic#2550)

* Ignore leading whitespace when streaming from Qwen or DeepSeek (pydantic#2554)

* Ask model to try again if it produced a response without text or tool calls, only thinking (pydantic#2556)

Co-authored-by: Douwe Maan <[email protected]>

* chore: Improve Temporal test to check trace as tree instead of list (pydantic#2559)

* Fix: Forward max_uses parameter to Anthropic WebSearchTool (pydantic#2561)

* Let message history end on ModelResponse and execute pending tool calls (pydantic#2562)

* Fix type issues

* skip tests requiring API keys

* add `google-genai` dependency

* add other provider deps

* add pragma: no cover for untested logic

---------

Co-authored-by: akenar <[email protected]>
Co-authored-by: Tony Woland <[email protected]>
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Yi-Chen Lin <[email protected]>
Co-authored-by: José I. Escudero <[email protected]>
Co-authored-by: jescudero <[email protected]>
Co-authored-by: Marcelo Trylesinski <[email protected]>
Co-authored-by: William Easton <[email protected]>
Co-authored-by: David Montague <[email protected]>
Co-authored-by: Guillermo <[email protected]>
Co-authored-by: Hamza Farhan <[email protected]>
Co-authored-by: Mohamed Amine Zghal <[email protected]>
Co-authored-by: Yinon Ehrlich <[email protected]>
Co-authored-by: Matthew Brandman <[email protected]>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Alex Enrique <[email protected]>
Co-authored-by: Jerry Yan <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: Mayank <[email protected]>
Co-authored-by: Alex Hall <[email protected]>
Co-authored-by: Jerry Lin <[email protected]>
Co-authored-by: Raymond Xu <[email protected]>
Co-authored-by: kauabh <[email protected]>
Co-authored-by: Victorien <[email protected]>
Co-authored-by: Ethan Brooks <[email protected]>
Co-authored-by: eballesteros <[email protected]>