
feat(promptfoo-provider): vars.transcript multi-turn + migrate 4 personal-finance evals (redo) #921

Merged
buremba merged 1 commit into main from feat/promptfoo-multiturn-redo on May 19, 2026

Conversation

@buremba
Member

@buremba buremba commented May 19, 2026

Summary

Re-do of #913 after #920 reverted it. Strictly additive atop current main (which already includes #918's tool_use SSE events).

@lobu/promptfoo-provider gains support for vars.transcript: string[]. The provider replays the turns sequentially in one Lobu thread and returns the final assistant response for assertion. Single-turn callers that pass a plain prompt are unchanged.
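The replay contract can be sketched in TypeScript. This is a minimal sketch, not the provider's implementation: `SessionClient`, `createSession`, `sendTurn`, `closeSession`, and `replayTranscript` are hypothetical names that hide the HTTP and SSE calls to the Lobu gateway. It shows the two properties described here: every turn goes to the same session, and only the last turn's result is returned.

```typescript
// Hypothetical result of one assistant turn.
interface TurnResult {
  output: string;
}

// Hypothetical client abstraction over the Lobu gateway (not a real API).
interface SessionClient {
  createSession(): Promise<string>;
  sendTurn(sessionId: string, text: string): Promise<TurnResult>;
  closeSession(sessionId: string): Promise<void>;
}

// Replays each transcript turn in one session and returns only the final
// turn's result. With no usable transcript, the plain prompt is sent as a
// single turn, mirroring the unchanged single-turn path.
async function replayTranscript(
  client: SessionClient,
  prompt: string,
  transcript?: string[],
): Promise<TurnResult> {
  const turns = transcript && transcript.length > 0 ? transcript : [prompt];
  const sessionId = await client.createSession();
  try {
    let last: TurnResult = { output: "" };
    for (const turn of turns) {
      // Await each turn before sending the next so the session sees the
      // conversation in order.
      last = await client.sendTurn(sessionId, turn);
    }
    return last;
  } finally {
    // Release the session even if a turn throws.
    await client.closeSession(sessionId);
  }
}
```

Awaiting inside the loop is the design choice that matters: sending turn n before turn n-1 finishes would interleave the thread history and make the final response meaningless as an assertion target.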

Migrates the 4 dormant personal-finance behavioural YAMLs (gap-surfacing, sa102-employment, sa105-property, sa108-cgt) into promptfooconfig.yaml using vars.transcript. Deletes the original YAMLs.
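For illustration only, here is a multi-turn case mirrored as a TypeScript object, plus a hypothetical helper that runs its `regex` assertions against a final response. The field names `vars`, `assert`, `type`, and `value` follow promptfoo's test-case schema; the transcript text, rubric wording, and `runRegexAssertions` are invented examples and are not the PR's actual sa105-property case. `llm-rubric` assertions need a grader model, so the sketch skips them.

```typescript
// The two promptfoo assertion types the migrated cases use.
interface Assertion {
  type: "llm-rubric" | "regex";
  value: string;
}

interface TranscriptTestCase {
  description: string;
  vars: { transcript: string[] };
  assert: Assertion[];
}

// Illustrative multi-turn case; all wording here is hypothetical.
const rentalProfitCase: TranscriptTestCase = {
  description: "rental profit is computed from facts given across turns",
  vars: {
    transcript: [
      "I let out one UK flat.",
      "Rent received was 12,000 and allowable expenses were 3,000.",
      "What rental profit should I report?",
    ],
  },
  assert: [
    { type: "llm-rubric", value: "States a rental profit of 9,000." },
    { type: "regex", value: "9,?000" },
  ],
};

// Hypothetical helper: assertions target only the final assistant response,
// so every regex assertion is tested against that one string.
function runRegexAssertions(tc: TranscriptTestCase, finalResponse: string): boolean {
  return tc.assert
    .filter((a) => a.type === "regex")
    .every((a) => new RegExp(a.value).test(finalResponse));
}
```

In promptfooconfig.yaml the same case would be one entry under `tests`, with `vars.transcript` written as a YAML list of user turns.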

Why this is a re-do

The original #913 silently reverted #914 and #916 because of a bad rebase and soft reset: the squash commit's diff contained the inverse of the #914 and #916 changes, which were on main but not on my branch. This redo branches from current (post-revert) main and applies ONLY the multi-turn-specific files (verified before commit: 0 lines changed in any #914/#916 file).

Test plan

  • make typecheck strict: clean
  • bun test packages/promptfoo-provider/src/__tests__/provider.test.ts: 8/8 pass
  • git diff origin/main on AGENTS.md / login.ts / init.ts / start-local.ts / .env.tmpl: 0 lines (none touched)
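The last test-plan check can be scripted. This is a minimal sketch, assuming you capture the output of `git diff --numstat origin/main` (one "added, deleted, path" line per file, tab-separated). Because the PR names the protected files only by basename, the hypothetical `protectedChanges` helper matches on basename. That is conservative: an unrelated file that happens to be called `login.ts` would also be flagged. Rename lines are not resolved.

```typescript
// Parses `git diff --numstat` output ("added<TAB>deleted<TAB>path" per line)
// and returns changed-line totals for paths whose basename is protected.
// Binary files report "-" for both counts; each "-" counts as 1 so the file
// is still flagged. Rename lines ("a => b") are not resolved in this sketch.
function protectedChanges(numstat: string, protectedNames: string[]): Map<string, number> {
  const hits = new Map<string, number>();
  const count = (s: string): number => (s === "-" ? 1 : Number(s) || 0);
  for (const line of numstat.split("\n")) {
    if (line.trim() === "") continue;
    const [added = "0", deleted = "0", ...rest] = line.split("\t");
    const path = rest.join("\t");
    const base = path.split("/").pop() ?? path;
    if (!protectedNames.includes(base)) continue;
    hits.set(path, (hits.get(path) ?? 0) + count(added) + count(deleted));
  }
  return hits;
}
```

An empty map corresponds to the "0 lines (none touched)" result above. Any entry names a protected file the branch modified.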

Summary by CodeRabbit

  • New Features

    • Introduced multi-turn evaluation support for testing sequential conversations within a single session.
  • Documentation

    • Updated evaluation framework documentation with multi-turn usage guidance and clarified test execution patterns.
  • Tests

    • Expanded test coverage for multi-turn conversation handling and evaluation scenarios.

Review Change Stack

…onal-finance evals

@lobu/promptfoo-provider gains vars.transcript: string[] support — replays
sequential turns in one Lobu thread, returns the final assistant response
for assertion. Single-turn callers via plain prompt are unchanged.

Migrates the 4 dormant personal-finance behavioural YAMLs (gap-surfacing,
sa102-employment, sa105-property, sa108-cgt) into promptfooconfig.yaml
using vars.transcript. Deletes the original YAML files.

Strictly additive atop current main (which already includes #918's tool_use
SSE events). Re-do of #913 after #920 reverted that PR — the original
landing accidentally undid #914 and #916 because of a bad rebase-and-soft-reset.
@buremba buremba merged commit 9453f37 into main May 19, 2026
4 of 5 checks passed
@buremba buremba deleted the feat/promptfoo-multiturn-redo branch May 19, 2026 15:03
@coderabbitai

coderabbitai Bot commented May 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d5983a3d-6b35-49b6-80b9-3c9cf530c0f6

📥 Commits

Reviewing files that changed from the base of the PR and between b648924 and 046440c.

📒 Files selected for processing (9)
  • examples/personal-finance/agents/personal-finance/evals/README.md
  • examples/personal-finance/agents/personal-finance/evals/gap-surfacing.yaml
  • examples/personal-finance/agents/personal-finance/evals/promptfooconfig.yaml
  • examples/personal-finance/agents/personal-finance/evals/sa102-employment.yaml
  • examples/personal-finance/agents/personal-finance/evals/sa105-property.yaml
  • examples/personal-finance/agents/personal-finance/evals/sa108-cgt.yaml
  • packages/promptfoo-provider/README.md
  • packages/promptfoo-provider/src/__tests__/provider.test.ts
  • packages/promptfoo-provider/src/provider.ts

📝 Walkthrough


This PR extends LobuProvider to support multi-turn transcript replay within a single Lobu session, consolidates four personal-finance evaluation YAML specs into a single promptfooconfig.yaml, and updates documentation for both the provider feature and the new eval structure.

Changes

Multi-turn Promptfoo Provider and Eval Consolidation

  • Core multi-turn provider logic (packages/promptfoo-provider/src/provider.ts): LobuProvider.callApi accepts context.vars.transcript (a string array) and iterates through the transcript turns in a single Lobu session, returning only the final turn's output, usage, and metadata. A new extractTranscript helper validates the transcript, drops empty or whitespace-only entries, and returns undefined when no valid turns remain, which triggers the fallback to single-turn mode.
  • Multi-turn provider tests (packages/promptfoo-provider/src/__tests__/provider.test.ts): Adds an installGatewayMock() helper that mocks globalThis.fetch to simulate four Lobu endpoints (session creation, message sending, SSE streaming with turn-indexed output, session cleanup) and records requests. A new LobuProvider.callApi test suite covers single-turn response mapping, multi-turn transcript replay with session reuse, empty-entry filtering, and the fallback to single-turn when the transcript is not an array or is empty.
  • Multi-turn provider documentation (packages/promptfoo-provider/README.md): Documents how to run multi-turn tests by setting vars.transcript to a string array: the provider replays the turns within a single Lobu thread, assertions target the final response, empty strings are filtered out, and the provider falls back to single-turn mode when no usable transcript is given. Includes a YAML example.
  • Consolidated promptfooconfig with multi-turn tests (examples/personal-finance/agents/personal-finance/evals/promptfooconfig.yaml): Adds provider documentation explaining the mapping of vars.query (single-turn) versus vars.transcript (multi-turn) and clarifies query template usage. Consolidates four multi-turn test cases (gap-surfacing refusal, sa102 missing-info handling, sa105 rental-profit calculation, sa108 loss-treatment rules) that use vars.transcript for sequential user turns, with rubric and regex assertions.
  • Eval structure and consolidation documentation (examples/personal-finance/agents/personal-finance/evals/README.md): Clarifies that all evals are defined in promptfooconfig.yaml and run via promptfoo with the provider. Adds a Coverage section listing six checks split by input type. Updates the Dormant YAML files section to note that ping.yaml and tax-year-anchoring.yaml are reference-only, since promptfoo reads only the single config file.
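The transcript validation described for extractTranscript can be sketched as follows. It is written from the walkthrough's description, not copied from provider.ts. `CallContext` is a simplified stand-in for promptfoo's call context, and dropping non-string entries is an assumption: the summary mentions only empty and whitespace-only strings.

```typescript
// Simplified view of the promptfoo call context; only `vars` is read here.
interface CallContext {
  vars?: Record<string, unknown>;
}

// Anything other than an array means "no transcript". Entries that are not
// strings (assumption) or are empty/whitespace-only are dropped. An empty
// result returns undefined so the caller falls back to single-turn mode.
function extractTranscript(context?: CallContext): string[] | undefined {
  const raw = context?.vars?.transcript;
  if (!Array.isArray(raw)) return undefined;
  const turns = raw.filter(
    (entry): entry is string => typeof entry === "string" && entry.trim().length > 0,
  );
  return turns.length > 0 ? turns : undefined;
}
```

Returning undefined rather than an empty array keeps the fallback decision in one place: the caller only needs to check whether a transcript exists.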

Sequence Diagram

sequenceDiagram
  participant Test as Promptfoo Test Runner
  participant Provider as LobuProvider.callApi
  participant Extract as extractTranscript
  participant Lobu as Lobu Session
  Test->>Provider: callApi(prompt, context with vars.transcript)
  Provider->>Extract: extractTranscript(context)
  Extract-->>Provider: string[] or undefined
  alt Multi-turn (valid transcript)
    loop for each turn in transcript
      Provider->>Lobu: POST /messages (turn content)
      Lobu-->>Provider: SSE stream with turn-<n> output
    end
    Provider-->>Test: final turn output + usage + metadata
  else Single-turn (no valid transcript)
    Provider->>Lobu: POST /messages (original prompt)
    Lobu-->>Provider: SSE stream with turn-<n> output
    Provider-->>Test: output + usage + metadata
  end
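The request flow in the diagram can be exercised without a live gateway by swapping out `globalThis.fetch`, the approach the walkthrough attributes to installGatewayMock(). The sketch below is hypothetical throughout: the endpoint paths (`/sessions`, `/sessions/:id/messages`, `/sessions/:id/events`), the session id, and the one-line SSE payload are assumptions, not Lobu's real gateway API. It records every request so a test can check that all turns reused one session.

```typescript
interface RecordedRequest {
  method: string;
  url: string;
}

// Hypothetical gateway mock: replaces globalThis.fetch, records requests,
// and serves four assumed endpoints. Each message POST bumps a turn
// counter; the SSE endpoint replies with "turn-<n>" for the latest turn.
function installGatewayMock(): { requests: RecordedRequest[]; restore: () => void } {
  const original = globalThis.fetch;
  const requests: RecordedRequest[] = [];
  let turn = 0;

  const mock = async (input: string | URL | Request, init?: RequestInit): Promise<Response> => {
    const url = input instanceof Request ? input.url : String(input);
    const method = (init?.method ?? "GET").toUpperCase();
    requests.push({ method, url });
    const path = new URL(url).pathname;

    if (method === "POST" && path === "/sessions") {
      return new Response(JSON.stringify({ id: "session-1" }), {
        headers: { "content-type": "application/json" },
      });
    }
    if (method === "POST" && /^\/sessions\/[^/]+\/messages$/.test(path)) {
      turn += 1;
      return new Response(null, { status: 202 });
    }
    if (method === "GET" && /^\/sessions\/[^/]+\/events$/.test(path)) {
      // One SSE event: a "data:" line terminated by a blank line.
      return new Response(`data: turn-${turn}\n\n`, {
        headers: { "content-type": "text/event-stream" },
      });
    }
    if (method === "DELETE" && /^\/sessions\/[^/]+$/.test(path)) {
      return new Response(null, { status: 204 });
    }
    return new Response("not found", { status: 404 });
  };

  globalThis.fetch = mock as typeof fetch;
  return { requests, restore: () => { globalThis.fetch = original; } };
}
```

Restoring the original fetch in a `finally` block keeps one test's mock from leaking into the next.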

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lobu-ai/lobu#911: Initial drop-in of @lobu/promptfoo-provider replacing the in-house YAML runner; both PRs extend the same provider implementation.
  • lobu-ai/lobu#913: Also implements multi-turn transcript replay via context.vars.transcript and aligns personal-finance eval setup accordingly.

Suggested labels

skip-size-check

Poem

🐰 Through transcripts long, the turns now flow,
One thread, one session, watch it grow.
From scattered YAMLs, consolidated bright,
The promptfoo provider shines with multi-turn light!
Assertions final, each sequence traced—
A conversation perfectly placed. 🎯


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.



Labels

None yet

Projects

None yet


2 participants