feat(promptfoo-provider): vars.transcript multi-turn + migrate 4 personal-finance evals (redo) #921
Caution: Review failed. The pull request is closed. Files selected for processing: 9.
Walkthrough
This PR extends LobuProvider to support multi-turn transcript replay within a single Lobu session, consolidates four personal-finance evaluation YAML specs into a single promptfooconfig.yaml, and updates documentation for both the provider feature and the new eval structure.
Changes: Multi-turn Promptfoo Provider and Eval Consolidation
Sequence diagram
sequenceDiagram
    participant Test as Promptfoo Test Runner
    participant Provider as LobuProvider.callApi
    participant Extract as extractTranscript
    participant Lobu as Lobu Session
    Test->>Provider: callApi(prompt, context with vars.transcript)
    Provider->>Extract: extractTranscript(context)
    Extract-->>Provider: string[] or undefined
    alt Multi-turn (valid transcript)
        loop for each turn in transcript
            Provider->>Lobu: POST /messages (turn content)
            Lobu-->>Provider: SSE stream with turn-<n> output
        end
        Provider-->>Test: final turn output + usage + metadata
    else Single-turn (no valid transcript)
        Provider->>Lobu: POST /messages (original prompt)
        Lobu-->>Provider: SSE stream with turn-<n> output
        Provider-->>Test: output + usage + metadata
    end
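The multi-turn branch of the diagram can be sketched in TypeScript as follows. `replayTranscript`, `sendTurn`, `TurnResult`, and `makeFakeTransport` are hypothetical names, not the package's API: `sendTurn` stands in for the real POST /messages call plus SSE consumption and is injected so the loop runs without a Lobu server. Summing token usage across turns is also an assumption; the diagram only says usage is returned alongside the final output.

```typescript
// Hypothetical result of one turn: the assistant's reply plus token usage.
interface TurnResult {
  output: string;
  tokens: number;
}

// Hypothetical transport. In the real provider this would POST the turn to
// Lobu's /messages endpoint and read the SSE stream until the turn finishes.
type SendTurn = (threadId: string, content: string) => Promise<TurnResult>;

// Hypothetical response shape returned to the promptfoo test runner.
interface ReplayResponse {
  output: string;
  tokenUsage: { total: number };
  metadata: { threadId: string; turns: number };
}

// Replays every turn, in order, on the same thread and returns only the
// final assistant reply for assertion.
async function replayTranscript(
  threadId: string,
  turns: string[],
  sendTurn: SendTurn,
): Promise<ReplayResponse> {
  if (turns.length === 0) {
    throw new Error("transcript must contain at least one turn");
  }
  let last: TurnResult = { output: "", tokens: 0 };
  let total = 0;
  for (const turn of turns) {
    // Await each turn before sending the next one: later turns depend on the
    // assistant having already answered the earlier ones in this thread.
    last = await sendTurn(threadId, turn);
    total += last.tokens; // assumption: usage is summed across all turns
  }
  return {
    output: last.output,
    tokenUsage: { total },
    metadata: { threadId, turns: turns.length },
  };
}

// Usage sketch: a fake in-memory transport (hypothetical) that numbers the
// turns it has seen, mirroring the "turn-<n>" outputs in the diagram.
// It tracks a single thread and counts characters as "tokens".
function makeFakeTransport(): SendTurn {
  let n = 0;
  return async (_threadId, content) => {
    n += 1;
    return { output: `turn-${n}: ${content}`, tokens: content.length };
  };
}
```

Awaiting inside the loop, rather than firing all turns with `Promise.all`, is the design choice that makes this a replay: the assertion then runs against a reply produced with the whole conversation in context.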
Estimated code review effort: 3 (Moderate), about 25 minutes
Warning: there were issues while running some review tools. ESLint was skipped because no ESLint configuration was detected in the root package.json.
Codecov Report: all modified and coverable lines are covered by tests.
Summary
This is a re-do of #913, which #920 reverted. It is strictly additive on top of current main, which already includes #918's tool_use SSE events.
@lobu/promptfoo-provider gains support for vars.transcript: string[]. When it is set, the provider replays the turns sequentially in one Lobu thread and returns the final assistant response for assertion. Single-turn callers that pass a plain prompt are unchanged.
Migrates the 4 dormant personal-finance behavioural YAMLs (gap-surfacing, sa102-employment, sa105-property, sa108-cgt) into promptfooconfig.yaml using vars.transcript. Deletes the original YAMLs.
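A minimal sketch of how the provider might choose between the multi-turn and single-turn paths, assuming promptfoo passes the test's vars to `callApi` as `context.vars`. The function name `extractTranscript` comes from the walkthrough diagram, but this body is an assumption, not the package's code: in particular, the rule that only a non-empty array of strings counts as a valid transcript, and that anything else falls back to the plain prompt, is hypothetical.

```typescript
// Hypothetical minimal shape of the context handed to callApi; only the
// `vars` field matters for this sketch.
interface CallContext {
  vars?: Record<string, unknown>;
}

// Returns the transcript when `context.vars.transcript` is a non-empty array
// of strings; otherwise returns undefined so the caller uses the plain
// single-turn prompt. (Assumed validation rules.)
function extractTranscript(context?: CallContext): string[] | undefined {
  const raw = context?.vars?.transcript;
  if (!Array.isArray(raw) || raw.length === 0) {
    return undefined;
  }
  // Reject mixed arrays instead of coercing: a non-string turn means
  // "no valid transcript".
  if (!raw.every((turn): turn is string => typeof turn === "string")) {
    return undefined;
  }
  return raw;
}
```

Returning `undefined` rather than throwing keeps existing single-turn tests working untouched, which matches the claim that plain-prompt callers are unchanged.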
Why this is a re-do
The original #913 silently undid #914 and #916 because of a bad rebase and soft reset: the squash commit's diff included the inverse of the #914 and #916 changes, which existed on main but not on my branch. This redo branches from current post-revert main and applies only the multi-turn-specific files (verified before committing: 0 lines changed in any #914 or #916 file).
Test plan
Summary by CodeRabbit
New Features
Documentation
Tests