fix: revert #913 to restore #914 + #916 changes lost in bad squash #920

Merged
buremba merged 1 commit into main from fix/revert-913-restore-lost-changes on May 19, 2026

@buremba buremba commented May 19, 2026

What

Reverts the squash commit of #913 (`69151a9d`).

Why

When I rebased #913 (promptfoo-multiturn) onto post-#918-merge main, I did a `git reset --soft origin/main` on a branch that had been pre-rebased onto `feat/tool-use-sse` — NOT onto a state that also included #914 (build-hygiene) and #916 (scaffold-dx).

The resulting `git status` diff included MY multiturn additions PLUS the inverse of #914 and #916 (because my branch didn't have them but main did). I committed all of it as "the multiturn delta" and squash-merged. That silently undid #914 and #916 in the same commit.
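
The failure mode described above can be illustrated with a toy model. This is only a sketch: trees are modeled as maps from path to content, and the file names and contents are hypothetical, not taken from the repository. It relies on the fact that `git reset --soft` moves HEAD but leaves the index and working tree untouched, so the next commit records the difference between the new HEAD and the stale tree.

```typescript
// Toy model: a tree is a map from path to file content.
type Tree = Map<string, string>;

// Changes needed to turn `base` into `next`, i.e. what a commit of `next`
// on top of `base` would record.
function diff(base: Tree, next: Tree): string[] {
  const changes: string[] = [];
  next.forEach((content, path) => {
    if (!base.has(path)) changes.push(`add ${path}`);
    else if (base.get(path) !== content) changes.push(`modify ${path}`);
  });
  base.forEach((_content, path) => {
    if (!next.has(path)) changes.push(`delete ${path}`);
  });
  return changes.sort();
}

// Replays only what `branch` changed relative to `ancestor` on top of `target`.
function applyDelta(ancestor: Tree, branch: Tree, target: Tree): Tree {
  const result: Tree = new Map(target);
  branch.forEach((content, path) => {
    if (ancestor.get(path) !== content) result.set(path, content);
  });
  ancestor.forEach((_content, path) => {
    if (!branch.has(path)) result.delete(path);
  });
  return result;
}

// Hypothetical file names and contents, for illustration only.
const ancestor: Tree = new Map([
  ["provider.ts", "single-turn"],
  ["login.ts", "v1"],
  ["init.ts", "v1"],
]);
const main: Tree = new Map(ancestor)
  .set("login.ts", "v2 from #914")
  .set("init.ts", "v2 from #916");
const stale: Tree = new Map(ancestor).set("provider.ts", "multi-turn from #913");

// After a soft reset the stale tree stays staged, so the commit is diff(main, stale).
const badCommit = diff(main, stale);
// Replaying only the branch's own delta leaves the #914 and #916 files alone.
const goodCommit = diff(main, applyDelta(ancestor, stale, main));

console.log(badCommit);  // three modified files: the multiturn change plus two silent reverts
console.log(goodCommit); // only the multiturn change
```

In the model, the bad commit touches `login.ts` and `init.ts` even though the branch never edited them, which mirrors how #914 and #916 were undone.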

Damage scope

These changes are silently reverted on main HEAD (`69151a9d`):

Test plan

  • After merge, verify `grep "bun lockfile" AGENTS.md` finds the section
  • Verify `packages/cli/src/templates/.env.tmpl` contains WORKER_PROXY_PORT + LOBU_DATA_DIR
  • Verify `packages/cli/src/commands/login.ts` contains SIGHUP handler
  • Verify `packages/server/src/start-local.ts` contains the baseline-only idempotency allowlist
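
The checks above could be automated along these lines. This is a minimal sketch, not part of the PR: `MarkerCheck`, `ReadFile`, and `findMissing` are hypothetical names, and the file reader is injected so the logic can be exercised without touching disk. The `start-local.ts` check is left out because the test plan does not give a literal string to search for.

```typescript
// One entry per test-plan bullet: a file and the substrings it must contain.
interface MarkerCheck {
  path: string;
  markers: string[];
}

// Returns the file's text, or undefined if the file does not exist.
type ReadFile = (path: string) => string | undefined;

const checks: MarkerCheck[] = [
  { path: "AGENTS.md", markers: ["bun lockfile"] },
  {
    path: "packages/cli/src/templates/.env.tmpl",
    markers: ["WORKER_PROXY_PORT", "LOBU_DATA_DIR"],
  },
  { path: "packages/cli/src/commands/login.ts", markers: ["SIGHUP"] },
];

// Lists every "path: marker" pair that is absent (missing file or missing substring).
function findMissing(list: MarkerCheck[], read: ReadFile): string[] {
  const missing: string[] = [];
  for (const { path, markers } of list) {
    const text = read(path);
    for (const marker of markers) {
      if (text === undefined || !text.includes(marker)) {
        missing.push(`${path}: ${marker}`);
      }
    }
  }
  return missing;
}
```

In a real script, `read` would wrap a filesystem read, and a non-empty result would fail the check.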

Follow-up

#913's multiturn work needs to be re-applied as a NEW, additive-only PR off post-revert main. That PR will carry ONLY the multiturn-specific changes (~4 files).

Summary by CodeRabbit

  • New Features

    • CLI init command now supports --list-providers flag and automatically selects free ports to prevent collisions.
    • CLI login command adds -q/--quiet mode for non-interactive environments with proper signal handling.
    • Personal-finance agent includes new evaluation tests for employment income, property lettings, and capital gains tax scenarios.
  • Documentation

    • Enhanced setup guidance for submodule initialization and IDE formatting integration.
    • Clarified evaluation test structure and coverage.
  • Chores

    • Node.js pinned to version 22 in new projects.

Review Change Stack

…e 4 personal-finance evals (#913)"

This reverts commit 69151a9.
@buremba buremba merged commit b648924 into main May 19, 2026
17 of 18 checks passed
@buremba buremba deleted the fix/revert-913-restore-lost-changes branch May 19, 2026 15:01
coderabbitai Bot commented May 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
📥 Commits

Reviewing files that changed from the base of the PR and between 69151a9 and b4ff9b5.

📒 Files selected for processing (16)
  • AGENTS.md
  • examples/personal-finance/agents/personal-finance/evals/README.md
  • examples/personal-finance/agents/personal-finance/evals/gap-surfacing.yaml
  • examples/personal-finance/agents/personal-finance/evals/promptfooconfig.yaml
  • examples/personal-finance/agents/personal-finance/evals/sa102-employment.yaml
  • examples/personal-finance/agents/personal-finance/evals/sa105-property.yaml
  • examples/personal-finance/agents/personal-finance/evals/sa108-cgt.yaml
  • packages/cli/src/commands/init.ts
  • packages/cli/src/commands/login.ts
  • packages/cli/src/index.ts
  • packages/cli/src/templates/.env.tmpl
  • packages/cli/src/templates/.gitignore.tmpl
  • packages/promptfoo-provider/README.md
  • packages/promptfoo-provider/src/__tests__/provider.test.ts
  • packages/promptfoo-provider/src/provider.ts
  • packages/server/src/start-local.ts

📝 Walkthrough

This PR removes multi-turn transcript support from the promptfoo provider, returning it to single-prompt behavior, and migrates the evaluation configs and documentation accordingly. It also introduces four new UK tax-focused evaluation specifications, extends the CLI with provider discovery and free-port selection, and implements a non-interactive login mode with signal handling.

Changes

Evaluation Framework Refactoring and New Test Specifications

| Layer / File(s) | Summary |
| --- | --- |
| **Provider single-turn implementation and tests**<br>`packages/promptfoo-provider/src/provider.ts`, `packages/promptfoo-provider/src/__tests__/provider.test.ts`, `packages/promptfoo-provider/README.md` | LobuProvider.callApi is simplified to send exactly one prompt via sendAndCollect, removes multi-turn vars.transcript extraction and the extractTranscript helper, and the test suite is refactored to remove mock-based transcript tests in favor of SSE event handling verification. |
| **Eval configuration migration and documentation**<br>`examples/personal-finance/agents/personal-finance/evals/promptfooconfig.yaml`, `examples/personal-finance/agents/personal-finance/evals/README.md` | promptfooconfig.yaml is updated with migration notes clarifying partial transition from older YAML runner, multi-turn behavioral tests are removed from this file, and the README documents dormant YAML files, explains single-turn limitations, and lists follow-up migration approaches. |
| **New single-turn evaluation specifications**<br>`examples/personal-finance/agents/personal-finance/evals/gap-surfacing.yaml`, `examples/personal-finance/agents/personal-finance/evals/sa102-employment.yaml`, `examples/personal-finance/agents/personal-finance/evals/sa105-property.yaml`, `examples/personal-finance/agents/personal-finance/evals/sa108-cgt.yaml` | Four new YAML evaluation specs define scenarios for UK tax compliance: gap-surfacing (P60 disclosure under pressure), sa102-employment (employer/income entity creation), sa105-property (mortgage interest as restricted credit and rental profit calculation), and sa108-cgt (CGT event creation and loss classification). |

CLI Commands and Infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| **Provider discovery and --list-providers**<br>`packages/cli/src/commands/init.ts` (provider scaffolding), `packages/cli/src/index.ts` (CLI wiring) | Add synthetic "Claude (Anthropic)" provider, alias map (anthropicclaude), and helper functions to enumerate and resolve providers; wire --list-providers flag to print available providers and exit. |
| **Port selection and gateway/proxy configuration**<br>`packages/cli/src/commands/init.ts` (port logic, provider validation), `packages/cli/src/templates/.env.tmpl` | Implement pickFreePort helper to probe for free TCP ports; auto-select gateway port from 8787 range and WORKER_PROXY_PORT from 8118 range while avoiding collisions; update provider prompt to validate against synthetic/aliased provider IDs; extend .env template variables. |
| **Login quiet mode and signal handling**<br>`packages/cli/src/commands/login.ts`, `packages/cli/src/index.ts` (CLI wiring) | Add --quiet flag to suppress spinner and exit immediately on pending response when non-interactive; enforce hard timeout for device-code polling; register signal handlers to abort on SIGHUP, SIGTERM, SIGINT; introduce abortableDelay helper for early wake-up. |
| **Environment and gitignore templates**<br>`packages/cli/src/templates/.env.tmpl`, `packages/cli/src/templates/.gitignore.tmpl`, `packages/cli/src/commands/init.ts` (template writes) | Add LOBU_DATA_DIR for embedded PGlite database isolation, update .env comments for embedded database behavior, pin Node.js to version 22 via .nvmrc and .node-version, and extend .gitignore to exclude .lobu-data/ directory. |
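
The port selection summarized above might look roughly like the following. This is a sketch under stated assumptions, not the code in `init.ts`: the source names only `pickFreePort` and the 8787 and 8118 starting points, so the signature, the `isPortFree` and `pickPorts` helpers, the injectable probe, and the 50-port scan limit are all hypothetical.

```typescript
import * as net from "node:net";

// Resolves true if a TCP server can bind to `port` on `host` right now.
function isPortFree(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once("error", () => resolve(false)); // e.g. EADDRINUSE
    server.once("listening", () => server.close(() => resolve(true)));
    server.listen(port, host);
  });
}

type Probe = (port: number) => Promise<boolean>;

// Hypothetical signature: scan upward from `start`, skipping ports already
// claimed in `taken`, and return the first port the probe reports as free.
async function pickFreePort(
  start: number,
  taken: Set<number> = new Set<number>(),
  probe: Probe = isPortFree,
  maxAttempts = 50, // assumed scan limit
): Promise<number> {
  for (let port = start; port < start + maxAttempts; port++) {
    if (taken.has(port)) continue;
    if (await probe(port)) return port;
  }
  throw new Error(`no free port in ${start}..${start + maxAttempts - 1}`);
}

// Picks the gateway port first, then a worker proxy port that cannot collide with it.
async function pickPorts(probe: Probe = isPortFree) {
  const gateway = await pickFreePort(8787, new Set<number>(), probe);
  const workerProxy = await pickFreePort(8118, new Set<number>([gateway]), probe);
  return { gateway, workerProxy };
}
```

A probe of this kind is inherently racy: another process can take the port between the probe and the real bind, so callers still need to handle a bind failure.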

Developer Setup Documentation

| Layer / File(s) | Summary |
| --- | --- |
| **Bun lockfile and Biome setup guidance**<br>`AGENTS.md` | Document git submodule update --init + bun install --frozen-lockfile workflow for packages/owletto submodule consistency, and provide Biome/IDE configuration instructions for VS Code, JetBrains, and other editors to align with pre-commit biome check --write behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lobu-ai/lobu#916: Overlaps on lobu init scaffolding enhancements, including --list-providers flag, provider alias/synthetic handling, and free-port logic in packages/cli/src/commands/init.ts and template updates.
  • lobu-ai/lobu#914: Directly overlaps on lobu login --quiet mode with polling abort and hard timeout behavior, and idempotent migration handling in packages/server/src/start-local.ts.
  • lobu-ai/lobu#913: Opposes the multi-turn vars.transcript support by removing transcript extraction/replay logic that was previously added to the promptfoo provider.
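
The polling abort and hard timeout mentioned for #914 can be sketched as follows. This is an illustration only: `abortableDelay` is named in the walkthrough, but its signature here, the `pollUntil` wrapper, and the result strings are assumptions rather than the code in `login.ts`.

```typescript
// Resolves true after `ms`, or false as soon as `signal` aborts (early wake-up).
function abortableDelay(ms: number, signal: AbortSignal): Promise<boolean> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve(false);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      resolve(false);
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve(true);
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

type PollResult = "approved" | "pending";
type PollOutcome = "approved" | "timeout" | "aborted";

// Hypothetical polling loop: stops on approval, on abort, or at a hard deadline.
async function pollUntil(
  check: () => Promise<PollResult>,
  opts: { intervalMs: number; timeoutMs: number; signal: AbortSignal },
): Promise<PollOutcome> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if ((await check()) === "approved") return "approved";
    // Never sleep past the deadline.
    const remaining = Math.max(0, deadline - Date.now());
    const slept = await abortableDelay(Math.min(opts.intervalMs, remaining), opts.signal);
    if (!slept) return "aborted";
  }
  return "timeout";
}
```

In a CLI, handlers for SIGHUP, SIGTERM, and SIGINT would call `abort()` on the `AbortController` that owns the signal, so a closed terminal ends the loop instead of leaving it to poll until the deadline.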

Suggested labels

skip-size-check

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@codecov-commenter

Codecov Report

❌ Patch coverage is 65.30612% with 34 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/cli/src/commands/init.ts | 65.30% | 34 Missing ⚠️ |


buremba added a commit that referenced this pull request May 19, 2026
…onal-finance evals (#921)

@lobu/promptfoo-provider gains vars.transcript: string[] support — replays
sequential turns in one Lobu thread, returns the final assistant response
for assertion. Single-turn callers via plain prompt are unchanged.

Migrates the 4 dormant personal-finance behavioural YAMLs (gap-surfacing,
sa102-employment, sa105-property, sa108-cgt) into promptfooconfig.yaml
using vars.transcript. Deletes the original YAML files.

Strictly additive atop current main (which already includes #918's tool_use
SSE events). Re-do of #913 after #920 reverted that PR — the original
landing accidentally undid #914 and #916 because of a bad rebase-and-soft-reset.
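
The `vars.transcript` behavior described in this commit message can be sketched as below. This is a rough illustration, not the provider's code: `resolveTurns`, `replayTranscript`, and the `SendTurn` callback are hypothetical names standing in for whatever the provider uses to send a message and collect the reply, and falling back to the plain prompt when `transcript` is empty or malformed is an assumption.

```typescript
interface EvalVars {
  transcript?: unknown;
}

// Uses vars.transcript when it is a non-empty string[], otherwise the plain prompt.
function resolveTurns(prompt: string, vars: EvalVars = {}): string[] {
  const t = vars.transcript;
  if (Array.isArray(t) && t.length > 0 && t.every((x) => typeof x === "string")) {
    return t as string[];
  }
  return [prompt];
}

// Hypothetical transport: sends one message in a thread and resolves with the reply.
type SendTurn = (threadId: string, message: string) => Promise<string>;

// Replays the turns sequentially in one thread and returns the final reply,
// which is the text the eval assertions would run against.
async function replayTranscript(
  threadId: string,
  turns: string[],
  send: SendTurn,
): Promise<string> {
  let last = "";
  for (const turn of turns) {
    last = await send(threadId, turn); // each turn waits for the previous reply
  }
  return last;
}
```

Because single-turn callers resolve to a one-element list, they go through the same loop unchanged.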