
fix: CI test failures in message-list and evals/run#13719

Merged
abhiaiyer91 merged 6 commits into
mainfrom
fix/ci-test-failures
Mar 3, 2026

Conversation

@abhiaiyer91
Member

@abhiaiyer91 abhiaiyer91 commented Mar 3, 2026

Summary

Fixes two test failures surfaced in the CI pipeline for the changeset-release PR (#13523).

Changes

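1. message-list tests — filename preservation

  • #13574 introduced filename preservation in AIV4Adapter.fromCoreMessage(), so converted file parts can now carry a filename property that the tests' exact-equality assertions did not expect
  • Fix: switched the affected assertions to expect.objectContaining(...) so file parts may include an optional filename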

2. evals/run — step scoring guard

  • The guard condition stepResult.payload && stepResult.output prevented scorers from running on step results whose payload had been stripped
  • Root cause: the workflow engine's fmtReturnValue strips payload when it matches the previous step's output (an optimization to avoid redundant data)
  • Fix: remove payload from the guard and use scoringData.input as the fallback input for the scorer
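The guard and input change can be sketched as follows. This is a minimal illustration, not the code in packages/core/src/evals/run/index.ts: the StepResult shape (including the status values other than 'success') and the helper names shouldScoreOld, shouldScoreNew and scorerInput are assumptions made for the example. It also folds in the later review follow-up that switched to explicit !== undefined checks.

```typescript
// Hypothetical, simplified step-result shape. The real type may carry more
// fields, and the non-'success' status values here are assumptions.
interface StepResult {
  status: 'success' | 'failed' | 'suspended';
  payload?: unknown; // may be stripped when it equals the previous step's output
  output?: unknown;
}

// Old guard: required a payload, so steps whose payload was stripped were never scored.
function shouldScoreOld(step: StepResult): boolean {
  return Boolean(step.payload && step.output);
}

// New guard: require success and an explicitly present output, so falsy
// outputs such as 0, false, or '' are still scored.
function shouldScoreNew(step: StepResult): boolean {
  return step.status === 'success' && step.output !== undefined;
}

// Scorer input: use the payload only when it is defined; otherwise fall back
// to the run-level input (scoringData.input in the PR).
function scorerInput(step: StepResult, fallbackInput: unknown): unknown {
  return step.payload !== undefined ? step.payload : fallbackInput;
}

// A step whose payload was stripped because it matched the previous output.
const stripped: StepResult = { status: 'success', output: { answer: 42 } };

console.log(shouldScoreOld(stripped)); // false: the step was silently skipped
console.log(shouldScoreNew(stripped)); // true
console.log(scorerInput(stripped, { question: 'q' })); // { question: 'q' }
```

Keeping the guard and the input selection as separate checks means a step can be scored even when only its output survived the engine's deduplication.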

Test plan

pnpm test packages/core/src/agent/message-list/tests/message-list.test.ts  # 89 pass
pnpm test packages/core/src/evals/run/index.test.ts                        # 26 pass

Not addressed (environment/infra issues)

  • Memory tests: corrupted fastembed model cache in CI (ZlibError)
  • Core LLM tests: flaky external API calls (Gemini timeouts/schema errors)
  • E2E deployer tests: wrangler config mismatch (projectName field)
  • Observational-memory tests: broken since they were introduced (XML formatting, missing setup)
  • Agent Builder tests: createTool export and @mastra/mcp module resolution

Summary by CodeRabbit

  • Bug Fixes

    • Relaxed per-step validation in evaluation flows and added input fallback so scorers receive available input when step payload is missing.
  • Tests

    • Made tests tolerant of an optional filename on file parts to avoid brittle equality checks.
    • Removed a noisy debug log from a test to reduce output.
    • Broadened template-merge validation to accept multiple generated file naming conventions and flexible commit message matching.
  • Chores

    • Updated an example project's tool import path and simplified a contributor workflow matrix entry.

…s step scoring guard

- message-list tests: add expected `filename` property after #13574
  introduced filename preservation in AIV4Adapter.fromCoreMessage()
- evals/run: fix guard that required `stepResult.payload`, which the
  workflow engine strips when it matches previous output. Use
  `scoringData.input` as fallback for scorer input.
- Remove stale debug console.log from evals test

Co-Authored-By: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>
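To see why the exact-equality assertions broke once filename was preserved, here is a small sketch. Vitest's expect.objectContaining(...) is the matcher the tests actually use; containsSubset and strictlyEqual below are hypothetical stand-ins so the example runs without a test runner, and the file-part values are made up.

```typescript
// Hypothetical stand-in for expect.objectContaining(): true when every key in
// `expected` exists on `actual` with an equal value. Extra keys on `actual`
// (such as a newly preserved `filename`) are ignored. Values are compared via
// JSON.stringify, which is enough for these flat, primitive-valued parts.
function containsSubset(actual: unknown, expected: Record<string, unknown>): boolean {
  if (typeof actual !== 'object' || actual === null) return false;
  const obj = actual as Record<string, unknown>;
  return Object.entries(expected).every(
    ([key, value]) => key in obj && JSON.stringify(obj[key]) === JSON.stringify(value),
  );
}

// Rough stand-in for an exact-equality assertion on plain data.
function strictlyEqual(a: unknown, b: unknown): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Expected part as written before filename preservation landed (made-up values).
const expectedPart = { type: 'file', data: 'aGVsbG8=', mimeType: 'text/plain' };
// What the adapter now produces: the same part plus the preserved filename.
const actualPart = { type: 'file', data: 'aGVsbG8=', mimeType: 'text/plain', filename: 'hello.txt' };

console.log(strictlyEqual(actualPart, expectedPart)); // false
console.log(containsSubset(actualPart, expectedPart)); // true
```

Partial matching keeps the assertions about type, data and MIME type intact while tolerating the optional extra field.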
@changeset-bot

changeset-bot Bot commented Mar 3, 2026

⚠️ No Changeset found

Latest commit: 051f307

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@vercel

vercel Bot commented Mar 3, 2026

The latest updates on your projects.

  • mastra-docs: Ready (Preview, Comment), updated Mar 3, 2026 4:59pm UTC
1 Skipped Deployment
  • mastra-docs-1.x: Skipped, updated Mar 3, 2026 4:59pm UTC


@coderabbitai
Contributor

coderabbitai Bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4099dea and 051f307.

📒 Files selected for processing (1)
  • .github/workflows/contributor_actions.yml

Walkthrough

Loosened test and scorer validations: tests now allow optional filename on file parts; removed a debug log; scorer logic accepts status: 'success' with output and prefers stepResult.payload with fallback to targetResult.scoringData.input; updated an import path and broadened integration test filename patterns.

Changes

Cohort / File(s) Summary
Message-list tests
packages/core/src/agent/message-list/tests/message-list.test.ts
Changed exact equality to partial matching with expect.objectContaining(...) and placed the matcher inside parts, so file parts may include an optional filename.
Eval runner tests
packages/core/src/evals/run/index.test.ts
Removed a console.log that printed full test results; assertions and test behavior unchanged.
Eval scoring logic
packages/core/src/evals/run/index.ts
Relaxed per-step validation to accept results with status: 'success' and output; when building scorer input prefer stepResult.payload with fallback to targetResult.scoringData.input.
Agent builder fixture import
packages/agent-builder/integration-tests/src/fixtures/.../mastra/agents/weather.ts
Updated createTool import path from @mastra/core to @mastra/core/tools.
Template integration test
packages/agent-builder/integration-tests/src/template-integration.test.ts
Expanded accepted generated filename patterns (camelCase and kebab-case variants) for agents and tools; relaxed git-history commit-message assertion to a numeric-count regex.
CI workflow
.github/workflows/contributor_actions.yml
Removed the "Tool Builder Tests" matrix entry from the contributor-actions workflow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly summarizes the main changes: fixing CI test failures in two specific areas (message-list and evals/run), uses imperative mood, proper capitalization, and is concise at 51 characters.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/evals/run/index.ts`:
- Around line 353-359: The current check skips valid falsy outputs and
unconditionally replaces intentionally-null payloads; update the condition in
the block that iterates stepScorers (the code referencing stepResult,
stepScorers and scorer.run) to require status === 'success' and an explicit
presence check for output (e.g., stepResult.output !== undefined) so 0/false/''
are accepted, and change the input selection to use the payload only when it is
defined (e.g., stepResult.payload !== undefined ? stepResult.payload :
targetResult.scoringData.input) instead of using the nullish coalescing
operator.
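The two semantics this comment contrasts can be compared directly. The helper names below are hypothetical; the point is how a truthy check differs from !== undefined, and how ?? differs from an explicit ternary, on the edge-case values the comment lists.

```typescript
// Truthy guard: 0, false and '' are all treated as "no output".
const hasOutputTruthy = (output: unknown): boolean => Boolean(output);

// Explicit presence check: only undefined means "missing".
const hasOutputDefined = (output: unknown): boolean => output !== undefined;

// `??` falls back on both null and undefined, so an intentionally-null payload is replaced.
const pickWithNullish = (payload: unknown, fallback: unknown): unknown => payload ?? fallback;

// Explicit ternary falls back only on undefined, so an intentional null is kept.
const pickWithTernary = (payload: unknown, fallback: unknown): unknown =>
  payload !== undefined ? payload : fallback;

console.log(hasOutputTruthy(0), hasOutputDefined(0)); // false true
console.log(pickWithNullish(null, 'fallback'), pickWithTernary(null, 'fallback')); // fallback null
```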

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8118e3 and 806b07c.

📒 Files selected for processing (3)
  • packages/core/src/agent/message-list/tests/message-list.test.ts
  • packages/core/src/evals/run/index.test.ts
  • packages/core/src/evals/run/index.ts
💤 Files with no reviewable changes (1)
  • packages/core/src/evals/run/index.test.ts

Comment thread packages/core/src/evals/run/index.ts Outdated
…ring

Address CodeRabbit review: use `!== undefined` instead of truthy checks
so step outputs like 0, false, or '' are still scored. Use explicit
ternary for payload to distinguish missing from intentionally null.

Co-Authored-By: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>
@vercel vercel Bot temporarily deployed to Preview – mastra-docs-1.x March 3, 2026 15:23 Inactive
@vercel vercel Bot temporarily deployed to Preview – mastra-docs March 3, 2026 15:23 Inactive
…names and fix createTool import path

- Add actual template agent filenames (csv-summarization-agent, text-question-agent)
  and their camelCase variants to expected patterns in template-integration test
- Add second template tool filename (generate-questions-from-text-tool) to patterns
- Fix fixture weather.ts to import createTool from '@mastra/core/tools' instead of
  '@mastra/core' (which doesn't export it)

Co-Authored-By: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>
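A sketch of how kebab-case and camelCase variants of the template filenames could be generated and checked. The helpers and the .ts extension are assumptions; the actual template-integration test may list its patterns literally rather than derive them.

```typescript
// Convert a kebab-case base name to camelCase, e.g. "text-question-agent" -> "textQuestionAgent".
function kebabToCamel(name: string): string {
  return name.replace(/-([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Accepted filenames for each base name: the kebab-case original plus its
// camelCase variant. The ".ts" extension is an assumption for this sketch.
function acceptedFilenames(baseNames: string[]): Set<string> {
  const accepted = new Set<string>();
  for (const base of baseNames) {
    accepted.add(`${base}.ts`);
    accepted.add(`${kebabToCamel(base)}.ts`);
  }
  return accepted;
}

const agentFiles = acceptedFilenames(['csv-summarization-agent', 'text-question-agent']);
console.log(agentFiles.has('csvSummarizationAgent.ts')); // true
console.log(agentFiles.has('text-question-agent.ts')); // true
console.log(agentFiles.has('weather-agent.ts')); // false
```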
@vercel vercel Bot temporarily deployed to Preview – mastra-docs March 3, 2026 15:52 Inactive
@vercel vercel Bot temporarily deployed to Preview – mastra-docs-1.x March 3, 2026 15:52 Inactive
Comment on lines 357 to +358
 const score = await scorer.run({
-input: stepResult.payload,
+input: stepResult.payload !== undefined ? stepResult.payload : targetResult.scoringData.input,
Contributor


are we sure this is correct?

The copy step's file count varies depending on conflict detection (e.g.,
index.ts already exists in the fixture). Use a regex pattern instead of
hardcoding 'copy 7 files' to handle this variability.

Co-Authored-By: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>
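A minimal sketch of the looser assertion, assuming the commit message contains the phrase "copy N files". The surrounding message text and the exact regex used in the test are not shown in this PR, so both are assumptions here.

```typescript
// Hypothetical pattern: accept any file count instead of hardcoding 7.
// No `g` flag, so RegExp.prototype.test is stateless across calls.
const copyStepPattern = /copy \d+ files/;

const messages = [
  'feat: copy 7 files from template', // hypothetical message
  'feat: copy 6 files from template', // e.g. index.ts skipped as a conflict
];

for (const message of messages) {
  // Hardcoded substring check vs. numeric-count regex.
  console.log(message.includes('copy 7 files'), copyStepPattern.test(message));
}
// true true
// false true
```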
@vercel vercel Bot temporarily deployed to Preview – mastra-docs March 3, 2026 16:19 Inactive
@vercel vercel Bot temporarily deployed to Preview – mastra-docs-1.x March 3, 2026 16:19 Inactive
Tool Builder tests are already covered by the Full Test Suite. The
pending status check had no corresponding workflow to resolve it,
causing it to be perpetually stuck at 'Waiting for status to be reported.'

Co-Authored-By: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>
@abhiaiyer91 abhiaiyer91 merged commit f7e57f3 into main Mar 3, 2026
34 of 43 checks passed
@abhiaiyer91 abhiaiyer91 deleted the fix/ci-test-failures branch March 3, 2026 17:01
wardpeet pushed a commit that referenced this pull request Mar 9, 2026
Co-authored-by: Mastra Code (anthropic/claude-opus-4-6) <noreply@mastra.ai>

3 participants