@nirinchev
Collaborator

Proposed changes

This tweaks a few more accuracy tests so that the expectations are more loosely defined and don't flag valid tool calls as inaccurate.

@nirinchev requested a review from a team as a code owner on October 22, 2025 12:31
Copilot AI review requested due to automatic review settings on October 22, 2025 12:31
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refines accuracy test expectations to reduce false negatives by making assertions more flexible. The changes ensure that valid tool calls with optional or varying parameters are not incorrectly flagged as inaccurate.

Key changes:

  • Updated test expectations to accept optional parameters like limit, responseBytesLimit, sampleSize, and operations
  • Added optional list-databases and list-collections calls that models may make before main operations
  • Refined prompt wording to better elicit expected behavior (e.g., "exported COMPLETE list")
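
The idea of accepting optional parameters can be sketched as follows. This is a minimal illustration, not the repository's matcher API: `optional`, `anyNumber`, and `argsMatch` are hypothetical names, and the database and collection values in the usage note are made up.

```typescript
// Hypothetical helpers for illustration; not the project's actual matcher API.
type Matcher = (actual: unknown) => boolean;

// Accepts a missing argument, or any value that satisfies the inner matcher.
const optional = (inner: Matcher): Matcher => (actual) =>
  actual === undefined || inner(actual);

// Accepts any numeric value.
const anyNumber: Matcher = (actual) => typeof actual === "number";

// Compares the arguments of a tool call against expectations.
// An expectation is either a literal (strict equality) or a Matcher function.
function argsMatch(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
): boolean {
  const keys = Object.keys(expected).concat(Object.keys(actual));
  for (const key of keys) {
    // An argument the expectation does not mention counts as a mismatch.
    if (!(key in expected)) return false;
    const exp = expected[key];
    const ok =
      typeof exp === "function" ? (exp as Matcher)(actual[key]) : exp === actual[key];
    if (!ok) return false;
  }
  return true;
}
```

With an expectation such as `{ database: "mflix", collection: "movies", limit: optional(anyNumber) }`, a call that omits `limit` and a call that passes `limit: 10` both match, while a call that passes `limit: "10"` does not.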

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/accuracy/find.test.ts | Added optional list calls helper and applied to all test cases; refined prompt wording |
| tests/accuracy/collectionSchema.test.ts | Added optional list-collections calls and flexible parameter matchers |
| tests/accuracy/export.test.ts | Made filter argument accept empty object or undefined |
| tests/accuracy/explain.test.ts | Added flexible matcher for responseBytesLimit parameter |
| tests/accuracy/logs.test.ts | Added flexible matcher for limit parameter |
| tests/accuracy/getPerformanceAdvisor.test.ts | Added flexible matcher for operations parameter array |
| tests/integration/tools/mongodb/read/find.test.ts | Changed string concatenation to template literal for consistency |
| tests/integration/tools/mongodb/read/aggregate.test.ts | Changed string concatenation to template literal for consistency |
| tests/integration/tools/mongodb/metadata/collectionSchema.test.ts | Whitespace-only change (no functional impact) |
| src/tools/mongodb/read/find.ts | Changed multi-line string to template literal for consistency |
| src/tools/mongodb/read/aggregate.ts | Changed multi-line string to template literal for consistency |
| src/tools/mongodb/metadata/collectionSchema.ts | Whitespace-only change (no functional impact) |
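
The "optional list calls" described for the accuracy tests can be pictured as an expected call sequence in which some entries may be absent. The sketch below is an assumption about the general approach, not the helper added in this PR: `ExpectedCall` and `callsMatch` are hypothetical names, and the matching is greedy, which is sufficient only when optional and required calls use different tool names.

```typescript
// Hypothetical shape for an expected tool call; not the repository's type.
interface ExpectedCall {
  toolName: string;
  optional?: boolean;
}

// Returns true when the actual tool names satisfy the expected sequence in order,
// allowing entries marked optional to be skipped.
// Greedy matching: assumes optional and required calls have distinct names.
function callsMatch(expected: ExpectedCall[], actual: string[]): boolean {
  let next = 0; // index of the next actual call to consume
  for (const exp of expected) {
    if (next < actual.length && actual[next] === exp.toolName) {
      next++;
    } else if (!exp.optional) {
      return false; // a required call is missing or out of order
    }
  }
  // Every actual call must have been accounted for.
  return next === actual.length;
}

const findExpectation: ExpectedCall[] = [
  { toolName: "list-databases", optional: true },
  { toolName: "list-collections", optional: true },
  { toolName: "find" },
];
```

Under this expectation, a model that calls only `find` and a model that first calls `list-databases` or `list-collections` are both accepted, while a run that never reaches `find` is not.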

Collaborator

@blva left a comment

LGTM but would be good to trigger the accuracy test in the branch before merging!

@nirinchev
Collaborator Author

These changes don't fix all the false negatives, but I ran them locally and they improved the accuracy of some tests. The annoying part is that the failures are quite unpredictable, so we need multiple runs to identify the flakes.
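
One way to picture "multiple runs to identify the flakes" is to compare each prompt's score across runs and flag the prompts whose score varies. This is a minimal sketch under assumed inputs: the per-run result shape and the `findFlakyPrompts` name are hypothetical and do not come from the accuracy test tooling.

```typescript
// Hypothetical input: each run maps a prompt identifier to its accuracy (0 to 1).
type RunResult = { [prompt: string]: number };

// Returns the prompts whose accuracy differs between runs, sorted by name.
function findFlakyPrompts(runs: RunResult[]): string[] {
  const scores: { [prompt: string]: number[] } = {};
  for (const run of runs) {
    for (const prompt of Object.keys(run)) {
      const list = scores[prompt] || (scores[prompt] = []);
      list.push(run[prompt] as number);
    }
  }
  return Object.keys(scores)
    .filter((prompt) => {
      const list = scores[prompt] as number[];
      // A prompt is flaky when its lowest and highest scores differ.
      return Math.min(...list) !== Math.max(...list);
    })
    .sort();
}
```

A prompt that scores 0 in every run is a consistent failure rather than a flake, so it is not flagged; those cases point at the expectation or the prompt itself.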

@github-actions
Contributor

📊 Accuracy Test Results

📈 Summary

| Metric | Value |
| --- | --- |
| Commit SHA | 52755ef69a3ef53818626c2928120f1ec10c5e56 |
| Run ID | f5e86f02-f408-4939-9c93-18f6617ae7c5 |
| Status | done |
| Total Prompts Evaluated | 97 |
| Models Tested | 1 |
| Average Accuracy | 93.4% |
| Responses with 0% Accuracy | 6 |
| Responses with 75% Accuracy | 2 |
| Responses with 100% Accuracy | 91 |
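
As a sanity check, the reported average is consistent with a response-weighted mean of the three accuracy buckets, which together cover 99 responses. The sketch below assumes that this is how the average is computed (the report does not say); `Bucket` and `averageAccuracy` are illustrative names.

```typescript
// One accuracy bucket from the summary: a score and the number of responses with it.
interface Bucket {
  accuracy: number; // fraction between 0 and 1
  responses: number;
}

// Response-weighted mean accuracy; returns 0 when there are no responses.
function averageAccuracy(buckets: Bucket[]): number {
  let total = 0;
  let weighted = 0;
  for (const bucket of buckets) {
    total += bucket.responses;
    weighted += bucket.accuracy * bucket.responses;
  }
  return total === 0 ? 0 : weighted / total;
}

// Counts taken from the summary above.
const reported: Bucket[] = [
  { accuracy: 0, responses: 6 },
  { accuracy: 0.75, responses: 2 },
  { accuracy: 1, responses: 91 },
];

// (6 * 0 + 2 * 0.75 + 91 * 1) / 99 = 92.5 / 99
console.log((averageAccuracy(reported) * 100).toFixed(1) + "%"); // prints "93.4%"
```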

📎 Download Full HTML Report - Look for the accuracy-test-summary artifact for detailed results.

Report generated on: 10/23/2025, 12:44:24 PM

@nirinchev merged commit 80cb5be into main on Oct 24, 2025
19 checks passed
@nirinchev deleted the ni/accuracy-tests branch on October 24, 2025 08:30