test[notask]: consolidate tools_compact integration test coverage #1406

Closed
tobi-legan wants to merge 6 commits into main from test/dynamic-tools-coverage

@tobi-legan tobi-legan commented Apr 7, 2026

Contributor

What problem does this PR solve?

  • The tools_compact feature (formerly tools_at_end) had integration tests split across two files with different API patterns
  • dynamic-tools.test.js used the deprecated tools_at_end config key and the since-removed role:'session' API, so its tests were not actually exercising the feature
  • No tests covered tool_call output verification, concurrent run resilience, session lifecycle, evolved schemas, or enum-typed parameters

How does it solve it?

Consolidates all tools_compact integration test coverage into a single canonical file (tools-compact.test.js), using the correct API (tools_compact config key, cacheKey run option).
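As a rough illustration of the difference between the two API patterns, the sketch below flags the legacy usage described above. The helper `findLegacyUsage`, the `tools: 'true'` entry, and the exact shape of the session message are assumptions made for this example; only the names tools_at_end, tools_compact, role:'session', and cacheKey come from this PR.

```javascript
// Hypothetical guard, not part of the addon: reports the legacy patterns
// that dynamic-tools.test.js relied on.
function findLegacyUsage (config, messages) {
  const problems = []
  if (Object.prototype.hasOwnProperty.call(config, 'tools_at_end')) {
    problems.push("config uses deprecated key 'tools_at_end'; use 'tools_compact'")
  }
  for (const [i, msg] of messages.entries()) {
    if (msg.role === 'session') {
      problems.push(`message ${i} uses removed role 'session'; pass cacheKey as a run option`)
    }
  }
  return problems
}

// Legacy style (message shape assumed for illustration).
const legacy = findLegacyUsage(
  { tools: 'true', tools_at_end: 'true' },
  [{ role: 'session', content: 'session.bin' }, { role: 'user', content: 'hi' }]
)

// Current style: tools_compact in the config, cacheKey supplied at run time
// instead of through a session message.
const current = findLegacyUsage(
  { tools: 'true', tools_compact: 'true' },
  [{ role: 'user', content: 'hi' }]
)

console.log(legacy)
console.log(current)
```

A check of this kind could run once at the top of a test file so that a stale config key fails loudly instead of silently skipping the feature under test.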

Deleted:

  • dynamic-tools.test.js (broken: used legacy config key + dead session API)

Added to tools-compact.test.js (12 new tests, 7 existing = 19 total):

  • Output contains tool_call block when tools are provided
  • Tool_call references correct tool after swap (no stale tool in KV cache)
  • Conversation history preserved after tool swap
  • Extended 5-turn session with mixed tool changes
  • Many tools with complex schemas (5 tools, real agent workload)
  • Session save → destroy → reload → continue with different tools
  • Cancel mid-generation then reuse with tools
  • Large tool payload near context limit (ctx_size=512)
  • Same tool name with evolved schema between turns
  • Concurrent model.run() rejects cleanly and model survives
  • Corrupted session file does not crash model
  • Tool with enum-typed parameters (mirrors SDK dynamic-tools test gap)
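The tool_call checks in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in tools-compact.test.js: it assumes the model emits Qwen-style `<tool_call>{JSON}</tool_call>` blocks and that tools use the common function-schema layout. The helpers `extractToolCalls` and `staleToolNames`, both tool definitions, and the sample output string are hypothetical.

```javascript
// Assumption: each call is wrapped as <tool_call>{JSON}</tool_call>.
function extractToolCalls (text) {
  const calls = []
  const re = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g
  for (const match of text.matchAll(re)) {
    try {
      calls.push(JSON.parse(match[1]))
    } catch {
      // Malformed JSON inside a block is skipped rather than thrown.
    }
  }
  return calls
}

// Names of called tools that are not in the currently offered set.
function staleToolNames (output, offeredTools) {
  const offered = new Set(offeredTools.map(t => t.function.name))
  return extractToolCalls(output)
    .map(call => call.name)
    .filter(name => !offered.has(name))
}

// Hypothetical tool with an enum-typed parameter.
const getWeather = {
  type: 'function',
  function: {
    name: 'get_weather',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
      },
      required: ['city']
    }
  }
}

// Hypothetical tool offered after the swap.
const searchWeb = {
  type: 'function',
  function: {
    name: 'search_web',
    parameters: {
      type: 'object',
      properties: { query: { type: 'string' } },
      required: ['query']
    }
  }
}

// Made-up model output for a turn where only search_web is offered.
const output = 'Let me look that up.\n<tool_call>\n' +
  '{"name": "search_web", "arguments": {"query": "weather in Paris"}}\n' +
  '</tool_call>'

const staleAfterSwap = staleToolNames(output, [searchWeb])
const staleIfNotSwapped = staleToolNames(output, [getWeather])
console.log(staleAfterSwap, staleIfNotSwapped)
```

In a swap test, an empty result from `staleToolNames` for the current turn's tool set is the signal that no previously cached tool leaked into the output.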

How was it tested?

All 19 tests pass locally with a fresh 0.21.0 native build:

# tests = 19/19 pass
# asserts = 89/89 pass
# time = 33611ms
# ok

Tested on an Apple M4 Pro with the Qwen3-0.6B-Q8_0.gguf model and tools_compact: 'true'.

…analysis

JS integration (dynamic-tools.test.js): 7 new tests covering pitch DoD
gaps — tool_call output verification, conversation history across tool
swap, A→B→A round-trip, 5-turn extended session, many-tools payload,
session save/reload lifecycle, and cancel-mid-generation reuse.

C++ unit (test_cache_management_qwen3.cpp): 1 regression test for the
reviewer-flagged firstMsgTokens_ inflation bug — uses small context to
force sliding-window discard after tools_at_end trim.

Made-with: Cursor

Add 6 break-it tests derived purely from docs, pitch, and README;
no implementation code was consulted. Each test represents a real
integrator mistake or documentation gap:

- conflicting config (tools_at_end=true + tools=false)
- session disappears between turns
- system message changes mid-conversation
- stale tool_call blocks not stripped from prior response
- tool with empty name
- duplicate tool names in same prompt

Also fix corrupted session file test error handling to prevent
uncaught error from crashing the test runner.

All 26 tests pass (109/109 assertions).

Made-with: Cursor

- Rename [adversarial] tag to [edge-case] for clarity
- Remove redundant "alternating tools/no-tools across 5 turns" test
  (covered by existing interleaving + 5-turn session tests)

25 tests remain, 0 redundancy.

Made-with: Cursor

Use const for variables that are never reassigned.

Made-with: Cursor

kinsta Bot commented Apr 8, 2026


Preview deployments for qvac-docs-staging ⚡️

| Status | Branch preview | Commit preview |
| --- | --- | --- |
| ✅ Ready | Visit preview | Visit preview |

Commit: 650c7b8f0aab527f202d7837d930ff5e8d5b261a

Deployment ID: 7cb7b4fb-15e8-4265-a015-be11fcf54a43

Static site name: qvac-docs-staging-fazwv

github-actions Bot commented Apr 8, 2026

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

@github-actions

Contributor

This draft PR is stale because it has been open 21 days and the author has not commented since opening. It is flagged for removal. Remove the stale label or comment on the PR or this will be closed in one day.

@github-actions

Contributor

This draft PR was closed because it has been stalled for 22 days with no author comment since opening. You can reopen this PR later if it is still necessary.
