Skip to content

feat: add dynamic model capability detection#16

Closed
jerome-benoit wants to merge 53 commits into
mainfrom
feat/dynamic-model-capabilities
Closed

feat: add dynamic model capability detection#16
jerome-benoit wants to merge 53 commits into
mainfrom
feat/dynamic-model-capabilities

Conversation

@jerome-benoit
Copy link
Copy Markdown
Owner

Summary

Implement vendor-specific capability detection for SAP AI Core models, following the pattern from @ai-sdk/openai.

Changes

Core Implementation

  • Add getSAPAIModelCapabilities() function to detect capabilities based on model vendor and type
  • Convert static capability flags to dynamic getters in SAPAILanguageModel
  • Add capability caching for performance

Vendor Support

  • Amazon/Anthropic: supportsN: false (no multiple completions)
  • Meta: supportsStructuredOutputs: false
  • Azure/Google/Mistral: Full capabilities

Model-Specific Overrides

  • Claude 2.x: Limited parallel tools and structured outputs
  • Amazon Titan: Limited to streaming only
  • Llama 2/3.x: No structured outputs
  • Gemini 1.0: Limited capabilities
  • Mistral Small/Tiny: No structured outputs

API Changes

  • Export getSAPAIModelCapabilities(), getModelVendor(), modelSupports()
  • Export SAPAIModelCapabilities and SAPAIModelVendor types

Testing

  • Added comprehensive tests for all vendor capabilities
  • Added tests for model-specific overrides
  • Updated existing tests to verify dynamic behavior

Documentation

  • Updated API reference with capability details
  • Added vendor capability summary

Breaking Changes

None - backward compatible (capability flags still exist as getters)

Copilot AI review requested due to automatic review settings January 21, 2026 12:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements dynamic model capability detection for SAP AI Core models, following patterns from @ai-sdk/openai. The implementation determines model capabilities based on vendor prefixes and model-specific patterns, allowing the SDK to correctly handle differences between various AI models (e.g., Amazon/Anthropic models don't support the n parameter, Meta models don't support structured outputs).

Changes:

  • Added new sap-ai-model-capabilities.ts module with vendor and model-specific capability detection logic
  • Converted static capability flags in SAPAILanguageModel to dynamic getters backed by capability detection
  • Added comprehensive test coverage for all vendors and model-specific overrides
  • Updated API documentation with detailed capability information and vendor summary table

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/sap-ai-model-capabilities.ts New module implementing capability detection with vendor and model-specific patterns
src/sap-ai-model-capabilities.test.ts Comprehensive test suite covering all vendors and model overrides
src/sap-ai-language-model.ts Converted static capability properties to dynamic getters with caching
src/sap-ai-language-model.test.ts Updated tests to verify dynamic capability behavior for various model types
src/index.ts Added exports for capability detection functions and types
API_REFERENCE.md Added documentation for capability detection API and vendor summary table

Comment thread src/sap-ai-model-capabilities.test.ts
Comment thread src/sap-ai-model-capabilities.ts Outdated
Comment thread src/sap-ai-model-capabilities.ts
Comment thread src/sap-ai-model-capabilities.test.ts
Comment thread src/sap-ai-model-capabilities.ts Outdated
Comment thread src/sap-ai-model-capabilities.ts
Comment thread src/sap-ai-model-capabilities.ts Outdated
Comment thread src/sap-ai-model-capabilities.ts Outdated
Implement vendor-specific capability detection for SAP AI Core models following the pattern from @ai-sdk/openai.

Changes:
- Add getSAPAIModelCapabilities() function to detect capabilities based on model vendor
- Convert static capability flags to dynamic getters in SAPAILanguageModel
- Support vendor-specific limitations (Amazon/Anthropic: no supportsN, Meta: no structured outputs)
- Add model-specific overrides for Claude 2.x, Titan, Llama, Gemini 1.0, Mistral Small/Tiny
- Export capability detection functions and types from index
- Update tests to verify vendor-specific capabilities
- Update API reference documentation

This enables proper capability detection for different model vendors and versions deployed on SAP AI Core.
@jerome-benoit jerome-benoit force-pushed the feat/dynamic-model-capabilities branch from ca70cfa to ed83b25 Compare January 22, 2026 19:18
Copilot AI review requested due to automatic review settings January 22, 2026 20:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread src/sap-ai-model-capabilities.ts Outdated
Resolved conflicts:
- src/sap-ai-language-model.ts: Integrated strategy pattern with dynamic capabilities
- src/sap-ai-language-model.test.ts: Updated capability tests for vendor-specific behavior

Key changes:
- Kept strategy-based architecture from main
- Added dynamic model capabilities feature from PR
- Updated tests to expect correct capabilities per model vendor
  (anthropic/amazon models have supportsMultipleCompletions: false)
Copilot AI review requested due to automatic review settings January 29, 2026 18:06
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Comment thread API_REFERENCE.md Outdated
Comment thread API_REFERENCE.md Outdated
Comment thread src/sap-ai-language-model.ts
Comment thread API_REFERENCE.md Outdated
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
Copilot AI review requested due to automatic review settings January 29, 2026 18:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment thread src/sap-ai-model-capabilities.ts Outdated
Comment thread src/sap-ai-model-capabilities.ts
Comment thread src/sap-ai-model-capabilities.test.ts
Comment thread API_REFERENCE.md Outdated
Comment thread API_REFERENCE.md Outdated
Comment thread API_REFERENCE.md Outdated
- Freeze getSAPAIModelCapabilities() return value to prevent mutation
- Make supportedUrls/supportsUrl respect supportsImageInputs capability
- Fix package name in API_REFERENCE.md (@jerome-benoit/sap-ai-provider)
- Document 'user' mode in defaultSystemMessageMode
- Remove non-existent defaultObjectGenerationMode from docs
Copilot AI review requested due to automatic review settings February 1, 2026 19:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread src/sap-ai-model-capabilities.ts
Comment thread API_REFERENCE.md
Resolve merge conflicts:
- src/sap-ai-language-model.ts: Keep dynamic capability getters from feature
  branch that delegate to getSAPAIModelCapabilities()
- API_REFERENCE.md: Keep documentation showing model-dependent boolean getters
Copilot AI review requested due to automatic review settings February 4, 2026 21:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


You can also share your feedback on Copilot code review. Take the survey.

Copilot AI review requested due to automatic review settings March 11, 2026 10:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +323 to +332
describe("MistralAI Instruct models", () => {
it("should have full capabilities for mistralai--mistral-small-instruct", () => {
const capabilities = getSAPAIModelCapabilities("mistralai--mistral-small-instruct");

// Note: mistral-small has limited structured outputs, but -instruct suffix
// doesn't change this - the pattern matches on mistral-small prefix
expect(capabilities.supportsStructuredOutputs).toBe(false);
expect(capabilities.supportsN).toBe(true);
expect(capabilities.vendor).toBe("mistralai");
});
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test name says this model has "full capabilities", but the assertions below expect supportsStructuredOutputs to be false (limited capabilities). Rename the test description to match the behavior being asserted to avoid confusion when reading failures.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 16, 2026 13:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +55 to +58
it("should return 'unknown' for unrecognized vendor prefixes", () => {
expect(getModelVendor("foobar--some-model")).toBe("unknown");
expect(getModelVendor("openai--gpt-4")).toBe("unknown");
});
Comment thread src/index.ts
Comment on lines +35 to +47
/**
* Dynamic model capability detection for SAP AI Core models.
*
* Functions for determining model capabilities based on model ID prefix
* following the SAP AI Core naming convention: `vendor--model-name`.
*/
export {
getModelVendor,
getSAPAIModelCapabilities,
modelSupports,
} from "./sap-ai-model-capabilities.js";

export type { SAPAIModelCapabilities, SAPAIModelVendor } from "./sap-ai-model-capabilities.js";
Copilot AI review requested due to automatic review settings March 18, 2026 09:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +188 to +199
export function getModelVendor(modelId: string): "unknown" | SAPAIModelVendor {
const vendorMatch = /^([a-z]+)--/.exec(modelId.toLowerCase());
if (!vendorMatch) {
return "unknown";
}

const vendor = vendorMatch[1] as SAPAIModelVendor;
if (vendor in VENDOR_CAPABILITIES) {
return vendor;
}

return "unknown";
Comment on lines +85 to +104
/**
* Module-level cache for model capabilities to avoid repeated computation.
* @internal
*/
const capabilitiesCache = new Map<string, SAPAIModelCapabilities>();

/**
* Default capabilities for models without specific overrides.
* @internal
*/
const DEFAULT_CAPABILITIES: SAPAIModelCapabilities = {
defaultSystemMessageMode: "system",
supportsImageInputs: true,
supportsN: true,
supportsParallelToolCalls: true,
supportsStreaming: true,
supportsStructuredOutputs: true,
supportsToolCalls: true,
vendor: "unknown",
};
Copilot AI review requested due to automatic review settings March 21, 2026 21:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 22, 2026 14:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +85 to +104
/**
* Module-level cache for model capabilities to avoid repeated computation.
* @internal
*/
const capabilitiesCache = new Map<string, SAPAIModelCapabilities>();

/**
* Default capabilities for models without specific overrides.
* @internal
*/
const DEFAULT_CAPABILITIES: SAPAIModelCapabilities = {
defaultSystemMessageMode: "system",
supportsImageInputs: true,
supportsN: true,
supportsParallelToolCalls: true,
supportsStreaming: true,
supportsStructuredOutputs: true,
supportsToolCalls: true,
vendor: "unknown",
};
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capabilitiesCache is a module-level Map keyed by (normalized) modelId. Since modelId ultimately comes from user input, this cache can grow without bound (e.g., many unique/garbage model IDs), causing a memory leak / potential DoS in long-running processes. Consider either limiting the cache size (LRU), caching only recognized vendor/pattern matches, or removing the global cache and relying on per-model-instance caching (already implemented in SAPAILanguageModel).

Copilot uses AI. Check for mistakes.
return {};
}
return {
"image/*": [/^https:\/\/.+$/i, /^data:image\/.*$/],
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supportedUrls uses /^data:image\/.*$/ (case-sensitive) while supportsUrl() checks data URLs with a case-insensitive regex. This can lead to inconsistent behavior depending on which API a consumer relies on (e.g., Data:Image/... accepted by supportsUrl but rejected by supportedUrls matching). Align the regex flags so both paths behave the same.

Suggested change
"image/*": [/^https:\/\/.+$/i, /^data:image\/.*$/],
"image/*": [/^https:\/\/.+$/i, /^data:image\/.*$/i],

Copilot uses AI. Check for mistakes.
jerome-benoit added a commit that referenced this pull request May 5, 2026
- types.ts: parseFindingsSafe with partial recovery (finding #18)
- concurrency-pool.ts: guard NaN/float/Infinity in constructor (#1,#4), JSDoc re-entrant warning (#2)
- task-source.ts: NFKC normalize + extended blocklist in sanitizeForPrompt (#17,#20), Map lookup in validatePlan (#7)
- refinement-loop.ts: readFileSync replaces execSync/sed (shell injection #5), radius used (#12), fallback hash hardened (#13), line clamping (#14), realpathSync symlinks (#15), file cache Map (#11), parseFindingsSafe (#18), nonce validation (#16), trivial match skip (#19), implementer try/catch (#7), ratchet guard round<=2 (#6), no duplicate nonLowFindings (#9)
- finalizer.ts: pushBranch returns boolean (#22), crypto rescue suffix (#21), rollback error level (#8)
- main.ts: TASK_TIMEOUT_MS + Promise.race timeout (#3), flat budget JSDoc (#10)
jerome-benoit added a commit that referenced this pull request May 5, 2026
* feat: implement sandcastle refinement loop with critic-based convergence

Replace single-pass implement→review→merge with iterative implement↔critic
loop per task. Key changes:

- Orchestrator fetches and sanitizes issues (prevents prompt injection)
- Implement↔Critic loop with deterministic dedup convergence
- Critic produces structured findings (nonce-tagged JSON, zod-validated)
- Decreasing iteration budget per round [100, 50, 25, 10, 10]
- Host-side validation and rebase (no agent needed)
- One PR per task (no merger agent)
- Draft PR on non-convergence with outstanding findings listed

Implements #110

* fix: address review findings (shell injection, false convergence, nesting, zod validation)

- Replace execSync with execFileSync for gh pr create (prevents shell injection)
- Guard parseFindings against empty matches (prevents false convergence)
- Add try/catch on gh issue list startup call
- Guard git push in rebase catch block
- Extract finalizeIssue function (reduces nesting from 6+ to 3 levels)
- Add zod schema for rawIssues (replaces unsafe 'as' cast)
- Implement validation retry round per spec (one more implement→critic if budget remains)

* fix: guard retry calls, split rebase logic, remove dead critic retry

* fix: handle nullable issue body and guard JSON parse

* fix: distinguish stalled from converged (re-reported findings → draft PR)

* fix: LOW-only findings should not prevent convergence

* fix: log validation errors, conditional checklist, derive PR title from labels

* refactor: extract sandcastle into modular architecture

Split main.ts (525 lines) into 6 self-contained modules:
- types.ts: shared domain types (TaskSpec, Finding, LoopResult, FinalizeResult)
- refinement-loop.ts: reusable implement↔critic loop engine
- finalizer.ts: validation, rebase, PR creation
- concurrency-pool.ts: semaphore utility
- task-source.ts: TaskSource interface + GithubIssueSource
- main.ts: 74-line thin orchestrator wiring all modules

The refinement loop is now reusable by any task source (GitHub issues,
CI failures, manual triggers) without coupling to the planner.

* fix: centralize constants, fix PR type-of-change, sanitize titles

* fix: full validation post-rebase, execFileSync for gh issue list

* perf: skip critic when implementer produces 0 commits on round 2+

* fix: remove unsound type guard, filter unknown plan IDs, guard ConcurrencyPool, warn on planner exhaustion

* feat: state-of-the-art algorithmic improvements

- Budget: flat constant 50/round (ARCS/SWE-Agent pattern, replaces suboptimal decreasing schedule)
- Dedup: context-hash fingerprint of ±3 lines (CodeQL/Qodana pattern, drift-safe)
- Quality ratchet: rollback round on regression (findings increase → git reset --hard)
- Log: post-rebase validation failure now logs stderr
- Safety: rescue branch on push failure (preserves commits before sandbox disposal)

* fix(sandcastle): resolve all algorithmic audit findings

- Replace execSync+sed shell injection with readFileSync+join in hashContextLines (RCE, P0)
- Fix quality ratchet to compare nonLowFindings.length not findings.length (logic bug, P1)
- Replace title truncation with SHA-256 hash in findingKey (collision fix, P2)
- Hash single target line + file salt instead of positional context window (line-drift fix, P2)
- Replace overlapping regex quantifiers in sanitizeForPrompt (ReDoS O(n²), P2)
- Replace Array FIFO queue with O(1) singly-linked list in ConcurrencyPool (P3)
- Move totalCommits accumulation after ratchet check to fix phantom counting (P3)
- Replace O(T×I) .find() with O(1) Map lookup in validatePlan (P3)

* fix(sandcastle): harden subprocess calls and reduce cyclomatic complexity

- Replace all execSync template literals with execFileSync (array args)
  to eliminate shell injection vectors from LLM-controlled data
- Strengthen path traversal guard: resolve() + sep instead of join()
- Validate SHA format (/^[0-9a-f]{40}$/) before git reset --hard
- Decompose finalizeTask (CC 15→5) into runValidation, attemptRebase,
  pushBranch, buildPrArgs, extractStderr
- Decompose runRefinementLoop (CC 18→8) into executeRound,
  deduplicateFindings, checkQualityRatchet
- Replace process.exit(0) with natural script termination
- Add title validation (length ≤200, no control chars) in validatePlan
- Add console.warn in previously silent catch blocks

* fix(sandcastle): address review findings — convergence, planner failure, rebase note

- Treat newFindings=0 as 'converged' regardless of repeated non-LOW
  findings (previously marked 'exhausted', forcing draft PRs)
- Set process.exitCode=1 when planner exhausts retries (prevents
  masking a broken planner as 'no work to do')
- Pass rebaseSucceeded to buildPrArgs, append rebase failure note
  to PR body when branch is not rebased onto main
- Reword sanitizeForPrompt JSDoc to accurately describe scope

* fix(.sandcastle): address all algorithmic audit findings

- types.ts: parseFindingsSafe with partial recovery (finding #18)
- concurrency-pool.ts: guard NaN/float/Infinity in constructor (#1,#4), JSDoc re-entrant warning (#2)
- task-source.ts: NFKC normalize + extended blocklist in sanitizeForPrompt (#17,#20), Map lookup in validatePlan (#7)
- refinement-loop.ts: readFileSync replaces execSync/sed (shell injection #5), radius used (#12), fallback hash hardened (#13), line clamping (#14), realpathSync symlinks (#15), file cache Map (#11), parseFindingsSafe (#18), nonce validation (#16), trivial match skip (#19), implementer try/catch (#7), ratchet guard round<=2 (#6), no duplicate nonLowFindings (#9)
- finalizer.ts: pushBranch returns boolean (#22), crypto rescue suffix (#21), rollback error level (#8)
- main.ts: TASK_TIMEOUT_MS + Promise.race timeout (#3), flat budget JSDoc (#10)

* fix: unref timeout timer to prevent process hang, catch critic throws

* fix: count commits before break on critic failure (prevents work loss)

* fix: suppress unhandled rejection from timeout promise on task success

* fix: force process exit after completion (prevents hang from timed-out sandboxes)

* fix: check findings null before commits zero (correct status on implementer crash)

* fix: report known findings in PR body even when converged (prevents silent false convergence)

* feat: state-of-the-art convergence improvements (ARCS, SWE-Agent, OpenHands)

- Validation in-loop (ARCS): deterministic convergence when tests pass mid-loop
- Best-state checkpoint (SWE-Agent): reset to best SHA on non-convergence
- Severity-weighted convergence (OpenHands): refuse convergence if CRITICAL/HIGH persist

* fix: move bestSha after ratchet, add validation timeout, fix severity/bestSha mismatch

* fix: recount totalCommits from git after best-state reset (semantic correctness)

* refactor(.sandcastle): address all 45 quality audit findings

- New constants.ts: shared constants (VALIDATION_COMMAND, timeouts, model names) + utilities (getHeadSha, toErrorMessage)
- refinement-loop.ts: decompose runRefinementLoop (CC 17→≤10), RoundContext/HashInput param objects, computeFindingKey rename
- finalizer.ts: add timeouts to all execFileSync, use runValidation helper consistently
- task-source.ts: add timeout, replace char loop with regex, fix terse names
- main.ts: extract withTimeout helper, use model constants
- types.ts: unexport FindingsSchema (internal only)

* fix: use constants from constants.ts + add planner timeout (multi-agent audit findings)

* refactor(.sandcastle): harden prompts — cap findings, add known decisions, scope preference

* perf(.sandcastle): convert execFileSync to async execFileAsync (unblock event loop)

Replace all blocking execFileSync calls with util.promisify(execFile) to enable
true parallelism between tasks during subprocess execution.

- constants.ts: add execFileAsync export, convert getHeadSha to async
- refinement-loop.ts: captureHeadSha, checkQualityRatchet, checkConvergence,
  runMidLoopValidation, resetToBestState all async
- finalizer.ts: runValidation, attemptRebase, pushBranch all async
- task-source.ts: fetchAndSanitizeIssues async

readFileSync/realpathSync stay sync (<1ms local I/O, no benefit from async).
maxBuffer: 8MB added to validation and gh issue list calls.

* fix: catch planner timeout rejection (retry instead of crash) + add catch type annotations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants