feat: add dynamic model capability detection #16
Conversation
Pull request overview
This PR implements dynamic model capability detection for SAP AI Core models, following patterns from @ai-sdk/openai. The implementation determines model capabilities based on vendor prefixes and model-specific patterns, allowing the SDK to correctly handle differences between various AI models (e.g., Amazon/Anthropic models don't support the n parameter, Meta models don't support structured outputs).
Changes:
- Added new `sap-ai-model-capabilities.ts` module with vendor- and model-specific capability detection logic
- Converted static capability flags in `SAPAILanguageModel` to dynamic getters backed by capability detection
- Added comprehensive test coverage for all vendors and model-specific overrides
- Updated API documentation with detailed capability information and a vendor summary table
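As a rough illustration of the vendor-prefix detection described above, here is a minimal standalone sketch. The real logic lives in `src/sap-ai-model-capabilities.ts`; the capability shape and vendor table below are reduced stand-ins, not the PR's actual code:

```typescript
// Reduced stand-in for the PR's capability shape (illustrative only).
type Capabilities = { supportsN: boolean; supportsStructuredOutputs: boolean };

// Hypothetical vendor override table, mirroring the limitations the PR
// describes (Amazon/Anthropic: no n parameter, Meta: no structured outputs).
const VENDOR_DEFAULTS: Record<string, Partial<Capabilities>> = {
  amazon: { supportsN: false },
  anthropic: { supportsN: false },
  meta: { supportsStructuredOutputs: false },
};

function detectCapabilities(modelId: string): Capabilities {
  const base: Capabilities = { supportsN: true, supportsStructuredOutputs: true };
  // SAP AI Core naming convention: vendor--model-name
  const match = /^([a-z]+)--/.exec(modelId.toLowerCase());
  const overrides = match ? VENDOR_DEFAULTS[match[1]] ?? {} : {};
  return { ...base, ...overrides };
}
```

Models without a recognized `vendor--` prefix fall back to the full default capability set.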
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/sap-ai-model-capabilities.ts` | New module implementing capability detection with vendor- and model-specific patterns |
| `src/sap-ai-model-capabilities.test.ts` | Comprehensive test suite covering all vendors and model overrides |
| `src/sap-ai-language-model.ts` | Converted static capability properties to dynamic getters with caching |
| `src/sap-ai-language-model.test.ts` | Updated tests to verify dynamic capability behavior for various model types |
| `src/index.ts` | Added exports for capability detection functions and types |
| `API_REFERENCE.md` | Added documentation for the capability detection API and a vendor summary table |
Implement vendor-specific capability detection for SAP AI Core models following the pattern from @ai-sdk/openai.

Changes:
- Add `getSAPAIModelCapabilities()` function to detect capabilities based on model vendor
- Convert static capability flags to dynamic getters in `SAPAILanguageModel`
- Support vendor-specific limitations (Amazon/Anthropic: no `supportsN`, Meta: no structured outputs)
- Add model-specific overrides for Claude 2.x, Titan, Llama, Gemini 1.0, Mistral Small/Tiny
- Export capability detection functions and types from index
- Update tests to verify vendor-specific capabilities
- Update API reference documentation

This enables proper capability detection for different model vendors and versions deployed on SAP AI Core.
Force-pushed from ca70cfa to ed83b25
Resolved conflicts:
- src/sap-ai-language-model.ts: Integrated strategy pattern with dynamic capabilities
- src/sap-ai-language-model.test.ts: Updated capability tests for vendor-specific behavior

Key changes:
- Kept strategy-based architecture from main
- Added dynamic model capabilities feature from PR
- Updated tests to expect correct capabilities per model vendor (anthropic/amazon models have `supportsMultipleCompletions: false`)
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
- Freeze `getSAPAIModelCapabilities()` return value to prevent mutation
- Make `supportedUrls`/`supportsUrl` respect the `supportsImageInputs` capability
- Fix package name in API_REFERENCE.md (`@jerome-benoit/sap-ai-provider`)
- Document 'user' mode in `defaultSystemMessageMode`
- Remove non-existent `defaultObjectGenerationMode` from docs
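The freeze-to-prevent-mutation fix above can be sketched as follows. This is an illustrative standalone example, not the PR's actual implementation; the capability shape here is a reduced stand-in:

```typescript
// Reduced stand-in for the capability shape (illustrative only).
interface Caps { supportsN: boolean }

// Returning a frozen object so callers cannot mutate shared capability state.
function getCaps(modelId: string): Readonly<Caps> {
  const caps: Caps = { supportsN: !modelId.startsWith("anthropic--") };
  // Object.freeze: mutation throws in strict mode, is silently ignored otherwise.
  return Object.freeze(caps);
}
```

The `Readonly<Caps>` return type additionally surfaces accidental mutation at compile time.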
Resolve merge conflicts:
- src/sap-ai-language-model.ts: Keep dynamic capability getters from the feature branch that delegate to `getSAPAIModelCapabilities()`
- API_REFERENCE.md: Keep documentation showing model-dependent boolean getters
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
```typescript
describe("MistralAI Instruct models", () => {
  it("should have full capabilities for mistralai--mistral-small-instruct", () => {
    const capabilities = getSAPAIModelCapabilities("mistralai--mistral-small-instruct");

    // Note: mistral-small has limited structured outputs, but -instruct suffix
    // doesn't change this - the pattern matches on mistral-small prefix
    expect(capabilities.supportsStructuredOutputs).toBe(false);
    expect(capabilities.supportsN).toBe(true);
    expect(capabilities.vendor).toBe("mistralai");
  });
```
The test name says this model has "full capabilities", but the assertions expect supportsStructuredOutputs to be false (limited capabilities). Rename the test description to match the behavior being asserted, so that failure messages read clearly.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
```typescript
it("should return 'unknown' for unrecognized vendor prefixes", () => {
  expect(getModelVendor("foobar--some-model")).toBe("unknown");
  expect(getModelVendor("openai--gpt-4")).toBe("unknown");
});
```
```typescript
/**
 * Dynamic model capability detection for SAP AI Core models.
 *
 * Functions for determining model capabilities based on model ID prefix
 * following the SAP AI Core naming convention: `vendor--model-name`.
 */
export {
  getModelVendor,
  getSAPAIModelCapabilities,
  modelSupports,
} from "./sap-ai-model-capabilities.js";

export type { SAPAIModelCapabilities, SAPAIModelVendor } from "./sap-ai-model-capabilities.js";
```
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
```typescript
export function getModelVendor(modelId: string): "unknown" | SAPAIModelVendor {
  const vendorMatch = /^([a-z]+)--/.exec(modelId.toLowerCase());
  if (!vendorMatch) {
    return "unknown";
  }

  const vendor = vendorMatch[1] as SAPAIModelVendor;
  if (vendor in VENDOR_CAPABILITIES) {
    return vendor;
  }

  return "unknown";
}
```
```typescript
/**
 * Module-level cache for model capabilities to avoid repeated computation.
 * @internal
 */
const capabilitiesCache = new Map<string, SAPAIModelCapabilities>();

/**
 * Default capabilities for models without specific overrides.
 * @internal
 */
const DEFAULT_CAPABILITIES: SAPAIModelCapabilities = {
  defaultSystemMessageMode: "system",
  supportsImageInputs: true,
  supportsN: true,
  supportsParallelToolCalls: true,
  supportsStreaming: true,
  supportsStructuredOutputs: true,
  supportsToolCalls: true,
  vendor: "unknown",
};
```
capabilitiesCache is a module-level Map keyed by (normalized) modelId. Since modelId ultimately comes from user input, this cache can grow without bound (e.g., many unique/garbage model IDs), causing a memory leak / potential DoS in long-running processes. Consider either limiting the cache size (LRU), caching only recognized vendor/pattern matches, or removing the global cache and relying on per-model-instance caching (already implemented in SAPAILanguageModel).
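A bounded cache along the lines suggested above could look like this. This is an illustrative sketch, not the PR's code; `MAX_ENTRIES` and the helper names are hypothetical, and it uses the fact that a JavaScript `Map` iterates keys in insertion order to get simple LRU behavior:

```typescript
// Hypothetical bounded LRU cache built on Map insertion order.
const MAX_ENTRIES = 1000;
const cache = new Map<string, unknown>();

function cacheGet(key: string): unknown {
  const value = cache.get(key);
  if (value !== undefined) {
    // Re-insert to mark the entry as most recently used.
    cache.delete(key);
    cache.set(key, value);
  }
  return value;
}

function cacheSet(key: string, value: unknown): void {
  if (cache.size >= MAX_ENTRIES) {
    // Map iterates in insertion order, so the first key is the least recently used.
    cache.delete(cache.keys().next().value as string);
  }
  cache.set(key, value);
}
```

Capping the size addresses the unbounded-growth concern while keeping the hot path an O(1) `Map` lookup; the alternatives named in the comment (caching only recognized vendors, or per-instance caching) avoid the global state entirely.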
```typescript
    return {};
  }
  return {
    "image/*": [/^https:\/\/.+$/i, /^data:image\/.*$/],
```
supportedUrls uses /^data:image\/.*$/ (case-sensitive) while supportsUrl() checks data URLs with a case-insensitive regex. This can lead to inconsistent behavior depending on which API a consumer relies on (e.g., Data:Image/... accepted by supportsUrl but rejected by supportedUrls matching). Align the regex flags so both paths behave the same.
```diff
-    "image/*": [/^https:\/\/.+$/i, /^data:image\/.*$/],
+    "image/*": [/^https:\/\/.+$/i, /^data:image\/.*$/i],
```
- types.ts: parseFindingsSafe with partial recovery (finding #18)
- concurrency-pool.ts: guard NaN/float/Infinity in constructor (#1, #4), JSDoc re-entrant warning (#2)
- task-source.ts: NFKC normalize + extended blocklist in sanitizeForPrompt (#17, #20), Map lookup in validatePlan (#7)
- refinement-loop.ts: readFileSync replaces execSync/sed (shell injection #5), radius used (#12), fallback hash hardened (#13), line clamping (#14), realpathSync symlinks (#15), file cache Map (#11), parseFindingsSafe (#18), nonce validation (#16), trivial match skip (#19), implementer try/catch (#7), ratchet guard round<=2 (#6), no duplicate nonLowFindings (#9)
- finalizer.ts: pushBranch returns boolean (#22), crypto rescue suffix (#21), rollback error level (#8)
- main.ts: TASK_TIMEOUT_MS + Promise.race timeout (#3), flat budget JSDoc (#10)
* feat: implement sandcastle refinement loop with critic-based convergence
  Replace single-pass implement→review→merge with iterative implement↔critic loop per task. Key changes:
  - Orchestrator fetches and sanitizes issues (prevents prompt injection)
  - Implement↔Critic loop with deterministic dedup convergence
  - Critic produces structured findings (nonce-tagged JSON, zod-validated)
  - Decreasing iteration budget per round [100, 50, 25, 10, 10]
  - Host-side validation and rebase (no agent needed)
  - One PR per task (no merger agent)
  - Draft PR on non-convergence with outstanding findings listed
  Implements #110
* fix: address review findings (shell injection, false convergence, nesting, zod validation)
  - Replace execSync with execFileSync for gh pr create (prevents shell injection)
  - Guard parseFindings against empty matches (prevents false convergence)
  - Add try/catch on gh issue list startup call
  - Guard git push in rebase catch block
  - Extract finalizeIssue function (reduces nesting from 6+ to 3 levels)
  - Add zod schema for rawIssues (replaces unsafe 'as' cast)
  - Implement validation retry round per spec (one more implement→critic if budget remains)
* fix: guard retry calls, split rebase logic, remove dead critic retry
* fix: handle nullable issue body and guard JSON parse
* fix: distinguish stalled from converged (re-reported findings → draft PR)
* fix: LOW-only findings should not prevent convergence
* fix: log validation errors, conditional checklist, derive PR title from labels
* refactor: extract sandcastle into modular architecture
  Split main.ts (525 lines) into 6 self-contained modules:
  - types.ts: shared domain types (TaskSpec, Finding, LoopResult, FinalizeResult)
  - refinement-loop.ts: reusable implement↔critic loop engine
  - finalizer.ts: validation, rebase, PR creation
  - concurrency-pool.ts: semaphore utility
  - task-source.ts: TaskSource interface + GithubIssueSource
  - main.ts: 74-line thin orchestrator wiring all modules
  The refinement loop is now reusable by any task source (GitHub issues, CI failures, manual triggers) without coupling to the planner.
* fix: centralize constants, fix PR type-of-change, sanitize titles
* fix: full validation post-rebase, execFileSync for gh issue list
* perf: skip critic when implementer produces 0 commits on round 2+
* fix: remove unsound type guard, filter unknown plan IDs, guard ConcurrencyPool, warn on planner exhaustion
* feat: state-of-the-art algorithmic improvements
  - Budget: flat constant 50/round (ARCS/SWE-Agent pattern, replaces suboptimal decreasing schedule)
  - Dedup: context-hash fingerprint of ±3 lines (CodeQL/Qodana pattern, drift-safe)
  - Quality ratchet: rollback round on regression (findings increase → git reset --hard)
  - Log: post-rebase validation failure now logs stderr
  - Safety: rescue branch on push failure (preserves commits before sandbox disposal)
* fix(sandcastle): resolve all algorithmic audit findings
  - Replace execSync+sed shell injection with readFileSync+join in hashContextLines (RCE, P0)
  - Fix quality ratchet to compare nonLowFindings.length not findings.length (logic bug, P1)
  - Replace title truncation with SHA-256 hash in findingKey (collision fix, P2)
  - Hash single target line + file salt instead of positional context window (line-drift fix, P2)
  - Replace overlapping regex quantifiers in sanitizeForPrompt (ReDoS O(n²), P2)
  - Replace Array FIFO queue with O(1) singly-linked list in ConcurrencyPool (P3)
  - Move totalCommits accumulation after ratchet check to fix phantom counting (P3)
  - Replace O(T×I) .find() with O(1) Map lookup in validatePlan (P3)
* fix(sandcastle): harden subprocess calls and reduce cyclomatic complexity
  - Replace all execSync template literals with execFileSync (array args) to eliminate shell injection vectors from LLM-controlled data
  - Strengthen path traversal guard: resolve() + sep instead of join()
  - Validate SHA format (/^[0-9a-f]{40}$/) before git reset --hard
  - Decompose finalizeTask (CC 15→5) into runValidation, attemptRebase, pushBranch, buildPrArgs, extractStderr
  - Decompose runRefinementLoop (CC 18→8) into executeRound, deduplicateFindings, checkQualityRatchet
  - Replace process.exit(0) with natural script termination
  - Add title validation (length ≤200, no control chars) in validatePlan
  - Add console.warn in previously silent catch blocks
* fix(sandcastle): address review findings — convergence, planner failure, rebase note
  - Treat newFindings=0 as 'converged' regardless of repeated non-LOW findings (previously marked 'exhausted', forcing draft PRs)
  - Set process.exitCode=1 when planner exhausts retries (prevents masking a broken planner as 'no work to do')
  - Pass rebaseSucceeded to buildPrArgs, append rebase failure note to PR body when branch is not rebased onto main
  - Reword sanitizeForPrompt JSDoc to accurately describe scope
* fix(.sandcastle): address all algorithmic audit findings
  - types.ts: parseFindingsSafe with partial recovery (finding #18)
  - concurrency-pool.ts: guard NaN/float/Infinity in constructor (#1, #4), JSDoc re-entrant warning (#2)
  - task-source.ts: NFKC normalize + extended blocklist in sanitizeForPrompt (#17, #20), Map lookup in validatePlan (#7)
  - refinement-loop.ts: readFileSync replaces execSync/sed (shell injection #5), radius used (#12), fallback hash hardened (#13), line clamping (#14), realpathSync symlinks (#15), file cache Map (#11), parseFindingsSafe (#18), nonce validation (#16), trivial match skip (#19), implementer try/catch (#7), ratchet guard round<=2 (#6), no duplicate nonLowFindings (#9)
  - finalizer.ts: pushBranch returns boolean (#22), crypto rescue suffix (#21), rollback error level (#8)
  - main.ts: TASK_TIMEOUT_MS + Promise.race timeout (#3), flat budget JSDoc (#10)
* fix: unref timeout timer to prevent process hang, catch critic throws
* fix: count commits before break on critic failure (prevents work loss)
* fix: suppress unhandled rejection from timeout promise on task success
* fix: force process exit after completion (prevents hang from timed-out sandboxes)
* fix: check findings null before commits zero (correct status on implementer crash)
* fix: report known findings in PR body even when converged (prevents silent false convergence)
* feat: state-of-the-art convergence improvements (ARCS, SWE-Agent, OpenHands)
  - Validation in-loop (ARCS): deterministic convergence when tests pass mid-loop
  - Best-state checkpoint (SWE-Agent): reset to best SHA on non-convergence
  - Severity-weighted convergence (OpenHands): refuse convergence if CRITICAL/HIGH persist
* fix: move bestSha after ratchet, add validation timeout, fix severity/bestSha mismatch
* fix: recount totalCommits from git after best-state reset (semantic correctness)
* refactor(.sandcastle): address all 45 quality audit findings
  - New constants.ts: shared constants (VALIDATION_COMMAND, timeouts, model names) + utilities (getHeadSha, toErrorMessage)
  - refinement-loop.ts: decompose runRefinementLoop (CC 17→≤10), RoundContext/HashInput param objects, computeFindingKey rename
  - finalizer.ts: add timeouts to all execFileSync, use runValidation helper consistently
  - task-source.ts: add timeout, replace char loop with regex, fix terse names
  - main.ts: extract withTimeout helper, use model constants
  - types.ts: unexport FindingsSchema (internal only)
* fix: use constants from constants.ts + add planner timeout (multi-agent audit findings)
* refactor(.sandcastle): harden prompts — cap findings, add known decisions, scope preference
* perf(.sandcastle): convert execFileSync to async execFileAsync (unblock event loop)
  Replace all blocking execFileSync calls with util.promisify(execFile) to enable true parallelism between tasks during subprocess execution.
  - constants.ts: add execFileAsync export, convert getHeadSha to async
  - refinement-loop.ts: captureHeadSha, checkQualityRatchet, checkConvergence, runMidLoopValidation, resetToBestState all async
  - finalizer.ts: runValidation, attemptRebase, pushBranch all async
  - task-source.ts: fetchAndSanitizeIssues async
  readFileSync/realpathSync stay sync (<1ms local I/O, no benefit from async). maxBuffer: 8MB added to validation and gh issue list calls.
* fix: catch planner timeout rejection (retry instead of crash) + add catch type annotations
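The timeout handling that several commits above revolve around (TASK_TIMEOUT_MS + Promise.race, the unref'd timer, the suppressed rejection on task success) can be sketched as follows. This is an assumed illustration of the pattern, not the project's actual `withTimeout` helper:

```typescript
// Sketch of a Promise.race timeout (names and signature are assumptions).
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    // In Node, unref() lets the process exit even while the timer is pending,
    // preventing the hang described in the commits above.
    (timer as { unref?: () => void }).unref?.();
  });
  // Clearing the timer on settlement means the timeout promise never rejects
  // after the task succeeds, avoiding an unhandled rejection.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Racing the task against a rejecting timer bounds each task's wall-clock time without cooperation from the task itself.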
Summary
Implement vendor-specific capability detection for SAP AI Core models, following the pattern from @ai-sdk/openai.

Changes

Core Implementation
- `getSAPAIModelCapabilities()` function to detect capabilities based on model vendor and type
- Dynamic capability getters on `SAPAILanguageModel`

Vendor Support
- Amazon/Anthropic: `supportsN: false` (no multiple completions)
- Meta: `supportsStructuredOutputs: false`

Model-Specific Overrides
- Overrides for Claude 2.x, Titan, Llama, Gemini 1.0, and Mistral Small/Tiny

API Changes
- Exported `getSAPAIModelCapabilities()`, `getModelVendor()`, and `modelSupports()`
- Exported `SAPAIModelCapabilities` and `SAPAIModelVendor` types

Testing
- Tests verifying vendor-specific capabilities and model-specific overrides

Documentation
- Updated API reference with capability detection details and a vendor summary table

Breaking Changes
None - backward compatible (capability flags still exist as getters)
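The backward-compatible "static flag → dynamic getter with per-instance caching" conversion described above can be illustrated with a reduced sketch. The class, field names, and detection logic here are hypothetical stand-ins, not the SDK's actual API:

```typescript
// Reduced stand-in for the capability shape (illustrative only).
interface Capabilities { supportsStructuredOutputs: boolean }

// Hypothetical detection function mirroring the Meta limitation noted above.
function detect(modelId: string): Capabilities {
  return { supportsStructuredOutputs: !modelId.startsWith("meta--") };
}

class ExampleLanguageModel {
  private cachedCapabilities?: Capabilities;

  constructor(readonly modelId: string) {}

  private get capabilities(): Capabilities {
    // Compute once per instance, then reuse (per-instance caching).
    this.cachedCapabilities ??= detect(this.modelId);
    return this.cachedCapabilities;
  }

  // Public getter reads like the old static flag, so callers are unaffected.
  get supportsStructuredOutputs(): boolean {
    return this.capabilities.supportsStructuredOutputs;
  }
}
```

Because the property name and type are unchanged, existing callers reading the flag keep working, which is why the PR lists no breaking changes.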