diff --git a/.agents/scripts/commands/cross-review.md b/.agents/scripts/commands/cross-review.md index 0e4ee21233..c6f0342f9f 100644 --- a/.agents/scripts/commands/cross-review.md +++ b/.agents/scripts/commands/cross-review.md @@ -1,30 +1,21 @@ --- -description: Dispatch a prompt to multiple AI models, diff results, and optionally score via a judge model +description: Dispatch the same prompt to multiple AI models, diff results, and optionally auto-score via a judge model agent: Build+ mode: subagent --- -Run a multi-model adversarial review: dispatch the same prompt to N models in parallel, collect outputs, diff results, and optionally score via a judge model (Ouroboros-style pipeline). +Dispatch a prompt to multiple AI models in parallel, collect and diff their responses, and optionally score them via a judge model. Target: $ARGUMENTS ## Instructions -Parse the arguments to extract: -- `--prompt`: the review prompt (required) -- `--models`: comma-separated model tiers (default: `sonnet,opus`) -- `--score`: enable judge scoring pipeline (optional flag) -- `--judge`: judge model tier (default: `opus`) -- `--task-type`: scoring category — `code`, `review`, `analysis`, `text`, `general` (default: `general`) -- `--timeout`: per-model timeout in seconds (default: 600) -- `--output`: output directory (default: auto-generated tmp dir) - 1. Parse the user's arguments. Common forms: ```bash /cross-review "review this PR diff" --models sonnet,opus /cross-review "audit this code" --models sonnet,gemini-pro,gpt-4.1 --score - /cross-review "design this API" --score --judge opus --task-type analysis + /cross-review "design this API" --score --judge opus ``` 2. 
Run the cross-review: @@ -41,11 +32,11 @@ Parse the arguments to extract: --models "sonnet,gemini-pro,gpt-4.1" \ --score - # With custom judge model and task type + # With custom judge model ~/.aidevops/agents/scripts/compare-models-helper.sh cross-review \ --prompt "your prompt here" \ --models "sonnet,opus" \ - --score --judge sonnet --task-type review + --score --judge sonnet ``` 3. Present the results: @@ -55,18 +46,16 @@ Parse the arguments to extract: - Note any models that failed to respond 4. If `--score` was used, scores are automatically: - - Recorded in the model-comparisons SQLite DB (`~/.aidevops/.agent-workspace/memory/model-comparisons.db`) - - Fed into the pattern tracker for data-driven model routing (`/route`, `/patterns`) + - Recorded in the model-comparisons SQLite DB + - Fed into the pattern tracker for model routing (`/route`, `/patterns`) ## Options | Option | Default | Description | |--------|---------|-------------| -| `--prompt` | (required) | The review prompt | | `--models` | `sonnet,opus` | Comma-separated model tiers to compare | | `--score` | off | Auto-score outputs via judge model | | `--judge` | `opus` | Judge model tier (used with `--score`) | -| `--task-type` | `general` | Scoring category: `code`, `review`, `analysis`, `text`, `general` | | `--timeout` | `600` | Seconds per model | | `--output` | auto | Directory for raw outputs | | `--workdir` | `pwd` | Working directory for model context | @@ -77,14 +66,13 @@ Parse the arguments to extract: ## Scoring Criteria (judge model, 1-10 scale) -| Criterion | Scale | Description | -|-----------|-------|-------------| -| Correctness | 1-10 | Factual accuracy and technical correctness | -| Completeness | 1-10 | Coverage of all requirements and edge cases | -| Quality | 1-10 | Code quality / writing quality | -| Clarity | 1-10 | Clear explanation, good formatting, readability | -| Adherence | 1-10 | Following instructions precisely, staying on-task | -| Overall | 1-10 | Judge's holistic 
assessment | +| Criterion | Description | +|-----------|-------------| +| correctness | Factual accuracy and technical correctness | +| completeness | Coverage of all requirements and edge cases | +| quality | Code quality, best practices, maintainability | +| clarity | Clear explanation, good formatting, readability | +| adherence | Following the original prompt instructions precisely | ## Examples @@ -96,9 +84,6 @@ Parse the arguments to extract: /cross-review "Design a rate limiting strategy for a REST API" \ --models sonnet,opus,pro --score -# Custom judge model and task type -/cross-review "Audit this architecture" --models "sonnet,opus" --score --judge opus --task-type analysis - # Quick diff with custom timeout /cross-review "Summarize the key changes in this diff" --models haiku,sonnet --timeout 120 @@ -106,15 +91,6 @@ Parse the arguments to extract: /score-responses --leaderboard ``` -## Output - -- Per-model responses displayed inline -- Diff summary (word counts, unified diff for 2-model comparisons) -- Judge scores table (when `--score` is set) -- Winner declaration with reasoning -- Results saved to `~/.aidevops/.agent-workspace/tmp/cross-review-/` -- Judge JSON saved to `/judge-scores.json` - ## Related - `/compare-models` — Compare model capabilities and pricing (no live dispatch) diff --git a/.agents/scripts/compare-models-helper.sh b/.agents/scripts/compare-models-helper.sh index 0c4b2bf3ba..a72eade551 100755 --- a/.agents/scripts/compare-models-helper.sh +++ b/.agents/scripts/compare-models-helper.sh @@ -15,7 +15,7 @@ # capabilities Show capability matrix # providers List supported providers # discover Detect available providers and models from local config -# cross-review Dispatch same prompt to multiple models, diff results, optionally score via judge (t132.8, t1329) +# cross-review Dispatch same prompt to multiple models, diff results (t132.8) # help Show this help # # Author: AI DevOps Framework @@ -30,7 +30,7 @@ set -euo pipefail # Pattern 
Tracker Integration (t1098) # ============================================================================= # Reads live success/failure data from the pattern tracker memory DB. -# Same DB as pattern-tracker-helper.sh (archived) — no duplication of storage. +# Same DB as pattern-tracker-helper.sh — no duplication of storage. readonly PATTERN_DB="${AIDEVOPS_MEMORY_DIR:-$HOME/.aidevops/.agent-workspace/memory}/memory.db" readonly -a PATTERN_VALID_MODELS=(haiku flash sonnet pro opus) @@ -496,13 +496,13 @@ cmd_recommend() { echo "" echo "Pattern Tracker Insights:" local pattern_lines - pattern_lines=$(get_all_tier_patterns) + pattern_lines=$(get_all_tier_patterns "") if [[ -n "$pattern_lines" ]]; then while IFS='|' read -r ptier prate psample; do printf " %-10s %d%% success (n=%d)\n" "$ptier:" "$prate" "$psample" done <<<"$pattern_lines" else - echo " (no model-tagged patterns — record with /remember)" + echo " (no model-tagged patterns — record with pattern-tracker-helper.sh)" fi fi @@ -643,208 +643,281 @@ cmd_providers() { } # ============================================================================= -# Cross-Model Review (t132.8, t1329) +# Cross-Model Review (t132.8) # Dispatch the same review prompt to multiple models, collect results, diff. -# Optional --score flag dispatches outputs to a judge model for structured scoring. # ============================================================================= -####################################### -# Resolve Anthropic API auth header (API key or OAuth) -# Outputs: header string on stdout (e.g. "x-api-key: sk-...") -# Returns: 0 on success, 1 if no auth available -####################################### -_resolve_cross_review_auth() { - local auth_file="${HOME}/.local/share/opencode/auth.json" +# Judge scoring for cross-review (t1329) +# Dispatches all model outputs to a judge model, parses structured JSON scores, +# records results via cmd_score, and feeds into the pattern tracker. 
+# Defined before cmd_cross_review (its caller) for readability. +# +# Args: +# $1 - original prompt +# $2 - models_str (comma-separated) +# $3 - output_dir +# $4 - judge_model tier +# $5+ - model_names array +_cross_review_judge_score() { + local original_prompt="$1" + local models_str="$2" + local output_dir="$3" + local judge_model="$4" + shift 4 + local -a model_names=("$@") - # Priority 1: environment variable - if [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then - echo "x-api-key: ${ANTHROPIC_API_KEY}" - return 0 + # Validate judge_model identifier (used in filenames and runner names) + if [[ ! "$judge_model" =~ ^[A-Za-z0-9._-]+$ ]]; then + print_error "Invalid judge model identifier: $judge_model" + return 1 fi - # Priority 2: OAuth token from OpenCode auth.json - if [[ -f "$auth_file" ]] && command -v jq &>/dev/null; then - local auth_type - auth_type=$(jq -r '.anthropic.type // empty' "$auth_file" 2>/dev/null) - - if [[ "$auth_type" == "oauth" ]]; then - local access_token expires_at now_ms - access_token=$(jq -r '.anthropic.access // empty' "$auth_file" 2>/dev/null) - expires_at=$(jq -r '.anthropic.expires // 0' "$auth_file" 2>/dev/null) - now_ms=$(($(date +%s) * 1000)) - if [[ -n "$access_token" && "$expires_at" -gt "$now_ms" ]]; then - echo "Authorization: Bearer ${access_token}" - return 0 - fi - elif [[ "$auth_type" == "api" ]]; then - local api_key - api_key=$(jq -r '.anthropic.key // empty' "$auth_file" 2>/dev/null) - if [[ -n "$api_key" ]]; then - echo "x-api-key: ${api_key}" - return 0 - fi - fi + local runner_helper="${SCRIPT_DIR}/runner-helper.sh" + if [[ ! -x "$runner_helper" ]]; then + print_warning "runner-helper.sh not found — skipping judge scoring" + return 0 fi - return 1 -} - -####################################### -# Judge cross-review outputs via a judge model (t1329) -# Reads model output files from output_dir, calls judge model API, -# returns structured JSON scores to stdout. 
-# Usage: _judge_cross_review -# Returns: 0 on success (JSON on stdout), 1 on failure -####################################### -_judge_cross_review() { - local output_dir="$1" - local models_csv="$2" - local original_prompt="$3" - local judge_model="$4" - local task_type="${5:-general}" - - # Require curl and jq - if ! command -v curl &>/dev/null || ! command -v jq &>/dev/null; then - print_error "Judge scoring requires curl and jq" - return 1 - fi + echo "=== JUDGE SCORING (${judge_model}) ===" + echo "" - # Resolve auth - local auth_header - auth_header=$(_resolve_cross_review_auth) || { - print_error "Judge scoring requires Anthropic API key (ANTHROPIC_API_KEY or OpenCode OAuth)" - return 1 - } + # Build judge prompt: include original prompt + all model responses + local judge_prompt + judge_prompt="You are a neutral judge evaluating AI model responses. Score each response on a 1-10 scale. - # Build model outputs section for the judge prompt - local outputs_text="" - local -a model_array=() - IFS=',' read -ra model_array <<<"$models_csv" +ORIGINAL PROMPT: +${original_prompt} - for model_tier in "${model_array[@]}"; do - model_tier=$(echo "$model_tier" | tr -d ' ') +MODEL RESPONSES: +" + # Bound per-model response size to keep judge payload within token limits + local max_chars_per_model=20000 + local models_with_output=() + for model_tier in "${model_names[@]}"; do local result_file="${output_dir}/${model_tier}.txt" if [[ -f "$result_file" && -s "$result_file" ]]; then local response_text - response_text=$(cat "$result_file") - outputs_text="${outputs_text} - + response_text=$(head -c "$max_chars_per_model" "$result_file") + local file_size + file_size=$(wc -c <"$result_file" | tr -d ' ') + local truncated_marker="" + if [[ "$file_size" -gt "$max_chars_per_model" ]]; then + truncated_marker=" +[TRUNCATED — original ${file_size} chars, showing first ${max_chars_per_model}]" + fi + judge_prompt+=" === MODEL: ${model_tier} === -${response_text}" 
+${response_text}${truncated_marker} +" + models_with_output+=("$model_tier") fi done - if [[ -z "$outputs_text" ]]; then - print_error "No model outputs found to judge in $output_dir" - return 1 + if [[ ${#models_with_output[@]} -lt 2 ]]; then + print_warning "Not enough model outputs for judge scoring (need 2+)" + return 0 fi - # Build judge system prompt - local system_prompt - system_prompt='You are an expert AI model evaluator. You will be given a prompt and responses from multiple AI models. Score each model on five criteria (1-10 scale) and declare a winner. - -Scoring criteria: -- correctness (1-10): Factual accuracy and technical correctness -- completeness (1-10): Coverage of all requirements and edge cases -- quality (1-10): Code quality, best practices, structure (or writing quality for non-code) -- clarity (1-10): Clear explanation, good formatting, readability -- adherence (1-10): Following instructions precisely, staying on-task + judge_prompt+=" +SCORING INSTRUCTIONS: +Score each model on these criteria (1-10 scale): +- correctness: Factual accuracy and technical correctness +- completeness: Coverage of all requirements and edge cases +- quality: Code quality, best practices, maintainability +- clarity: Clear explanation, good formatting, readability +- adherence: Following the original prompt instructions precisely Respond with ONLY a valid JSON object in this exact format (no markdown, no explanation): { - "scores": { - "": { - "correctness": <1-10>, - "completeness": <1-10>, - "quality": <1-10>, - "clarity": <1-10>, - "adherence": <1-10>, - "overall": <1-10>, - "strengths": "", - "weaknesses": "" + \"task_type\": \"general\", + \"winner\": \"\", + \"reasoning\": \"\", + \"scores\": { + \"\": { + \"correctness\": <1-10>, + \"completeness\": <1-10>, + \"quality\": <1-10>, + \"clarity\": <1-10>, + \"adherence\": <1-10> } - }, - "winner": "", - "winner_reasoning": "<1-2 sentence rationale for winner>", - "task_type": "" -}' - - local user_prompt - 
user_prompt="ORIGINAL PROMPT: -${original_prompt} + } +}" -MODEL RESPONSES: -${outputs_text} -Score each model and declare a winner." + # Dispatch to judge model + local judge_runner="cross-review-judge-$$" + local judge_output_file="${output_dir}/judge-${judge_model}.json" - # Resolve judge model to full model ID - local judge_model_id - judge_model_id=$(resolve_model_tier "$judge_model" 2>/dev/null || echo "$judge_model") + echo " Dispatching to judge (${judge_model})..." - # Anthropic Messages API expects raw model IDs without provider prefix - if [[ "$judge_model_id" == anthropic/* ]]; then - judge_model_id="${judge_model_id#anthropic/}" - elif [[ "$judge_model_id" == */* ]]; then - print_error "--judge must resolve to an Anthropic model when using Anthropic judge API (got: $judge_model_id)" - return 1 - fi + local judge_err_log="${output_dir}/judge-errors.log" + + "$runner_helper" create "$judge_runner" \ + --model "$judge_model" \ + --description "Cross-review judge" \ + --workdir "$(pwd)" 2>>"$judge_err_log" || true + + "$runner_helper" run "$judge_runner" "$judge_prompt" \ + --model "$judge_model" \ + --timeout "120" \ + --format text >"$judge_output_file" 2>>"$judge_err_log" || true + + "$runner_helper" destroy "$judge_runner" --force 2>>"$judge_err_log" || true - # Build API request body - local request_body - request_body=$(jq -n \ - --arg model "$judge_model_id" \ - --arg system "$system_prompt" \ - --arg user "$user_prompt" \ - '{model: $model, max_tokens: 1024, system: $system, messages: [{role: "user", content: $user}]}') - - # Build curl config via process substitution to avoid exposing auth in ps - local header_name - header_name="${auth_header%%:*}" - local curl_config - curl_config="header = \"Content-Type: application/json\" -header 
= \"anthropic-version: 2023-06-01\" -header = \"${auth_header}\"" - if [[ "$header_name" == "Authorization" ]]; then - curl_config="${curl_config} -header = \"anthropic-beta: oauth-2025-04-20\"" + if [[ ! -f "$judge_output_file" || ! -s "$judge_output_file" ]]; then + print_warning "Judge model returned no output — skipping scoring" + return 0 fi - # Call judge model API (headers via --config to keep secrets out of process list) - local response - response=$(curl -s --max-time 120 \ - --config <(printf '%s\n' "$curl_config") \ - -d "$request_body" \ - "https://api.anthropic.com/v1/messages") || { - print_error "Judge API call failed (curl error)" - return 1 - } + # Extract JSON from judge output (strip any surrounding text) + local judge_json + judge_json=$(grep -o '{.*}' "$judge_output_file" 2>>"$judge_err_log" | head -1 || true) + if [[ -z "$judge_json" ]]; then + # Fallback: extract multiline/pretty-printed JSON that the per-line grep missed + # (file is fed via stdin redirect, so no shell-quoting of its contents is needed) + judge_json=$(python3 -c "
import sys, json, re
text = sys.stdin.read()
m = re.search(r'\{.*\}', text, re.DOTALL)
if m:
    try:
        obj = json.loads(m.group())
        print(json.dumps(obj))
    except Exception:
        pass
" <"$judge_output_file" 2>>"$judge_err_log" || true)
 + fi - # Extract text content from response - local ai_text - ai_text=$(echo "$response" | jq -r '.content[]? | select(.type == "text") | .text') + if [[ -z "$judge_json" ]]; then + print_warning "Could not parse judge JSON output. 
Raw output saved to: $judge_output_file" + return 0 + fi - if [[ -z "$ai_text" ]]; then - local api_error - api_error=$(echo "$response" | jq -r '.error.message // empty') - print_error "Judge API returned no content (error: ${api_error:-unknown})" - return 1 + # Parse winner, task_type, and reasoning in a single Python call + local parsed_fields + parsed_fields=$(echo "$judge_json" | python3 -c " +import sys, json +d = json.load(sys.stdin) +# Truncate reasoning to 500 chars and strip control characters +r = d.get('reasoning', '')[:500] +r = ''.join(c for c in r if c.isprintable() or c in (' ', '\t')) +print(d.get('winner', '')) +print(d.get('task_type', 'general')) +print(r) +" 2>>"$judge_err_log" || true) + + local winner task_type reasoning + if [[ -n "$parsed_fields" ]]; then + winner=$(echo "$parsed_fields" | head -1) + task_type=$(echo "$parsed_fields" | sed -n '2p') + reasoning=$(echo "$parsed_fields" | sed -n '3p') + else + winner="" + task_type="general" + reasoning="" fi - # Strip markdown code fences if present — extract full JSON object without truncation - local clean_json - clean_json=$(echo "$ai_text" | sed -n '/^{/,/^}/p') - if [[ -z "$clean_json" ]]; then - clean_json="$ai_text" + # Sanitize task_type: restrict to known allowlist + local -a valid_task_types=(general code review analysis debug refactor test docs security) + local task_type_valid=false + for vt in "${valid_task_types[@]}"; do + if [[ "$task_type" == "$vt" ]]; then + task_type_valid=true + break + fi + done + if [[ "$task_type_valid" != "true" ]]; then + task_type="general" fi - # Validate it's parseable JSON - if ! echo "$clean_json" | jq . 
&>/dev/null; then - print_error "Judge returned invalid JSON" - return 1 + # Sanitize winner: must be one of the models with output + local winner_valid=false + if [[ -n "$winner" ]]; then + for m in "${models_with_output[@]}"; do + if [[ "$winner" == "$m" ]]; then + winner_valid=true + break + fi + done + if [[ "$winner_valid" != "true" ]]; then + print_warning "Judge returned unknown winner '${winner}' — ignoring" + winner="" + fi fi - echo "$clean_json" + echo " Judge winner: ${winner:-unknown}" + [[ -n "$reasoning" ]] && echo " Reasoning: ${reasoning}" + echo "" + + # Helper: clamp a numeric value to integer in range 0-10 + # Handles decimals via printf '%.0f' rounding (e.g. 8.7 → 9, not misread as 87; + # note %.0f rounds half-to-even, so 8.5 → 8) + _clamp_score() { + local val="$1" + # Accept only valid numeric format (digits with optional single decimal point) + if [[ ! "$val" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then + echo "0" + return 0 + fi + # Round to nearest integer and clamp to 0-10 + local int_val + int_val=$(printf '%.0f' "$val" 2>/dev/null || echo "0") + if [[ "$int_val" -gt 10 ]]; then + echo "10" + elif [[ "$int_val" -lt 0 ]]; then + echo "0" + else + echo "$int_val" + fi + return 0 + } + + # Build cmd_score arguments from judge JSON + local -a score_args=( + --task "$original_prompt" + --type "$task_type" + --evaluator "$judge_model" + ) + [[ -n "$winner" ]] && score_args+=(--winner "$winner") + + for model_tier in "${models_with_output[@]}"; do + # Extract all scores in a single Python call (avoids 5 subprocesses per model) + local scores_line + scores_line=$(echo "$judge_json" | python3 -c "
import sys, json
d = json.load(sys.stdin)
s = d.get('scores', {}).get('${model_tier}', {})
print(s.get('correctness', 0), s.get('completeness', 0), s.get('quality', 0), s.get('clarity', 0), s.get('adherence', 0))
" 2>>"$judge_err_log" || echo "0 0 0 0 0")
 + local corr comp qual clar adhr + read -r corr comp qual clar adhr <<<"$scores_line" + + # Clamp all scores to valid integer range 0-10 + corr=$(_clamp_score "$corr") 
+ comp=$(_clamp_score "$comp") + qual=$(_clamp_score "$qual") + clar=$(_clamp_score "$clar") + adhr=$(_clamp_score "$adhr") + + score_args+=( + --model "$model_tier" + --correctness "$corr" + --completeness "$comp" + --quality "$qual" + --clarity "$clar" + --adherence "$adhr" + ) + done + + # Record scores via cmd_score (also syncs to pattern tracker) + cmd_score "${score_args[@]}" + + echo "Judge scores recorded. Judge output: $judge_output_file" + echo "" + return 0 } @@ -852,13 +925,14 @@ header = \"anthropic-beta: oauth-2025-04-20\"" # Cross-model review: dispatch same prompt to multiple models (t132.8, t1329) # Usage: compare-models-helper.sh cross-review --prompt "review this code" \ # --models "sonnet,opus,pro" [--workdir path] [--timeout N] [--output dir] -# [--score] [--judge opus] [--task-type code] +# [--score] [--judge ] # Dispatches via runner-helper.sh in parallel, collects outputs, produces summary. -# With --score: dispatches outputs to judge model for structured scoring and DB recording. +# With --score: feeds outputs to a judge model (default: opus) for structured scoring +# and records results in the model-comparisons DB + pattern tracker. ####################################### cmd_cross_review() { local prompt="" models_str="" workdir="" review_timeout="600" output_dir="" - local do_score=0 judge_model="opus" task_type="general" + local score_flag=false judge_model="opus" while [[ $# -gt 0 ]]; do case "$1" in @@ -903,7 +977,7 @@ cmd_cross_review() { shift 2 ;; --score) - do_score=1 + score_flag=true shift ;; --judge) @@ -912,14 +986,11 @@ cmd_cross_review() { return 1 } judge_model="$2" - shift 2 - ;; - --task-type) - [[ $# -lt 2 ]] && { - print_error "--task-type requires a value" + # Validate judge model identifier (used in filenames) + if [[ ! 
"$judge_model" =~ ^[A-Za-z0-9._-]+$ ]]; then + print_error "Invalid judge model identifier: $judge_model (only alphanumeric, dots, hyphens, underscores)" return 1 - } - task_type="$2" + fi shift 2 ;; *) @@ -980,9 +1051,15 @@ cmd_cross_review() { local -a model_names=() for model_tier in "${model_array[@]}"; do - model_tier=$(echo "$model_tier" | tr -d ' ') + model_tier="${model_tier// /}" [[ -z "$model_tier" ]] && continue + # Sanitize model identifier to prevent path traversal (reject chars outside safe set) + if [[ ! "$model_tier" =~ ^[A-Za-z0-9._-]+$ ]]; then + print_warning "Skipping invalid model identifier: $model_tier" + continue + fi + local runner_name="cross-review-${model_tier}-$$" runner_names+=("$runner_name") model_names+=("$model_tier") @@ -993,27 +1070,34 @@ cmd_cross_review() { echo " Dispatching to ${model_tier} (${resolved_model})..." - # Create runner, dispatch, capture output + # Create runner, dispatch, capture output (errors logged per-model for debugging) + local model_err_log="${output_dir}/${model_tier}-errors.log" ( + local model_failed=0 + "$runner_helper" create "$runner_name" \ --model "$model_tier" \ --description "Cross-review: $model_tier" \ - --workdir "$workdir" 2>/dev/null || true + --workdir "$workdir" 2>>"$model_err_log" || model_failed=1 local result_file="${output_dir}/${model_tier}.txt" "$runner_helper" run "$runner_name" "$prompt" \ --model "$model_tier" \ --timeout "$review_timeout" \ - --format json 2>/dev/null >"${output_dir}/${model_tier}.json" || true + --format json 2>>"$model_err_log" >"${output_dir}/${model_tier}.json" || model_failed=1 # Extract text response from JSON if [[ -f "${output_dir}/${model_tier}.json" ]]; then jq -r '.parts[]? 
| select(.type == "text") | .text' \ - "${output_dir}/${model_tier}.json" 2>/dev/null >"$result_file" || true + "${output_dir}/${model_tier}.json" 2>>"$model_err_log" >"$result_file" || model_failed=1 fi - # Clean up runner - "$runner_helper" destroy "$runner_name" --force 2>/dev/null || true + # Clean up runner (always attempt cleanup, even on failure) + "$runner_helper" destroy "$runner_name" --force 2>>"$model_err_log" || true + + # Fail if no usable output was produced + [[ -s "$result_file" ]] || model_failed=1 + exit "$model_failed" ) & pids+=($!) done @@ -1024,7 +1108,8 @@ cmd_cross_review() { local failed=0 for i in "${!pids[@]}"; do if ! wait "${pids[$i]}" 2>/dev/null; then - echo " ${model_names[$i]}: failed" + local err_log="${output_dir}/${model_names[$i]}-errors.log" + echo " ${model_names[$i]}: failed (see ${err_log})" failed=$((failed + 1)) else echo " ${model_names[$i]}: done" @@ -1079,7 +1164,14 @@ cmd_cross_review() { local file_b="${output_dir}/${model_names[1]}.txt" if [[ -f "$file_a" && -f "$file_b" ]]; then echo "Diff (${model_names[0]} vs ${model_names[1]}):" - diff --unified=3 "$file_a" "$file_b" || echo " (files are identical or diff unavailable)" + # diff exits 1 when files differ — capture separately to avoid pipefail + local diff_output diff_status + diff_output=$(diff --unified=3 "$file_a" "$file_b" 2>/dev/null) && diff_status=$? || diff_status=$? 
+ if [[ "$diff_status" -le 1 && -n "$diff_output" ]]; then + echo "$diff_output" | head -100 + else + echo " (files are identical or diff unavailable)" + fi echo "" fi fi @@ -1087,164 +1179,11 @@ cmd_cross_review() { echo "Full results saved to: $output_dir" echo "" - # ========================================================================== - # Judge scoring pipeline (t1329) — activated by --score flag - # ========================================================================== - if [[ "$do_score" -eq 1 ]]; then - echo "=== JUDGE SCORING ===" - echo "" - echo "Judge model: ${judge_model}" - echo "Task type: ${task_type}" - echo "" - echo "Dispatching outputs to judge model..." - - local judge_json - if ! judge_json=$(_judge_cross_review "$output_dir" "$models_str" "$prompt" "$judge_model" "$task_type"); then - print_error "Judge scoring failed — raw outputs still saved to $output_dir" - return 1 - fi - - # Save judge output - local judge_file="${output_dir}/judge-scores.json" - echo "$judge_json" >"$judge_file" - - # Display structured scores - echo "" - echo "Judge Scores:" - echo "-------------" - printf "%-22s %6s %6s %6s %6s %7s\n" "Model" "Corr" "Comp" "Qual" "Clar" "Overall" - printf "%-22s %6s %6s %6s %6s %7s\n" "-----" "----" "----" "----" "----" "-------" - - local winner="" - winner=$(echo "$judge_json" | jq -r '.winner // empty') - local winner_reasoning="" - winner_reasoning=$(echo "$judge_json" | jq -r '.winner_reasoning // empty') - local judge_task_type="" - judge_task_type=$(echo "$judge_json" | jq -r '.task_type // empty') - [[ -n "$judge_task_type" ]] && task_type="$judge_task_type" - - # Build cmd_score args from judge JSON - local score_args=(--task "$prompt" --type "$task_type" --evaluator "$judge_model" --winner "$winner") - - local -a scored_models=() - while IFS= read -r model_name; do - scored_models+=("$model_name") - done < <(echo "$judge_json" | jq -r '.scores | keys[]') - - for model_name in "${scored_models[@]}"; do - # 
Consolidate jq calls into a single pass using --arg for safe variable passing - local m_corr m_comp m_qual m_clar m_adh m_overall m_str m_wea - read -r m_corr m_comp m_qual m_clar m_adh m_overall m_str m_wea <<<"$(echo "$judge_json" | jq -r --arg mn "$model_name" ' - .scores[$mn] | - [ - (.correctness // 0), - (.completeness // 0), - (.quality // 0), - (.clarity // 0), - (.adherence // 0), - (.overall // 0), - (.strengths // ""), - (.weaknesses // "") - ] | @tsv - ')" - - local result_file="${output_dir}/${model_name}.txt" - printf "%-22s %6s %6s %6s %6s %7s\n" \ - "$model_name" "$m_corr" "$m_comp" "$m_qual" "$m_clar" "$m_overall" - - # Accumulate args for cmd_score (1-10 scale matches model-comparisons DB) - score_args+=( - --model "$model_name" - --correctness "$m_corr" - --completeness "$m_comp" - --quality "$m_qual" - --clarity "$m_clar" - --adherence "$m_adh" - --strengths "$m_str" - --weaknesses "$m_wea" - --response "${result_file:-}" - ) - done - - echo "" - if [[ -n "$winner" ]]; then - echo " Winner: ${winner}" - fi - if [[ -n "$winner_reasoning" ]]; then - echo " Reasoning: ${winner_reasoning}" - fi - echo "" - - # Record scores in model-comparisons DB via cmd_score - echo "Recording scores in model-comparisons DB..." - if cmd_score "${score_args[@]}"; then - print_success "Scores recorded in model-comparisons DB" - else - print_warning "cmd_score recording failed (scores still in $judge_file)" - fi - - # Feed winner/loser data into pattern tracker (archived — graceful fallback) - local pt_helper="${SCRIPT_DIR}/archived/pattern-tracker-helper.sh" - if [[ -x "$pt_helper" && -n "$winner" ]]; then - echo "Syncing results to pattern tracker..." 
- local winner_tier loser_args=() winner_overall_score=0 - winner_tier=$(model_id_to_tier "$winner") - [[ -z "$winner_tier" ]] && winner_tier="$winner" - - for model_name in "${scored_models[@]}"; do - # Extract scores in a single jq pass using --arg for safe variable passing - local m_ove raw_corr raw_comp raw_qual raw_clar - read -r m_ove raw_corr raw_comp raw_qual raw_clar <<<"$(echo "$judge_json" | jq -r --arg mn "$model_name" ' - .scores[$mn] | [(.overall // 0), (.correctness // 0), (.completeness // 0), (.quality // 0), (.clarity // 0)] | @tsv - ')" - local m_tier - m_tier=$(model_id_to_tier "$model_name") - [[ -z "$m_tier" ]] && m_tier="$model_name" - - # Normalize 1-10 to 1-5 for pattern tracker - local norm_corr norm_comp norm_qual norm_clar - norm_corr=$(awk "BEGIN{v=int($raw_corr/2+0.5); if(v<1)v=1; if(v>5)v=5; print v}") - norm_comp=$(awk "BEGIN{v=int($raw_comp/2+0.5); if(v<1)v=1; if(v>5)v=5; print v}") - norm_qual=$(awk "BEGIN{v=int($raw_qual/2+0.5); if(v<1)v=1; if(v>5)v=5; print v}") - norm_clar=$(awk "BEGIN{v=int($raw_clar/2+0.5); if(v<1)v=1; if(v>5)v=5; print v}") - - "$pt_helper" score \ - --model "$m_tier" \ - --task-type "$task_type" \ - --correctness "$norm_corr" \ - --completeness "$norm_comp" \ - --code-quality "$norm_qual" \ - --clarity "$norm_clar" \ - --source "cross-review-judge" \ - >/dev/null 2>&1 || true - - if [[ "$model_name" == "$winner" ]]; then - winner_overall_score="$m_ove" - elif [[ "$model_name" != "$winner" ]]; then - loser_args+=(--loser "$m_tier") - fi - done - - # Record A/B comparison in pattern tracker - if [[ -n "$winner_tier" && "${#loser_args[@]}" -gt 0 ]]; then - local winner_avg_norm - winner_avg_norm=$(awk "BEGIN{printf \"%.1f\", $winner_overall_score / 2}") - "$pt_helper" ab-compare \ - --winner "$winner_tier" \ - "${loser_args[@]}" \ - --task-type "$task_type" \ - --winner-score "$winner_avg_norm" \ - --models-compared "${#scored_models[@]}" \ - --source "cross-review-judge" \ - >/dev/null 2>&1 || true - fi - - 
print_success "Pattern tracker updated" - fi - - echo "" - echo "Judge output saved to: $judge_file" - echo "" + # Judge scoring pipeline (t1329) + # When --score is set, dispatch all outputs to a judge model for structured scoring. + if [[ "$score_flag" == "true" ]]; then + _cross_review_judge_score \ + "$prompt" "$models_str" "$output_dir" "$judge_model" "${model_names[@]}" fi return 0 @@ -1277,7 +1216,8 @@ cmd_patterns() { echo "No pattern data available." echo "" echo "Record patterns to populate this view:" - echo " /remember \"SUCCESS: code-review with sonnet — completed successfully\"" + echo " pattern-tracker-helper.sh record --outcome success --model sonnet --task-type code-review \\" + echo " --description \"Completed code review successfully\"" echo "" echo "The supervisor also records patterns automatically after each task." return 0 @@ -1341,8 +1281,8 @@ cmd_patterns() { fi echo "" - echo "Data source: cross-session memory (memory.db)" - echo "Record more: /remember \"SUCCESS: with \"" + echo "Data source: pattern-tracker-helper.sh (memory.db)" + echo "Record more: pattern-tracker-helper.sh record --outcome success --model ..." 
echo "" return 0 } @@ -1405,16 +1345,11 @@ cmd_help() { echo " --prompt 'Audit the architecture of this project' \\" echo " --models 'opus,pro' --timeout 900" echo " compare-models-helper.sh cross-review \\" - echo " --prompt 'Review this PR diff for bugs' \\" - echo " --models 'sonnet,opus' --score" + echo " --prompt 'Review this PR diff' --models 'sonnet,gemini-pro' \\" + echo " --score # auto-score via judge model (default: opus)" echo " compare-models-helper.sh cross-review \\" - echo " --prompt 'Review this PR diff for bugs' \\" - echo " --models 'sonnet,opus,pro' --score --judge opus --task-type review" - echo "" - echo "Cross-review options:" - echo " --score Auto-score outputs via judge model (default: opus)" - echo " --judge Judge model tier (default: opus)" - echo " --task-type Task type for scoring: code|review|analysis|text|general" + echo " --prompt 'Review this PR diff' --models 'sonnet,gemini-pro' \\" + echo " --score --judge sonnet # use sonnet as judge instead" echo "" echo "Data is embedded in this script. Last updated: 2025-02-08." echo "For live pricing, use /compare-models (with web fetch)." 
@@ -1956,14 +1891,37 @@ cmd_score() { return 1 fi - # Insert comparison record - local comp_id - comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('$(echo "$task" | sed "s/'/''/g")', '$task_type', '$evaluator', '$winner'); SELECT last_insert_rowid();") + # Insert comparison record (escape all string values for SQL safety) + local comp_id safe_task safe_type safe_eval safe_winner + safe_task="${task//\'/\'\'}" + safe_type="${task_type//\'/\'\'}" + safe_eval="${evaluator//\'/\'\'}" + safe_winner="${winner//\'/\'\'}" + comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('${safe_task}', '${safe_type}', '${safe_eval}', '${safe_winner}'); SELECT last_insert_rowid();") - # Insert scores for each model + # Insert scores for each model (escape strings, validate numerics) for entry in "${model_entries[@]}"; do IFS='|' read -r m_id m_cor m_com m_qua m_cla m_adh m_ove m_lat m_tok m_str m_wea m_res <<<"$entry" - sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '$m_id', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '$(echo "$m_str" | sed "s/'/''/g")', '$(echo "$m_wea" | sed "s/'/''/g")', '$(echo "$m_res" | sed "s/'/''/g")');" + + # Validate all numeric fields — reject non-integer values to prevent SQL injection + for n in m_cor m_com m_qua m_cla m_adh m_ove m_lat m_tok; do + if ! 
[[ "${!n}" =~ ^[0-9]+$ ]]; then + print_error "Invalid numeric value for ${n}: ${!n}" + return 1 + fi + done + # Clamp score fields to valid 0-10 range + for s in m_cor m_com m_qua m_cla m_adh m_ove; do + if ((${!s} > 10)); then + printf -v "$s" "10" + fi + done + + local safe_id="${m_id//\'/\'\'}" + local safe_str="${m_str//\'/\'\'}" + local safe_wea="${m_wea//\'/\'\'}" + local safe_res="${m_res//\'/\'\'}" + sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '${safe_id}', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '${safe_str}', '${safe_wea}', '${safe_res}');" done print_success "Comparison #$comp_id recorded ($task_type: ${#model_entries[@]} models scored)" @@ -1990,9 +1948,9 @@ cmd_score() { fi echo "" - # Sync to unified pattern tracker backbone (t1094) — archived, graceful fallback + # Sync to unified pattern tracker backbone (t1094) # Scores are 1-10 here; normalize to 1-5 for pattern tracker compatibility. - local pt_helper="${SCRIPT_DIR}/archived/pattern-tracker-helper.sh" + local pt_helper="${SCRIPT_DIR}/pattern-tracker-helper.sh" if [[ -x "$pt_helper" ]]; then local winner_tier="" local loser_args=() @@ -2077,15 +2035,25 @@ cmd_results() { esac done + # Validate limit is numeric (used in SQL LIMIT clause) + if ! 
[[ "$limit" =~ ^[0-9]+$ ]]; then + print_error "Invalid --limit value: $limit (must be a positive integer)" + return 1 + fi + + # Escape string values for SQL safety (prevent injection via --model/--type args) + local safe_model_filter="${model_filter//\'/\'\'}" + local safe_type_filter="${type_filter//\'/\'\'}" + local where_clause="" - if [[ -n "$model_filter" ]]; then - where_clause="WHERE cs.model_id LIKE '%${model_filter}%'" + if [[ -n "$safe_model_filter" ]]; then + where_clause="WHERE cs.model_id LIKE '%${safe_model_filter}%'" fi - if [[ -n "$type_filter" ]]; then + if [[ -n "$safe_type_filter" ]]; then if [[ -n "$where_clause" ]]; then - where_clause="$where_clause AND c.task_type = '$type_filter'" + where_clause="$where_clause AND c.task_type = '${safe_type_filter}'" else - where_clause="WHERE c.task_type = '$type_filter'" + where_clause="WHERE c.task_type = '${safe_type_filter}'" fi fi diff --git a/.agents/scripts/matterbridge-helper.sh b/.agents/scripts/matterbridge-helper.sh index 43ef996ae8..c996d5db7b 100755 --- a/.agents/scripts/matterbridge-helper.sh +++ b/.agents/scripts/matterbridge-helper.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash # matterbridge-helper.sh — Manage Matterbridge multi-platform chat bridge -# Usage: matterbridge-helper.sh [setup|start|stop|status|logs|validate|update|simplex-bridge] +# Usage: matterbridge-helper.sh [setup|start|stop|status|logs|validate|update] set -euo pipefail BINARY_PATH="/usr/local/bin/matterbridge" @@ -9,9 +9,6 @@ DATA_DIR="$HOME/.aidevops/.agent-workspace/matterbridge" PID_FILE="$DATA_DIR/matterbridge.pid" LOG_FILE="$DATA_DIR/matterbridge.log" LATEST_RELEASE_URL="https://api.github.com/repos/42wim/matterbridge/releases/latest" -SIMPLEX_BRIDGE_DIR="$DATA_DIR/simplex-bridge" -SIMPLEX_COMPOSE_FILE="${SIMPLEX_COMPOSE_FILE:-}" -AGENTS_DIR="${AGENTS_DIR:-$HOME/.aidevops/agents}" # ── helpers ────────────────────────────────────────────────────────────────── @@ -33,9 +30,20 @@ ensure_dirs() { } get_latest_version() { - 
local version - version=$(curl -fsSL "$LATEST_RELEASE_URL" 2>/dev/null | grep '"tag_name"' | head -1 | sed 's/.*"v\([^"]*\)".*/\1/') - echo "${version:-1.26.0}" + local version curl_output curl_status + curl_output=$(curl -fsSL "$LATEST_RELEASE_URL" 2>&1) && curl_status=0 || curl_status=$? + if [[ "$curl_status" -ne 0 ]]; then + log "WARNING: Could not fetch latest version (curl exit $curl_status) — using fallback" + echo "1.26.0" + return 0 + fi + version=$(echo "$curl_output" | grep '"tag_name"' | head -1 | sed 's/.*"v\([^"]*\)".*/\1/') + if [[ -z "$version" ]]; then + log "WARNING: Could not parse version from API response — using fallback" + echo "1.26.0" + return 0 + fi + echo "$version" return 0 } @@ -52,7 +60,14 @@ detect_os_arch() { case "$os" in linux) echo "linux-${arch}" ;; - darwin) echo "darwin-amd64" ;; + darwin) + # Matterbridge releases use darwin-arm64 and darwin-64bit (not darwin-amd64) + if [[ "$arch" == "arm64" ]]; then + echo "darwin-arm64" + else + echo "darwin-64bit" + fi + ;; *) echo "linux-${arch}" ;; esac return 0 @@ -152,11 +167,40 @@ cmd_validate() { return 1 fi - # Basic TOML syntax check via matterbridge dry-run - # matterbridge exits non-zero if config is invalid + # Check binary exists and is executable log "Validating config: $config_path" - if timeout 5 "$BINARY_PATH" -conf "$config_path" -version >/dev/null 2>&1; then - log "Binary OK" + if [ ! -x "$BINARY_PATH" ]; then + die "Binary not executable: $BINARY_PATH" + return 1 + fi + log "Binary OK: $BINARY_PATH" + + # Attempt to parse config (matterbridge will fail fast on invalid TOML) + # Use gtimeout on macOS if timeout is unavailable + local timeout_cmd="timeout" + if ! 
command -v timeout >/dev/null 2>&1; then + if command -v gtimeout >/dev/null 2>&1; then + timeout_cmd="gtimeout" + else + log "WARNING: timeout/gtimeout not found — skipping config parse check" + return 0 + fi + fi + + local parse_output parse_status + parse_output=$("$timeout_cmd" 5 "$BINARY_PATH" -conf "$config_path" 2>&1) && parse_status=$? || parse_status=$? + + if [[ "$parse_status" -eq 124 ]]; then + # timeout exit code 124 = process timed out (likely hung on credentials) + log "Config parse: process timed out (expected if credentials are not configured)" + elif [[ "$parse_status" -ne 0 ]]; then + # Non-zero exit — check if it's a config parse error + if echo "$parse_output" | grep -qi "toml\|parse\|syntax"; then + die "Config parse error: $parse_output" + return 1 + fi + # Other non-zero exits are expected (e.g., missing credentials) + log "Config parse: binary exited $parse_status (expected if credentials are not configured)" fi # Check for required sections @@ -216,18 +260,24 @@ cmd_stop() { local pid pid="$(cat "$PID_FILE")" log "Stopping (PID: $pid)..." - kill "$pid" 2>/dev/null || true + # kill may fail if process exited between is_running check and here (race condition) + local kill_err + kill_err=$(kill "$pid" 2>&1) || { + if [[ -n "$kill_err" ]]; then + log "WARNING: kill failed: $kill_err" + fi + } - local timeout=10 + local stop_timeout=10 local count=0 - while is_running && [ $count -lt $timeout ]; do + while is_running && [ $count -lt $stop_timeout ]; do sleep 1 count=$((count + 1)) done if is_running; then log "Force killing..." 
- kill -9 "$pid" 2>/dev/null || true + kill -9 "$pid" 2>&1 | while IFS= read -r line; do log "WARNING: $line"; done || true fi rm -f "$PID_FILE" @@ -251,12 +301,27 @@ cmd_status() { cmd_logs() { local follow=false local tail_lines=50 - local arg="${1:-}" - case "$arg" in - --follow | -f) follow=true ;; - --tail) tail_lines="${2:-50}" ;; - esac + while [[ $# -gt 0 ]]; do + case "$1" in + --follow | -f) + follow=true + shift + ;; + --tail) + if [[ -n "${2:-}" && "${2:-}" != -* ]]; then + tail_lines="$2" + shift 2 + else + tail_lines=50 + shift + fi + ;; + *) + shift + ;; + esac + done if [ ! -f "$LOG_FILE" ]; then log "No log file found: $LOG_FILE" @@ -273,8 +338,17 @@ cmd_logs() { } cmd_update() { + if [ ! -f "$BINARY_PATH" ]; then + die "Binary not found: $BINARY_PATH. Run: matterbridge-helper.sh setup" + return 1 + fi + # Ensure DATA_DIR exists before writing temp files (LOG_FILE lives under DATA_DIR) + ensure_dirs + local current_version new_version - current_version="$("$BINARY_PATH" -version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1 || echo "unknown")" + local version_err_file="${LOG_FILE}.version-err" + current_version="$("$BINARY_PATH" -version 2>"$version_err_file" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1 || echo "unknown")" + rm -f "$version_err_file" new_version="$(get_latest_version)" if [ "$current_version" = "$new_version" ]; then @@ -308,247 +382,6 @@ cmd_update() { return 0 } -# ── simplex bridge (Docker Compose) ────────────────────────────────────────── - -_find_compose_file() { - # Priority: env var > simplex-bridge dir > agents configs dir - if [ -n "$SIMPLEX_COMPOSE_FILE" ] && [ -f "$SIMPLEX_COMPOSE_FILE" ]; then - echo "$SIMPLEX_COMPOSE_FILE" - return 0 - fi - if [ -f "$SIMPLEX_BRIDGE_DIR/docker-compose.yml" ]; then - echo "$SIMPLEX_BRIDGE_DIR/docker-compose.yml" - return 0 - fi - if [ -f "$AGENTS_DIR/configs/matterbridge-simplex-compose.yml" ]; then - echo "$AGENTS_DIR/configs/matterbridge-simplex-compose.yml" - return 0 - fi - # 
Check repo-local path (for development) - local script_dir - script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" - local repo_compose="${script_dir}/../configs/matterbridge-simplex-compose.yml" - if [ -f "$repo_compose" ]; then - echo "$repo_compose" - return 0 - fi - echo "" - return 1 -} - -_ensure_simplex_bridge_dir() { - mkdir -p "$SIMPLEX_BRIDGE_DIR" - mkdir -p "$SIMPLEX_BRIDGE_DIR/data/simplex" - return 0 -} - -cmd_simplex_bridge() { - local action="${1:-help}" - shift || true - - case "$action" in - up | start) - _simplex_bridge_up "$@" - ;; - down | stop) - _simplex_bridge_down "$@" - ;; - status | ps) - _simplex_bridge_status - ;; - logs) - _simplex_bridge_logs "$@" - ;; - init | setup) - _simplex_bridge_init "$@" - ;; - help | --help | -h) - _simplex_bridge_help - ;; - *) - log "Unknown simplex-bridge action: $action" - _simplex_bridge_help - return 1 - ;; - esac - - return 0 -} - -_simplex_bridge_up() { - local compose_file - compose_file="$(_find_compose_file)" || { - die "No compose file found. Run: matterbridge-helper.sh simplex-bridge init" - return 1 - } - - if ! command -v docker >/dev/null 2>&1; then - die "Docker not found. Install Docker or OrbStack first." - return 1 - fi - - # Check for matterbridge.toml in the compose file directory - local compose_dir - compose_dir="$(dirname "$compose_file")" - if [ ! -f "$compose_dir/matterbridge.toml" ] && [ ! -f "$SIMPLEX_BRIDGE_DIR/matterbridge.toml" ]; then - log "WARNING: No matterbridge.toml found. The bridge will not connect to any platforms." - log "Copy the template: cp configs/matterbridge-simplex.toml.example matterbridge.toml" - fi - - # Check for SimpleX database - if [ ! -d "$SIMPLEX_BRIDGE_DIR/data/simplex" ] || [ -z "$(ls -A "$SIMPLEX_BRIDGE_DIR/data/simplex" 2>/dev/null)" ]; then - log "WARNING: No SimpleX database found in $SIMPLEX_BRIDGE_DIR/data/simplex/" - log "Run simplex-chat CLI first to create a profile, then copy database files." 
- log "See: matterbridge-helper.sh simplex-bridge init" - fi - - log "Starting SimpleX bridge stack..." - docker compose -f "$compose_file" up --build -d "$@" - log "SimpleX bridge started. Check status: matterbridge-helper.sh simplex-bridge status" - return 0 -} - -_simplex_bridge_down() { - local compose_file - compose_file="$(_find_compose_file)" || { - die "No compose file found." - return 1 - } - - log "Stopping SimpleX bridge stack..." - docker compose -f "$compose_file" down "$@" - log "SimpleX bridge stopped." - return 0 -} - -_simplex_bridge_status() { - local compose_file - compose_file="$(_find_compose_file)" || { - log "No compose file found. SimpleX bridge not configured." - return 0 - } - - log "Compose file: $compose_file" - log "Data dir: $SIMPLEX_BRIDGE_DIR" - - if command -v docker >/dev/null 2>&1; then - docker compose -f "$compose_file" ps 2>/dev/null || log "No containers running" - else - log "Docker not available" - fi - - # Check for SimpleX database - if [ -d "$SIMPLEX_BRIDGE_DIR/data/simplex" ]; then - local db_count - db_count="$(find "$SIMPLEX_BRIDGE_DIR/data/simplex" -name 'simplex_v1_*' 2>/dev/null | wc -l | tr -d ' ')" - log "SimpleX database files: $db_count" - else - log "SimpleX database: not found" - fi - - return 0 -} - -_simplex_bridge_logs() { - local compose_file - compose_file="$(_find_compose_file)" || { - die "No compose file found." - return 1 - } - - docker compose -f "$compose_file" logs "$@" - return 0 -} - -_simplex_bridge_init() { - _ensure_simplex_bridge_dir - - # Copy compose file to working directory if not present - local compose_file - compose_file="$(_find_compose_file)" || true - - if [ -z "$compose_file" ] || [ ! 
-f "$SIMPLEX_BRIDGE_DIR/docker-compose.yml" ]; then - local template="" - # Find template from agents dir or repo - if [ -f "$AGENTS_DIR/configs/matterbridge-simplex-compose.yml" ]; then - template="$AGENTS_DIR/configs/matterbridge-simplex-compose.yml" - else - local script_dir - script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" - local repo_template="${script_dir}/../configs/matterbridge-simplex-compose.yml" - if [ -f "$repo_template" ]; then - template="$repo_template" - fi - fi - - if [ -n "$template" ]; then - cp "$template" "$SIMPLEX_BRIDGE_DIR/docker-compose.yml" - log "Copied compose template to $SIMPLEX_BRIDGE_DIR/docker-compose.yml" - else - die "No compose template found. Ensure aidevops is installed." - return 1 - fi - else - log "Compose file already exists: $SIMPLEX_BRIDGE_DIR/docker-compose.yml" - fi - - # Copy config template if not present - if [ ! -f "$SIMPLEX_BRIDGE_DIR/matterbridge.toml" ]; then - local config_template="" - if [ -f "$AGENTS_DIR/configs/matterbridge-simplex.toml.example" ]; then - config_template="$AGENTS_DIR/configs/matterbridge-simplex.toml.example" - else - local script_dir - script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" - local repo_config="${script_dir}/../configs/matterbridge-simplex.toml.example" - if [ -f "$repo_config" ]; then - config_template="$repo_config" - fi - fi - - if [ -n "$config_template" ]; then - cp "$config_template" "$SIMPLEX_BRIDGE_DIR/matterbridge.toml" - chmod 600 "$SIMPLEX_BRIDGE_DIR/matterbridge.toml" - log "Copied config template to $SIMPLEX_BRIDGE_DIR/matterbridge.toml" - else - log "WARNING: No config template found. Create matterbridge.toml manually." - fi - else - log "Config already exists: $SIMPLEX_BRIDGE_DIR/matterbridge.toml" - fi - - log "" - log "Next steps:" - log " 1. Run simplex-chat CLI to create a profile and join/create the chat to bridge" - log " 2. Copy database: cp ~/.simplex/simplex_v1_* $SIMPLEX_BRIDGE_DIR/data/simplex/" - log " 3. 
Set permissions: chmod -R 777 $SIMPLEX_BRIDGE_DIR/data/" - log " 4. Get chat ID: simplex-chat -e '/i #group_name'" - log " 5. Edit $SIMPLEX_BRIDGE_DIR/matterbridge.toml with platform credentials" - log " 6. Start: matterbridge-helper.sh simplex-bridge up" - return 0 -} - -_simplex_bridge_help() { - cat <<'HELP' -matterbridge-helper.sh simplex-bridge — Manage SimpleX-Matterbridge Docker stack - -Actions: - init Set up working directory with compose + config templates - up Start the 3-container stack (simplex, matterbridge, adapter) - down Stop and remove containers - status Show container status and database info - logs [--follow] Show container logs (pass -f for follow) - -Environment: - SIMPLEX_COMPOSE_FILE Override compose file path - SIMPLEX_CHAT_ID SimpleX chat ID to bridge (default: 1) - SIMPLEX_CHAT_TYPE Chat type: contact or group (default: group) - -Docs: .agents/services/communications/matterbridge.md -HELP - return 0 -} - cmd_help() { cat <<'HELP' matterbridge-helper.sh — Manage Matterbridge multi-platform chat bridge @@ -561,7 +394,6 @@ Commands: status Show running status logs [--follow] Show/follow log output update Update to latest release - simplex-bridge Manage SimpleX bridge Docker stack (init|up|down|status|logs) Config: ~/.config/aidevops/matterbridge.toml (override: MATTERBRIDGE_CONFIG) Docs: .agents/services/communications/matterbridge.md @@ -583,7 +415,6 @@ main() { status) cmd_status "$@" ;; logs) cmd_logs "$@" ;; update) cmd_update "$@" ;; - simplex-bridge) cmd_simplex_bridge "$@" ;; help | --help | -h) cmd_help ;; *) echo "Unknown command: $cmd" >&2 diff --git a/.agents/services/communications/matterbridge.md b/.agents/services/communications/matterbridge.md index e64feeb5af..f9c938725b 100644 --- a/.agents/services/communications/matterbridge.md +++ b/.agents/services/communications/matterbridge.md @@ -20,7 +20,7 @@ tools: - **Repo**: [github.com/42wim/matterbridge](https://github.com/42wim/matterbridge) (7.4K stars, Apache-2.0, Go) - 
**Version**: v1.26.0 (stable) -- **Script**: `matterbridge-helper.sh [setup|start|stop|status|logs|validate|simplex-bridge]` +- **Script**: `matterbridge-helper.sh [setup|start|stop|status|logs|validate]` - **Config**: `~/.config/aidevops/matterbridge.toml` (600 permissions) - **Data**: `~/.aidevops/.agent-workspace/matterbridge/` - **Requires**: Go 1.18+ (build) or pre-compiled binary @@ -62,7 +62,7 @@ matterbridge-helper.sh start --daemon ### 3rd Party via Matterbridge API -- **SimpleX**: [matterbridge-simplex](https://github.com/UnkwUsr/matterbridge-simplex) adapter (MIT, Node.js) — routes via SimpleX CLI WebSocket API +- **SimpleX**: [matterbridge-simplex](https://github.com/simplex-chat/matterbridge-simplex) adapter — routes via SimpleX CLI - **Delta Chat**: matterdelta - **Minecraft**: mattercraft, MatterBukkit @@ -76,8 +76,13 @@ curl -L https://github.com/42wim/matterbridge/releases/latest/download/matterbri -o /usr/local/bin/matterbridge chmod +x /usr/local/bin/matterbridge -# macOS -curl -L https://github.com/42wim/matterbridge/releases/latest/download/matterbridge-1.26.0-darwin-amd64 \ +# macOS (Intel) +curl -L https://github.com/42wim/matterbridge/releases/latest/download/matterbridge-1.26.0-darwin-64bit \ + -o /usr/local/bin/matterbridge +chmod +x /usr/local/bin/matterbridge + +# macOS (Apple Silicon) +curl -L https://github.com/42wim/matterbridge/releases/latest/download/matterbridge-1.26.0-darwin-arm64 \ -o /usr/local/bin/matterbridge chmod +x /usr/local/bin/matterbridge @@ -299,135 +304,34 @@ enable=true RemoteNickFormat="[{PROTOCOL}] <{NICK}> " ``` -### SimpleX via matterbridge-simplex Adapter - -SimpleX is not natively supported by Matterbridge. The [matterbridge-simplex](https://github.com/UnkwUsr/matterbridge-simplex) adapter (MIT, Node.js) bridges SimpleX Chat to Matterbridge's HTTP API, enabling SimpleX to connect to all 40+ platforms. 
- -#### Architecture - -```text -SimpleX CLI (WebSocket :5225) - | - | WebSocket JSON API (localhost) - | -matterbridge-simplex (Node.js adapter) - | - | HTTP REST API (localhost:4242) - | -Matterbridge (Go binary) - | - |--- Matrix rooms - |--- Telegram groups - |--- Discord channels - |--- Slack workspaces - |--- IRC channels - |--- 40+ other platforms -``` - -**Message flow (SimpleX -> other platforms)**: - -1. User sends message in SimpleX Chat (mobile/desktop/CLI) -2. SimpleX CLI receives via SMP protocol, emits `newChatItems` event on WebSocket -3. matterbridge-simplex adapter reads event, extracts text and sender -4. Adapter POSTs message to Matterbridge HTTP API (`/api/message`) -5. Matterbridge routes to all configured gateway destinations - -**Message flow (other platforms -> SimpleX)**: - -1. User sends message on Matrix/Telegram/Discord/etc. -2. Matterbridge receives via platform SDK, buffers in API endpoint -3. matterbridge-simplex adapter polls `/api/messages` (1s interval) -4. Adapter sends message to SimpleX CLI via `apiSendTextMessage` WebSocket command -5. SimpleX CLI delivers to the configured contact or group chat - -#### Features and Limitations - -| Feature | Status | Notes | -|---------|--------|-------| -| Text messages | Supported | Bidirectional | -| Image previews | Supported | SimpleX -> other platforms (preview only, not full file) | -| Full file transfer | Not yet | WIP in matterbridge-simplex | -| `/hide` prefix | Supported | Messages starting with `/hide` are not bridged — SimpleX-only | -| Contact chats | Supported | Bridge to a specific SimpleX contact | -| Group chats | Supported | Bridge to a specific SimpleX group | -| Multiple bridges | Supported | Run multiple adapter instances with different chat IDs | - -#### Quick Setup (Docker Compose) - -The recommended deployment uses Docker Compose with 3 containers. See `configs/matterbridge-simplex-compose.yml` for the full template. - -```bash -# 1. 
Prepare SimpleX database -# Run simplex-chat CLI first to create a profile and join/create the chat to bridge -simplex-chat -# Then move the database: -mkdir -p data/simplex -cp ~/.simplex/simplex_v1_* data/simplex/ -chmod -R 777 data/ # Required for Docker volume access - -# 2. Get the chat ID to bridge -simplex-chat -e '/_get chats 1 pcc=off' \ - | tail -n +2 \ - | jq '.[].chatInfo | (.groupInfo // .contact) | {name: .localDisplayName, type: (if .groupId then "group" else "contact" end), id: .groupId // .contactId}' - -# 3. Configure matterbridge.toml (copy from configs/matterbridge-simplex.toml.example) -cp configs/matterbridge-simplex.toml.example matterbridge.toml -# Edit: add your platform credentials and channel IDs - -# 4. Deploy -matterbridge-helper.sh simplex-bridge up - -# Or manually: -docker compose -f configs/matterbridge-simplex-compose.yml up --build -d -``` +### SimpleX via Adapter -#### Quick Setup (Manual) +SimpleX is not natively supported. Use [matterbridge-simplex](https://github.com/simplex-chat/matterbridge-simplex): ```bash -# 1. Install SimpleX CLI +# Install SimpleX CLI first curl -o- https://raw.githubusercontent.com/simplex-chat/simplex-chat/stable/install.sh | bash -# 2. Clone matterbridge-simplex and build -git clone https://github.com/UnkwUsr/matterbridge-simplex.git -cd matterbridge-simplex -git submodule update --init --recursive --depth 1 -( cd lib/simplex-chat-client-typescript/ && npm install && tsc ) +# Install matterbridge-simplex adapter +go install github.com/simplex-chat/matterbridge-simplex@latest -# 3. Start SimpleX CLI as WebSocket server -simplex-chat -p 5225 - -# 4. Start Matterbridge with API endpoint -matterbridge -conf matterbridge.toml - -# 5. 
Start the adapter -# Format: node main.js -node main.js 127.0.0.1:4242 gateway1 127.0.0.1:5225 1 group +# Run adapter (exposes Matterbridge API endpoint) +matterbridge-simplex --port 4242 --profile simplex-bridge ``` -#### Matterbridge Config for SimpleX - ```toml -# Matterbridge API endpoint — the adapter connects here +# Matterbridge config: use API bridge to connect to adapter [api] - [api.myapi] - BindAddress="127.0.0.1:4242" - Buffer=1000 + [api.simplex] + BindAddress="0.0.0.0:4243" + Token="your-api-token" -# Add your destination platform(s) -[matrix] - [matrix.home] - Server="https://matrix.example.com" - Login="bridgebot" - Password="YOUR_MATRIX_PASSWORD" - RemoteNickFormat="[SimpleX] <{NICK}> " - -# Gateway connecting SimpleX (via API) to Matrix [[gateway]] name="simplex-matrix" enable=true [[gateway.inout]] - account="api.myapi" + account="api.simplex" channel="api" [[gateway.inout]] @@ -435,66 +339,7 @@ enable=true channel="#bridged:example.com" ``` -See `configs/matterbridge-simplex.toml.example` for a complete template with Matrix, Telegram, and Discord examples. - -#### Obtaining SimpleX Chat IDs - -SimpleX uses separate ID spaces for contacts and groups. You need both the ID and type. - -```bash -# Get info for a specific chat -simplex-chat -e '/i #group_name' # Group -simplex-chat -e '/i @contact_name' # Contact - -# List all chats with IDs (requires jq) -simplex-chat -e '/_get chats 1 pcc=off' \ - | tail -n +2 \ - | jq '.[].chatInfo | (.groupInfo // .contact) | {name: .localDisplayName, type: (if .groupId then "group" else "contact" end), id: .groupId // .contactId}' -``` - -**Note**: The default chat ID in the Docker Compose template is `4`. Check your actual chat ID and update `docker-compose.yml` if it differs. - -## Privacy Gradient - -Matterbridge enables a **privacy gradient** — users choose their preferred privacy level while staying in the same conversation. 
- -```text -Maximum Privacy Maximum Convenience -| | -SimpleX Chat ──> Matrix (self-hosted) ──> Matrix (public) ──> Telegram/Discord -No identifiers Federated, E2E opt-in Federated Centralized -No metadata Server stores metadata Server stores Full metadata -No phone/email @user:server IDs @user:server IDs Phone/username -``` - -### How It Works - -1. **SimpleX users** get maximum privacy — no identifiers, no metadata, E2E encrypted -2. **Matrix users** get federation and E2E encryption (when enabled), with `@user:server` identifiers -3. **Telegram/Discord/Slack users** get convenience and existing ecosystem, with full platform metadata -4. All users see the same messages, prefixed with `[Platform] ` to identify origin -5. SimpleX users can send `/hide` messages that are not bridged — visible only on SimpleX - -### Security Implications - -**E2E encryption is broken at bridge boundaries.** When bridging: - -- Messages are decrypted by the SimpleX CLI process -- Passed in plaintext to the matterbridge-simplex adapter (localhost only) -- Re-encrypted (or sent plaintext) to destination platform by Matterbridge -- The bridge host has access to all message content in plaintext -- Metadata (sender, timestamp) is visible to all bridged platforms - -**Mitigations**: - -- Run the entire bridge stack on a trusted, hardened host -- Use `network_mode: host` in Docker (default) — all traffic stays on localhost -- Use NetBird/WireGuard to restrict access to the bridge host -- Store platform credentials in gopass: `aidevops secret set MATTERBRIDGE_MATRIX_TOKEN` -- Config file must have 600 permissions: `chmod 600 matterbridge.toml` -- Consider one-way bridges (`[[gateway.in]]`/`[[gateway.out]]`) to limit exposure - -See `tools/security/opsec.md` for full platform trust matrix and threat modeling. +**Note**: SimpleX E2E encryption is broken at the bridge boundary. Messages entering the bridge are decrypted and re-encrypted for the destination platform. 
See `tools/security/opsec.md` for implications.
 
 ## Running
 
@@ -598,9 +443,22 @@ curl http://localhost:4242/api/messages \
 
 ## Security Considerations
 
-See [Privacy Gradient > Security Implications](#security-implications) for detailed analysis of E2E encryption at bridge boundaries.
+**E2E encryption is broken at bridge boundaries.** When bridging:
+
+- Messages are decrypted by the Matterbridge process
+- Re-encrypted (or sent as plaintext) to the destination platform
+- The bridge host has access to all message content in plaintext
+- Metadata (sender, timestamp, channel) is visible to all bridged platforms
+
+**Mitigations**:
 
-**Summary**: E2E encryption is broken at bridge boundaries. The bridge host sees all messages in plaintext. Run on a trusted host, use localhost-only networking, store credentials in gopass, and set config file permissions to 600.
+- Run Matterbridge on a trusted, hardened host
+- Use NetBird/WireGuard to restrict access to the bridge host
+- Avoid bridging sensitive channels to unencrypted platforms (IRC, Slack, Discord)
+- Store credentials in gopass: `aidevops secret set MATTERBRIDGE_DISCORD_TOKEN`
+- The config file must have 600 permissions: `chmod 600 matterbridge.toml`
+
+See `tools/security/opsec.md` for full platform trust matrix and threat modeling.
 
 ## Related
 
@@ -608,5 +466,3 @@ See [Privacy Gradient > Security Implications](#security-implications) for detai
 - `services/communications/simplex.md` — SimpleX install, bot API, self-hosted servers
 - `tools/security/opsec.md` — Platform trust matrix, E2E status, metadata warnings
 - `tools/ai-assistants/headless-dispatch.md` — Headless dispatch patterns
-- `configs/matterbridge-simplex-compose.yml` — Docker Compose template for SimpleX bridge
-- `configs/matterbridge-simplex.toml.example` — Config template for SimpleX-Matrix bridging
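The cmd_score hardening in this diff combines two defenses before interpolating values into sqlite3 statements: bash quote-doubling for text fields (SQL escapes `'` as `''`) and a strict integer regex that rejects non-numeric fields outright rather than trying to escape them. A minimal sketch of both idioms (function names are illustrative, not from the helper script):

```shell
#!/usr/bin/env bash
# Sketch of the two SQL-safety idioms used in cmd_score:
# 1) double every single quote in text values before quoting them in SQL
# 2) validate numeric values with a strict regex instead of escaping them
sql_escape() {
  local s="$1"
  # ${s//\'/\'\'} replaces each ' with '' (SQL's escaped single quote)
  printf '%s' "${s//\'/\'\'}"
}

is_uint() {
  [[ "$1" =~ ^[0-9]+$ ]]
}

echo "$(sql_escape "O'Brien")"                        # prints O''Brien
is_uint "42" && echo "ok"                             # prints ok
is_uint "7; DROP TABLE comparisons" || echo "rejected" # prints rejected
```

Rejecting bad numerics (instead of clamping or escaping) is the safer choice here, since numeric fields are interpolated unquoted into the SQL statement.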