feat: v0.5.0 adversarial classifier (opt-in ML scorer) #34

Merged: vaaraio merged 3 commits into main from feat/adversarial-classifier-0.5.0 on Apr 23, 2026

Conversation

vaaraio (Owner) commented Apr 23, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional machine learning-based adversarial classifier for enhanced tool-call security detection.
    • Introduced pre-trained model bundle with included benchmark evaluations.
    • Optional ML support via vaara[ml] installation extra.
  • Documentation

    • Updated installation guide to highlight zero-dependency default and optional ML installation.
    • Added classifier usage documentation with held-out evaluation metrics and operational guidance.
  • Tests

    • Significantly expanded adversarial and benign test fixture datasets.
    • Added comprehensive benchmark results demonstrating classifier performance.
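
The split between the zero-dependency default and the optional ML install can be checked at runtime before the classifier is used. The sketch below is only an illustration, not code from this PR: it assumes the four import names that correspond to the packages the walkthrough lists for the ml extra (xgboost, scikit-learn, joblib, numpy), and the helper name `missing_ml_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names for the packages listed under the `ml` extra in this PR.
# Assumption: scikit-learn is checked under its import name, `sklearn`.
ML_MODULES = ("xgboost", "sklearn", "joblib", "numpy")


def missing_ml_modules(modules=ML_MODULES):
    """Return the optional ML modules that cannot be imported (hypothetical helper)."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_ml_modules()
    if missing:
        # The core package is expected to keep working without these modules.
        print("ML scorer unavailable; missing:", ", ".join(missing))
        print('Install the extra with: pip install "vaara[ml]"')
    else:
        print("All optional ML dependencies are importable.")
```

A check like this lets a caller fall back to the heuristic path instead of failing on import when the extra is not installed.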

coderabbitai (Bot) commented Apr 23, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: de83b0bc-8a8b-4d63-8c31-4b1fbbdc6c22

📥 Commits

Reviewing files that changed from the base of the PR and between 451fd61 and 1513a64.

⛔ Files ignored due to path filters (200)
  • tests/adversarial/generated/CE-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/CE-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DA-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/DE-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/JB-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PI-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/SR-025.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-001.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-002.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-003.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-004.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-005.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-006.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-007.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-008.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-009.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-010.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-011.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-012.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-013.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-014.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-015.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-016.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-017.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-018.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-019.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-020.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-021.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-022.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-023.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-024.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-025.jsonl is excluded by !**/generated/**
📒 Files selected for processing (74)
  • CHANGELOG.md
  • README.md
  • examples/adversarial_classifier.py
  • pyproject.toml
  • scripts/classifier_vs_heuristic.py
  • src/vaara/__init__.py
  • src/vaara/adversarial_classifier.py
  • src/vaara/data/adversarial_classifier_v1.joblib
  • tests/adversarial/benign_generated/BN-001.jsonl
  • tests/adversarial/benign_generated/BN-002.jsonl
  • tests/adversarial/benign_generated/BN-003.jsonl
  • tests/adversarial/benign_generated/BN-004.jsonl
  • tests/adversarial/benign_generated/BN-005.jsonl
  • tests/adversarial/benign_generated/BN-006.jsonl
  • tests/adversarial/benign_generated/BN-007.jsonl
  • tests/adversarial/benign_generated/BN-008.jsonl
  • tests/adversarial/benign_generated/BN-009.jsonl
  • tests/adversarial/benign_generated/BN-010.jsonl
  • tests/adversarial/benign_generated/BN-011.jsonl
  • tests/adversarial/benign_generated/BN-012.jsonl
  • tests/adversarial/benign_generated/BN-013.jsonl
  • tests/adversarial/benign_generated/BN-014.jsonl
  • tests/adversarial/benign_generated/BN-015.jsonl
  • tests/adversarial/benign_generated/BN-016.jsonl
  • tests/adversarial/benign_generated/BN-017.jsonl
  • tests/adversarial/benign_generated/BN-018.jsonl
  • tests/adversarial/benign_generated/BN-019.jsonl
  • tests/adversarial/benign_generated/BN-020.jsonl
  • tests/adversarial/benign_generated/BN-021.jsonl
  • tests/adversarial/benign_generated/BN-022.jsonl
  • tests/adversarial/benign_generated/BN-023.jsonl
  • tests/adversarial/benign_generated/BN-024.jsonl
  • tests/adversarial/benign_generated/BN-025.jsonl
  • tests/adversarial/benign_generated/BN-026.jsonl
  • tests/adversarial/benign_generated/BN-027.jsonl
  • tests/adversarial/benign_generated/BN-028.jsonl
  • tests/adversarial/benign_generated/BN-029.jsonl
  • tests/adversarial/benign_generated/BN-030.jsonl
  • tests/adversarial/benign_generated/BN-031.jsonl
  • tests/adversarial/benign_generated/BN-032.jsonl
  • tests/adversarial/benign_generated/BN-033.jsonl
  • tests/adversarial/benign_generated/BN-034.jsonl
  • tests/adversarial/benign_generated/BN-035.jsonl
  • tests/adversarial/benign_generated/BN-036.jsonl
  • tests/adversarial/benign_generated/BN-037.jsonl
  • tests/adversarial/benign_generated/BN-038.jsonl
  • tests/adversarial/benign_generated/BN-039.jsonl
  • tests/adversarial/benign_generated/BN-040.jsonl
  • tests/adversarial/benign_generated/BN-041.jsonl
  • tests/adversarial/benign_generated/BN-042.jsonl
  • tests/adversarial/benign_generated/BN-043.jsonl
  • tests/adversarial/benign_generated/BN-044.jsonl
  • tests/adversarial/benign_generated/BN-045.jsonl
  • tests/adversarial/benign_generated/BN-046.jsonl
  • tests/adversarial/benign_generated/BN-047.jsonl
  • tests/adversarial/benign_generated/BN-048.jsonl
  • tests/adversarial/benign_generated/BN-049.jsonl
  • tests/adversarial/benign_generated/BN-050.jsonl
  • tests/adversarial/benign_generated/BT-001.jsonl
  • tests/adversarial/benign_generated/BT-002.jsonl
  • tests/adversarial/benign_generated/BT-003.jsonl
  • tests/adversarial/benign_generated/BT-004.jsonl
  • tests/adversarial/benign_generated/BT-005.jsonl
  • tests/adversarial/benign_generated/BT-006.jsonl
  • tests/adversarial/benign_generated/BT-007.jsonl
  • tests/adversarial/benign_generated/BT-008.jsonl
  • tests/adversarial/benign_generated/BT-009.jsonl
  • tests/adversarial/benign_generated/BT-010.jsonl
  • tests/adversarial/benign_generated/BT-011.jsonl
  • tests/adversarial/benign_generated/BT-012.jsonl
  • tests/adversarial/benign_generated/BT-013.jsonl
  • tests/adversarial/benign_generated/BT-014.jsonl
  • tests/adversarial/benign_generated/BT-015.jsonl
  • tests/adversarial/v0.5.0_benchmarks.json

📝 Walkthrough

Version 0.5.0 adds an opt-in ML workflow: an XGBoost-based AdversarialClassifier (shipped as a joblib bundle), feature extraction and scoring APIs, a reproducible training/evaluation script with held-out benchmark results, an executable example, packaging/optional-deps updates, and many new benign-control JSONL fixtures plus a benchmark JSON.
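
To make the feature-extraction step concrete, here is a minimal sketch of turning one tool call into a fixed-length vector from the feature families the summary names (serialized parameter blob, vocabulary indicators, regex detectors, URL-scheme flags, a heuristic scalar). Everything specific in it is an assumption for illustration: the vocabulary, the patterns, the function name `extract_features`, and the use of a plain list instead of a numpy array. It is not the implementation in src/vaara/adversarial_classifier.py.

```python
import json
import re

# Hypothetical vocabulary and detectors, chosen only for illustration.
VOCAB = ("ignore previous", "sudo", "base64", "rm -rf")
REGEX_DETECTORS = (
    re.compile(r"\.\./"),        # path traversal fragment
    re.compile(r"[;&|]\s*\w+"),  # shell command chaining
)
URL_SCHEMES = ("http://", "https://", "file://", "ftp://")
MAX_BLOB_LEN = 4096  # cap used to normalize the length feature


def extract_features(tool_name, params):
    """Map one tool call to a fixed-length list of floats."""
    # Serialize the call into one lowercase text blob so every detector
    # sees the same input regardless of parameter nesting.
    blob = json.dumps({"tool": tool_name, "params": params}, sort_keys=True).lower()

    features = [float(term in blob) for term in VOCAB]
    features += [float(bool(rx.search(blob))) for rx in REGEX_DETECTORS]
    features += [float(scheme in blob) for scheme in URL_SCHEMES]
    # Heuristic scalar: blob length, clipped and scaled into [0, 1].
    features.append(min(len(blob), MAX_BLOB_LEN) / MAX_BLOB_LEN)
    return features


if __name__ == "__main__":
    vector = extract_features("shell", {"cmd": "sudo rm -rf /tmp/cache"})
    print(len(vector), vector)
```

The vector length depends only on the detector lists, never on the input, which is what allows a trained model to be scored against new calls.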

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core Classifier | src/vaara/adversarial_classifier.py | New AdversarialClassifier class: loads bundled joblib/XGBoost model, raises ImportError/FileNotFoundError appropriately, extracts fixed-length numpy features (serialized param blob, vocab/ngram indicators, regex detectors, URL-scheme flags, heuristic scalars), score() returns adversarial probability, is_malicious() applies threshold (bundle default or override). |
| Training & Eval Script | scripts/classifier_vs_heuristic.py | New held-out evaluation/training script: normalizes JSONL entries, builds by-seed train/test split, trains XGBoost, compares against heuristic Pipeline intercept, runs threshold sweep, computes per-category and overall metrics, and saves a versioned joblib bundle (data/adversarial_classifier_v1.joblib). |
| Example | examples/adversarial_classifier.py | New runnable example demonstrating instantiation, printing bundle version/threshold, scoring five hardcoded tool-call scenarios, and emitting BLOCK/ALLOW based on threshold. |
| Docs & Changelog | README.md, CHANGELOG.md | README updated with installation note (default zero-deps) and optional vaara[ml] extra listing xgboost, scikit-learn, joblib, numpy; documents classifier usage, held-out metrics, latency, operational caveats, and guidance to prefer decision="escalate" over decision="deny". CHANGELOG records v0.5.0 entry and benchmark figures. |
| Packaging / Metadata | pyproject.toml, [tool.setuptools.package-data] | Project version bumped to 0.5.0, new optional extra ml with ML deps and min versions, package-data configured to include data/*.joblib model artifacts. |
| Version Constant | src/vaara/__init__.py | Updated __version__ from 0.4.4 to 0.5.0. |
| Benchmarks / Results | tests/adversarial/v0.5.0_benchmarks.json | New benchmark JSON recording dataset composition, hyperparams, threshold sweep results (attack recall, benign FPR, balanced accuracy), per-category heuristic vs classifier deltas, and latency statistics plus live dogfood summary. |
| Test Fixtures: Benign Controls | tests/adversarial/benign_generated/BN-*.jsonl (BN-001 … BN-050) and tests/adversarial/benign_generated/BT-*.jsonl (BT-001 … BT-015) | Many new JSONL benign-control fixtures (9–10 cases each) covering diverse tools/agents (system ops, HTTP/API, GitHub/Jira, Kubernetes/Docker, email/messaging, monitoring, CI/CD, financial flows, etc.), largely marked expected: "ALLOW" with severity metadata. |
| Chore / Misc | examples/*, scripts/* | New example and script entrypoints added and made executable via if __name__ == "__main__":. |
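
The three threshold-sweep metrics named above (attack recall, benign FPR, balanced accuracy) can be computed with a few lines of plain Python. This is a sketch under stated assumptions, not the code in scripts/classifier_vs_heuristic.py: labels are taken to be 1 for adversarial and 0 for benign, a call is flagged when its score is greater than or equal to the threshold, and the function name `sweep` is hypothetical.

```python
def sweep(labels, scores, thresholds):
    """Compute attack recall, benign FPR and balanced accuracy per threshold.

    labels: 1 = adversarial, 0 = benign (assumed encoding).
    scores: classifier probabilities that the call is adversarial.
    """
    n_attack = sum(labels)
    n_benign = len(labels) - n_attack
    rows = []
    for t in thresholds:
        # A call counts as flagged when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        recall = tp / n_attack if n_attack else 0.0
        fpr = fp / n_benign if n_benign else 0.0
        rows.append({
            "threshold": t,
            "attack_recall": recall,
            "benign_fpr": fpr,
            # Mean of true-positive rate and true-negative rate.
            "balanced_accuracy": (recall + (1.0 - fpr)) / 2.0,
        })
    return rows


if __name__ == "__main__":
    demo_labels = [1, 1, 0, 0]
    demo_scores = [0.9, 0.4, 0.6, 0.1]
    for row in sweep(demo_labels, demo_scores, [0.3, 0.5, 0.7]):
        print(row)
```

Raising the threshold trades attack recall for a lower benign false-positive rate, which is why a sweep is reported rather than a single operating point.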

Sequence Diagram

sequenceDiagram
    participant Script as scripts/classifier_vs_heuristic.py
    participant Loader as JSONL Loader
    participant Splitter as Seed Splitter
    participant Featurer as Feature Extractor
    participant Trainer as XGBoost Trainer
    participant Heuristic as Heuristic Pipeline
    participant Evaluator as Evaluator/Metrics
    participant Bundle as Joblib Bundle

    Script->>Loader: read JSONL benign/adversarial entries
    Loader-->>Script: normalized entries (category/context/params)
    Script->>Splitter: deterministically split by seed ID (train/test)
    Splitter-->>Script: train/test partitions
    Script->>Featurer: convert entries -> feature vectors
    Featurer-->>Script: feature matrix, vocab, feature names
    Script->>Trainer: train XGBoost on train features/labels
    Trainer-->>Script: trained model
    Script->>Evaluator: predict probabilities on test set
    Evaluator->>Trainer: model.predict_proba(features)
    Trainer-->>Evaluator: probabilities
    Script->>Heuristic: generate heuristic binary decisions via Pipeline
    Heuristic-->>Evaluator: heuristic predictions
    Evaluator->>Evaluator: compute recall/FPR/balanced-accuracy by threshold
    Evaluator->>Bundle: save model, vocab, feature names, default threshold to joblib
    Bundle-->>Script: adversarial_classifier_v1.joblib artifact
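
The "deterministically split by seed ID" step in the diagram can be sketched as a hash-based assignment, so that every variant generated from the same seed lands in the same partition and held-out seeds never leak into training. The details here are assumptions for illustration: the `seed_id` field name, the 20% test share, and the choice of SHA-256 are not taken from the script.

```python
import hashlib


def split_of(seed_id, test_percent=20):
    """Assign a seed ID to 'train' or 'test' from a stable hash of the ID."""
    digest = hashlib.sha256(seed_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "test" if bucket < test_percent else "train"


def split_entries(entries, test_percent=20):
    """Partition entries by their (hypothetical) 'seed_id' field."""
    train, test = [], []
    for entry in entries:
        if split_of(entry["seed_id"], test_percent) == "test":
            test.append(entry)
        else:
            train.append(entry)
    return train, test


if __name__ == "__main__":
    demo = [{"seed_id": f"CE-{i:03d}", "variant": v}
            for i in range(1, 6) for v in range(3)]
    train, test = split_entries(demo)
    print(len(train), "train entries,", len(test), "test entries")
```

Because the assignment depends only on the seed ID, rerunning the script or adding new fixtures does not move existing seeds between partitions.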

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Poem

🐰 A fuzzy classifier hops into view,
With XGBoost features and thresholds so true,
Benign controls aplenty—fifty plus more!
Detecting cunning tool calls to the core,
This rabbit whispers "escalate, not deny" 🎯

        if force_malicious is not None:
            e["expected"] = "ALLOW" if is_benign_file else "DENY"
        out.append(e)
    except json.JSONDecodeError:
Comment thread src/vaara/adversarial_classifier.py Fixed
vaaraio merged commit 2ac15cb into main on Apr 23, 2026 (8 checks passed)
vaaraio deleted the feat/adversarial-classifier-0.5.0 branch on April 23, 2026 12:50