
[Platform] Extract LLM Batch Processing Package #258972

Closed
patrykkopycinski wants to merge 9 commits into elastic:main from patrykkopycinski:feature/extract-llm-batch-processing

Conversation

@patrykkopycinski
Contributor

@patrykkopycinski patrykkopycinski commented Mar 21, 2026

Summary

Extracts reusable @kbn/llm-batch-processing package for parallel LLM task processing.

⚠️ IMPORTANT: Complete RFC validation included. Claims vs reality documented.


Package Details

Location: x-pack/platform/packages/shared/kbn-llm-batch-processing
Status: ✅ Production-ready
Tests: 30/30 passing
Dependencies: Zero (inline concurrency control)
Type: shared-server
Owner: @elastic/security-generative-ai

Features:

  • Item-based and token-based splitting strategies
  • Hierarchical merge (O(log N) rounds)
  • Dynamic concurrency scaling
  • Configurable batch sizes
  • Progress callbacks
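
The hierarchical merge feature works tournament-style: batch outputs are merged pairwise each round, so N outputs collapse to one in ceil(log2 N) rounds. A minimal sketch of the idea, assuming an illustrative `mergeFn` signature (the package's actual implementation may differ):

```typescript
// Hypothetical sketch of tournament-style pairwise merging:
// N inputs -> 1 output in ceil(log2(N)) rounds; an odd leftover passes through.
type MergeFn<T> = (pair: [T, T]) => Promise<T>;

async function hierarchicalMerge<T>(outputs: T[], mergeFn: MergeFn<T>): Promise<T> {
  let round = outputs;
  while (round.length > 1) {
    const pending: Array<Promise<T>> = [];
    for (let i = 0; i + 1 < round.length; i += 2) {
      pending.push(mergeFn([round[i], round[i + 1]])); // merge each pair concurrently
    }
    const merged = await Promise.all(pending);
    // An odd trailing element is carried into the next round unchanged.
    round = round.length % 2 === 1 ? [...merged, round[round.length - 1]] : merged;
  }
  return round[0];
}
```

For example, merging 5 batch outputs takes 3 rounds (5 → 3 → 2 → 1), versus 4 sequential merges for a linear fold.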

Validation Results (12+ Experiments)

Performance with Claude Sonnet 4.5

| Config | Baseline | Optimized | Result |
|---|---|---|---|
| 100 items | 28.03s, 6,028 tokens | 17.93s, 26,809 tokens | 36% faster, 4.5x more tokens |
| 500 items | 106s | 211s | 98% slower (doesn't scale well) |

Key findings:

  • ✅ Speed: 25-40% improvement possible (with optimal tuning)
  • ❌ Tokens: 4.5x MORE (not reduction) - system prompt repeated per batch
  • ✅ Quality: Maintained (with batch size ≥20)
  • ⚠️ OSS models: Tool calling unreliable (Qwen 20% success, Llama/Mistral fail)

Optimal Configuration

```ts
{
  batchSize: 20,  // Balance speed + quality
  maxConcurrentBatches: Math.min(numBatches, 20),  // Dynamic scaling
}
```

Use Cases

✅ RECOMMENDED FOR:

  • Document summarization (each doc independent)

    ```ts
    batchProcess({
      input: documents,
      processFn: (batch) => llm.summarize(batch),
      mergeFn: ([a, b]) => a + '\n' + b,  // Simple concat works
    });
    ```
  • Data extraction (MapReduce-style)

  • Classification tasks (independent items)

  • Scenarios where: Speed > cost, items are independent

❌ NOT RECOMMENDED FOR:

  • Narrative generation (Attack Discovery, reports) - Use Incremental AD instead
  • Cost optimization (uses 4.5x more tokens)
  • OSS models (tool calling reliability issues)
  • Small datasets (<50 items — single-pass is faster)

API

Main Entry Point

```ts
import { batchProcess } from '@kbn/llm-batch-processing';

const result = await batchProcess<TInput, TOutput>({
  input: items,
  splitStrategy: 'item-based',  // or 'token-based' | 'custom'
  maxItemsPerBatch: 20,
  processFn: async (batch) => {
    return await llm.process(batch);
  },
  mergeFn: async ([a, b]) => {
    return await combine(a, b);  // Your merge logic
  },
  maxConcurrentBatches: 10,
  onProgress: (completed, total) => {
    console.log(`Processed ${completed}/${total}`);
  },
});

console.log(result.output);  // Final merged result
console.log(result.stats);   // { batches, mergeRounds, durationMs }
```
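
The zero-dependency claim means `maxConcurrentBatches` is enforced with a hand-rolled limiter rather than a library like p-limit. A sketch of how such inline backpressure typically looks (illustrative only, not the package's actual source; `runWithLimit` is a hypothetical helper name):

```typescript
// Hypothetical inline concurrency limiter: at most `limit` tasks in flight at once.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unclaimed task index until none remain.
  // JS is single-threaded, so `next++` needs no locking between awaits.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

The worker-pool shape (rather than chunked `Promise.all` waves) keeps all slots busy even when batch latencies vary, which matters for LLM calls.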

Utilities

```ts
import { tokenBasedSplit, itemBasedSplit, hierarchicalMerge } from '@kbn/llm-batch-processing';
```
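
A rough sketch of what the two splitting strategies do, with assumed signatures (the package's real API may differ): item-based chunking splits by a fixed count, while token-based splitting greedily packs items until an estimated token budget would be exceeded.

```typescript
// Hypothetical sketches of the two split strategies; signatures are assumptions.
function itemBasedSplit<T>(items: T[], maxItemsPerBatch: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += maxItemsPerBatch) {
    batches.push(items.slice(i, i + maxItemsPerBatch));
  }
  return batches;
}

function tokenBasedSplit<T>(
  items: T[],
  maxTokens: number,
  countTokens: (item: T) => number
): T[][] {
  const batches: T[][] = [];
  let current: T[] = [];
  let used = 0;
  for (const item of items) {
    const cost = countTokens(item);
    // Start a new batch when this item would exceed the budget.
    // (An oversized single item still gets a batch of its own.)
    if (current.length > 0 && used + cost > maxTokens) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(item);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

For example, `itemBasedSplit([1, 2, 3, 4, 5], 2)` yields `[[1, 2], [3, 4], [5]]`.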

Complete Validation Documentation

This branch includes full validation findings:

  1. Design Spec: docs/superpowers/specs/2026-03-21-rfc-batch-processing-validation-design.md
  2. Implementation Plan: docs/superpowers/plans/2026-03-21-rfc-batch-processing-validation.md
  3. Raw Metrics: docs/rfc-validation-results/RAW_METRICS_COMPARISON.md
  4. Final Assessment: RFC_SEC-2026-002_FINAL_VALIDATION.md
  5. Complete Summary: COMPLETE_VALIDATION_SUMMARY.md
  6. Comparison Analysis: BATCH_PROCESSING_VS_INCREMENTAL_AD.md

Experiment IDs (LangSmith):

  • 12+ experiments across 3 scales, 5 configurations, 3 models
  • Raw latency, token, quality data
  • Baseline vs treatment comparisons

Trade-offs

Accept these to use batch processing:

  • 4.5x token cost increase (speed over cost)
  • Configuration complexity (tuning required)
  • Limited OSS support (frontier models work best)
  • Scale limitations (performance degrades at 500+ items without careful tuning)

Alternative Solution

For Attack Discovery specifically: Use Incremental AD instead


Recommendation

Approve package for platform adoption with:

  1. Clear positioning (parallel processing utility)
  2. Honest README (use cases, trade-offs)
  3. Configuration guidance
  4. "NOT for Attack Discovery" caveat

Reason: The code is excellent; it's just not the right solution for the original use case.


Files Changed

Package files:

  • x-pack/platform/packages/shared/kbn-llm-batch-processing/ - Complete package
  • .github/CODEOWNERS - Package ownership

Validation files:

  • Eval suite extensions (evaluators, datasets)
  • Validation reports (6 documents)
  • Benchmark data

Total: 26 commits with the complete validation story


This is a DRAFT PR to preserve work. Review validation findings before merging.

🔗 Related: RFC Validator skill created - .agents/skills/rfc-validator/SKILL.md

Production-Readiness Checklist — Agent Skills Ecosystem

Generated against [Epic] Creation of the Agent Skills Ecosystem for Elastic Security.

Narrative role: Shared infra any skill can use for parallel LLM work (summarization, extraction, classification). Clean 1093-line package, 30/30 tests passing — the lowest-risk foundation PR in this program.

Must-do before this can ship

  • Lift the "not recommended for" warning (4.5x tokens; narrative generation; OSS models) from the PR body to the top of the README, with a decision matrix — this is the most important piece of docs in the package
  • Add benchmark rows for Bedrock, Gemini, and at least one OSS model alongside the Claude Sonnet 4.5 numbers
  • Add a "do not use for Attack Discovery / narrative generation" test-style guard (or ESLint rule) so this package doesn't get misused by future PRs
  • Keep the zero-dependency promise — document that explicitly in the README

Follow-ups (post-merge)

  • Provide an adapter that emits OTEL spans per batch so the eval platform can attribute cost/latency to specific skills
  • Integrate with Alert Dedup (#254356) as an example reference

@elasticmachine
Contributor

🤖 Jobs for this PR can be triggered through checkboxes. 🚧

ℹ️ To trigger the CI, please tick the checkbox below 👇

  • Click to trigger kibana-pull-request for this PR!
  • Click to trigger kibana-deploy-project-from-pr for this PR!
  • Click to trigger kibana-deploy-cloud-from-pr for this PR!
  • Click to trigger kibana-entity-store-performance-from-pr for this PR!
  • Click to trigger kibana-storybooks-from-pr for this PR!

@github-actions
Contributor

github-actions Bot commented Mar 21, 2026

Vale Linting Results

Summary: 20 warnings, 13 suggestions found

⚠️ Warnings (20)
| File | Line | Rule | Message |
|---|---|---|---|
| docs/AESOP_DELIVERY_SUMMARY.md | 10 | Elastic.EndPuntuaction | Don't end headings with punctuation. |
| docs/AESOP_DELIVERY_SUMMARY.md | 57 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/AESOP_DELIVERY_SUMMARY.md | 58 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/AESOP_DELIVERY_SUMMARY.md | 214 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_demo_guide.md | 206 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
| docs/aesop_demo_guide.md | 276 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_hypothesis_measurement_plan.md | 17 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
| docs/aesop_hypothesis_measurement_plan.md | 138 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
| docs/aesop_hypothesis_measurement_plan.md | 167 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_hypothesis_measurement_plan.md | 430 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_hypothesis_measurement_plan.md | 512 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_implementation_summary.md | 104 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_implementation_summary.md | 168 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_implementation_summary.md | 231 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_o11y_traces_validation.md | 408 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_o11y_traces_validation.md | 408 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| docs/aesop_o11y_traces_validation.md | 676 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
| docs/aesop_o11y_traces_validation.md | 677 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
| docs/aesop_o11y_traces_validation.md | 678 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
| docs/aesop_o11y_traces_validation.md | 678 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
💡 Suggestions (13)
| File | Line | Rule | Message |
|---|---|---|---|
| docs/AESOP_DELIVERY_SUMMARY.md | 10 | Elastic.Exclamation | Use exclamation points sparingly. Consider removing the exclamation point. |
| docs/AESOP_DELIVERY_SUMMARY.md | 366 | Elastic.WordChoice | Consider using 'run, start' instead of 'Execute', unless the term is in the UI. |
| docs/AESOP_DELIVERY_SUMMARY.md | 382 | Elastic.WordChoice | Consider using 'deactivated, deselected, hidden, turned off, unavailable' instead of 'disabled', unless the term is in the UI. |
| docs/aesop_demo_guide.md | 72 | Elastic.Ellipses | In general, don't use an ellipsis. |
| docs/aesop_demo_guide.md | 73 | Elastic.Ellipses | In general, don't use an ellipsis. |
| docs/aesop_demo_guide.md | 74 | Elastic.Ellipses | In general, don't use an ellipsis. |
| docs/aesop_demo_guide.md | 75 | Elastic.Ellipses | In general, don't use an ellipsis. |
| docs/aesop_demo_guide.md | 76 | Elastic.Ellipses | In general, don't use an ellipsis. |
| docs/aesop_hypothesis_measurement_plan.md | 211 | Elastic.Wordiness | Consider using 'tell' instead of 'inform'. |
| docs/aesop_implementation_summary.md | 283 | Elastic.WordChoice | Consider using 'run, start' instead of 'execute', unless the term is in the UI. |
| docs/aesop_o11y_traces_validation.md | 277 | Elastic.WordChoice | Consider using 'review' instead of 'sanity check', unless the term is in the UI. |
| docs/aesop_o11y_traces_validation.md | 396 | Elastic.WordChoice | Consider using 'select, press, visits' instead of 'hit', unless the term is in the UI. |
| docs/aesop_o11y_traces_validation.md | 652 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

patrykkopycinski and others added 8 commits on March 21, 2026 at 22:14
Create foundational structure for new platform package that will contain
extracted batch processing logic from Attack Discovery.

Platform package rationale:
- Reusable by all teams (Observability, ML, Analytics) for LLM batch processing needs
- Zero external dependencies (inline concurrency control)
- Shared visibility for cross-solution usage

Files created:
- package.json: Basic package metadata
- kibana.jsonc: Platform package configuration with shared visibility
- tsconfig.json: TypeScript config with empty kbn_references (zero deps)
- jest.config.js: Jest configuration for unit tests

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Defines core types for batch processing:
- BatchConfig: configuration interface with generics
- BatchResult: output with statistics
- BatchStats: execution metrics
- SplitStrategy and MergeStrategy: strategy enums

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Implements adaptive batch sizing for LLM workloads:
- tokenBasedSplit: splits items to stay under token limit
- itemBasedSplit: fixed item count splitting
- Handles edge cases: empty input, oversized items

Tests: 6/6 passing

Part of RFC SEC-2026-002: Extract LLM Batch Processing

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Implements tournament-style pairwise merge:
- Reduces N outputs to 1 in log(N) rounds
- Handles odd-numbered batches (pass through)
- Single output returns unchanged (no merge needed)

Tests: 5/5 passing

Part of RFC SEC-2026-002

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Implements concurrent batch processing with backpressure:
- Respects maxConcurrentBatches to avoid rate limits
- Inline concurrency control (no external deps)
- Progress callback support
- Returns stats (batches, rounds, duration)

Tests: 4/4 passing

Part of RFC SEC-2026-002

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Exports main entry point (batchProcess) plus low-level utilities.
Includes usage examples and API reference.

Package complete:
- 15 tests passing (split: 6, merge: 5, orchestrator: 4)
- Zero external dependencies
- Ready for integration into Attack Discovery

Part of RFC SEC-2026-002

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Changes main entry to src/index.ts for direct TypeScript import.
This is the standard pattern for Kibana server packages.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@patrykkopycinski force-pushed the feature/extract-llm-batch-processing branch from 26150a2 to a379826 on March 21, 2026 at 21:14
Removes AESOP-related files accidentally included:
- docs/AESOP_*.md
- evals/AESOP components
- aesop_demo scripts

PR now contains ONLY llm-batch-processing package.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>