feat: BatchTransformerBase for fixed-size batching (#89) #249

Closed
Chris-Wolfgang wants to merge 2 commits into main from feature/batch-transformer

@Chris-Wolfgang

Owner

Adds BatchTransformerBase<TSource, TProgress> for fixed-size batch accumulation. Closes #89.

Stacked on #248 (base feature/async-disposable, the tip of the 0.14.0 base-class chain) so its regenerated baseline includes the full release surface.

What

BatchTransformerBase<TSource, TProgress> : TransformerBase<TSource, IReadOnlyList<TSource>, TProgress> accumulates source items into fixed-size batches and yields each as an IReadOnlyList<TSource>. When the source completes, it flushes the trailing partial batch; omitting that final flush is the bug everyone reintroduces when hand-rolling bulk operations.

var batched = batcher.TransformAsync(extractor.ExtractAsync(token), token); // IAsyncEnumerable<IReadOnlyList<OrderRecord>>
await loader.LoadAsync(batched, token);                                     // one bulk insert per batch
  • BatchSize — configurable, default 100, validated >= 1.
  • Sealed override of TransformWorkerAsync so the batching contract can't drift; derived classes only implement CreateProgressReport.
  • CurrentItemCount counts batches yielded (the downstream unit of work), not input items. Cancellation is observed while enumerating the source.
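
The accumulate-and-flush behavior described above can be sketched as a standalone async iterator. This is only an illustration under stated assumptions, not the code in this PR: `BatchingSketch` and `BatchAsync` are hypothetical names, and the real class routes this logic through `TransformWorkerAsync` instead.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Wolfgang.Etl.
public static class BatchingSketch
{
    public static IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        IAsyncEnumerable<T> source, int batchSize, CancellationToken token = default)
    {
        // Validate eagerly so a bad size fails at the call site,
        // not on the first MoveNextAsync of the iterator.
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Core(source, batchSize, token);
    }

    private static async IAsyncEnumerable<IReadOnlyList<T>> Core<T>(
        IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken token)
    {
        var buffer = new List<T>(batchSize);

        // WithCancellation makes the source enumeration observe the token.
        await foreach (var item in source.WithCancellation(token).ConfigureAwait(false))
        {
            buffer.Add(item);
            if (buffer.Count == batchSize)
            {
                yield return buffer;
                // Start a fresh list: the consumer may still hold the previous batch.
                buffer = new List<T>(batchSize);
            }
        }

        // The step that hand-rolled versions tend to forget:
        // flush the trailing partial batch once the source completes.
        if (buffer.Count > 0)
        {
            yield return buffer;
        }
    }
}
```

With seven source items and a batch size of 3, this sketch yields batches of 3, 3, and 1 items; an empty source yields no batches at all.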

Per the issue, this likely moves to a future Wolfgang.Etl.Transformers package; it lives in Abstractions until that exists.

Verified

MINOR (additive). PublicAPI.Shipped.txt regenerated. Full Release build clean across all TFMs (0 warnings); 246 tests pass (7 new: evenly-divisible, remainder-flush, single-partial, empty source, batch counting, BatchSize default + validation).

Part of: #154

BatchTransformerBase<TSource, TProgress> : TransformerBase<TSource,
IReadOnlyList<TSource>, TProgress> accumulates source items into fixed-size
batches and yields each as IReadOnlyList<TSource>, flushing the trailing
partial batch when the source completes (the classic bulk-op bug, solved once).

- Configurable BatchSize (default 100, validated >= 1).
- Sealed override of TransformWorkerAsync so the batching contract can't drift;
  derived classes only supply CreateProgressReport.
- CurrentItemCount counts batches yielded (the downstream unit of work), not
  input items. Cancellation observed while enumerating the source.
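
To illustrate the sealed-override point, here is a minimal sketch of the pattern. `SketchTransformerBase` is a hypothetical stand-in for the library's `TransformerBase` (whose real signatures are not shown in this PR and may differ), and `SketchBatchTransformerBase` and `CountingBatcher` are likewise illustrative names. Where the count is incremented is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for TransformerBase; the real base class may differ.
public abstract class SketchTransformerBase<TSource, TResult, TProgress>
{
    public int CurrentItemCount { get; protected set; }

    public IAsyncEnumerable<TResult> TransformAsync(
        IAsyncEnumerable<TSource> items, CancellationToken token)
        => TransformWorkerAsync(items, token);

    protected abstract IAsyncEnumerable<TResult> TransformWorkerAsync(
        IAsyncEnumerable<TSource> items, CancellationToken token);

    protected abstract TProgress CreateProgressReport();
}

public abstract class SketchBatchTransformerBase<TSource, TProgress>
    : SketchTransformerBase<TSource, IReadOnlyList<TSource>, TProgress>
{
    private int _batchSize = 100; // default taken from the PR description

    public int BatchSize
    {
        get => _batchSize;
        set => _batchSize = value >= 1
            ? value
            : throw new ArgumentOutOfRangeException(nameof(value));
    }

    // Sealed: derived classes cannot alter the batching contract.
    protected sealed override async IAsyncEnumerable<IReadOnlyList<TSource>> TransformWorkerAsync(
        IAsyncEnumerable<TSource> items,
        [EnumeratorCancellation] CancellationToken token)
    {
        var size = BatchSize; // snapshot so a mid-stream change cannot skew batches
        var buffer = new List<TSource>(size);

        await foreach (var item in items.WithCancellation(token).ConfigureAwait(false))
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                CurrentItemCount++; // counts batches, not input items
                yield return buffer;
                buffer = new List<TSource>(size);
            }
        }

        if (buffer.Count > 0)
        {
            CurrentItemCount++;
            yield return buffer; // trailing partial batch
        }
    }
}

// A derived class supplies only the progress report.
public sealed class CountingBatcher : SketchBatchTransformerBase<int, string>
{
    protected override string CreateProgressReport() => $"{CurrentItemCount} batches yielded";
}
```

Under these assumptions, feeding 250 items through a `CountingBatcher` with the default `BatchSize` leaves `CurrentItemCount` at 3 (two full batches plus one partial batch of 50).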

MINOR (additive public API). PublicAPI.Shipped.txt regenerated. Verified: full
Release build clean across all TFMs (0 warnings); 246 tests pass (7 new:
even/uneven/empty/partial batching, batch counting, BatchSize validation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Base automatically changed from feature/async-disposable to main June 24, 2026 01:51
@Chris-Wolfgang

Owner Author

Closing — superseded by ChunkTransformer<T> in ETL-Transformers, which already does fixed-size batching (yields chunks, flushes the partial final chunk, validates size >= 1, aimed at bulk loading). BatchTransformerBase is concrete transformer behavior and belongs in the transformers library, not in Abstractions (which is contracts + scaffolding only). The two minor deltas (IReadOnlyList return, progress support) are being raised as small PRs against ETL-Transformers instead.

Chris-Wolfgang deleted the feature/batch-transformer branch June 25, 2026 14:20
Development

Successfully merging this pull request may close these issues.

Add BatchTransformerBase for fixed-size batch accumulation
