Adds continuous batching #850
Conversation
… into add-fast-generate
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull Request Overview
This PR introduces continuous batching support for Transformer-based models, enabling split-wise streaming generation.
- Adds a `continuous_batching` flag throughout configuration, model initialization, and generation functions.
- Implements a new `_continuous_greedy_until` path and refactors `_generate` to dispatch based on the flag (see the sketch after this list).
- Updates `GenerationParameters` and example configs to include `num_blocks` and `block_size`, and adjusts tests accordingly.
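For orientation, the dispatch in the second bullet might look roughly like the following. This is a minimal sketch, not the PR's actual code: the method names are taken from the diff excerpts further down this page, while the class, constructor, and document handling are assumptions.

```python
# Minimal sketch of the continuous_batching dispatch (not lighteval's real class).
class TransformersModelSketch:
    def __init__(self, continuous_batching: bool = False):
        # Flag propagated from the model config / generation parameters.
        self.continuous_batching = continuous_batching

    def _continuous_greedy_until(self, docs):
        # Paged / continuous-batching generation path (stubbed here).
        return [f"continuous:{doc}" for doc in docs]

    def _padded_greedy_until(self, docs):
        # Classic padded-batch generation path (stubbed here).
        return [f"padded:{doc}" for doc in docs]

    def greedy_until(self, docs):
        # Single entry point that routes on the flag.
        if self.continuous_batching:
            return self._continuous_greedy_until(docs)
        return self._padded_greedy_until(docs)


print(TransformersModelSketch(continuous_batching=True).greedy_until(["What is 2+2?"]))
```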
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| tests/models/endpoints/test_tgi_model.py | Inserts `block_size` and `num_blocks` into generation parameters |
| tests/models/endpoints/test_endpoint_model.py | Inserts `num_blocks` and `block_size` into generation parameters |
| src/lighteval/models/transformers/transformers_model.py | Propagates `continuous_batching` through init, `from_model`, and generate paths |
| src/lighteval/models/model_input.py | Extends `GenerationParameters` with `num_blocks` and `block_size` |
| examples/model_configs/transformers_model.yaml | Adds `continuous_batching` and example `num_blocks`/`block_size` |
Comments suppressed due to low confidence (2)
src/lighteval/models/model_input.py:28
- [nitpick] New fields `num_blocks` and `block_size` in `GenerationParameters` lack descriptions in the class docstring. Consider documenting their purpose and effects.

`num_blocks: NonNegativeInt | None = None  # transformers`
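As an illustration of the requested documentation, the two fields could be described along these lines. This is a sketch only, assuming a pydantic-style model as the `NonNegativeInt` annotation suggests; it is not lighteval's actual `GenerationParameters` class.

```python
# Sketch: documenting the two new continuous-batching fields flagged above.
# Field names come from this PR; everything else is illustrative.
from pydantic import BaseModel, NonNegativeInt


class GenerationParametersSketch(BaseModel):
    """Generation settings forwarded to the transformers backend.

    Attributes:
        num_blocks: Number of KV-cache blocks to allocate for continuous
            (paged) batching; None keeps the backend default.
        block_size: Number of tokens stored per KV-cache block; None keeps
            the backend default.
    """

    num_blocks: NonNegativeInt | None = None  # transformers
    block_size: NonNegativeInt | None = None  # transformers


params = GenerationParametersSketch(num_blocks=2048, block_size=256)
print(params.num_blocks, params.block_size)
```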
src/lighteval/models/transformers/transformers_model.py:114
- There are no existing tests covering the new `continuous_batching` logic path. Consider adding unit tests to verify both `True` and `False` behaviors.

`continuous_batching (bool):`
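A hedged sketch of what such tests might check: each flag value routes to a different code path. It exercises the dispatch pattern in isolation with mocks rather than lighteval's real `TransformersModel`, whose constructor is more involved (assumption).

```python
# Illustrative unit tests for the continuous_batching dispatch, using mocks
# instead of lighteval's real TransformersModel (assumption).
import unittest
from unittest.mock import MagicMock


class TestContinuousBatchingDispatch(unittest.TestCase):
    def _model(self, flag: bool):
        model = MagicMock()
        model.continuous_batching = flag
        # Reproduce the dispatch logic under test on top of the mock.
        model.greedy_until = lambda docs: (
            model._continuous_greedy_until(docs)
            if model.continuous_batching
            else model._padded_greedy_until(docs)
        )
        return model

    def test_true_routes_to_continuous(self):
        model = self._model(True)
        model.greedy_until(["doc"])
        model._continuous_greedy_until.assert_called_once_with(["doc"])
        model._padded_greedy_until.assert_not_called()

    def test_false_routes_to_padded(self):
        model = self._model(False)
        model.greedy_until(["doc"])
        model._padded_greedy_until.assert_called_once_with(["doc"])
        model._continuous_greedy_until.assert_not_called()


if __name__ == "__main__":
    unittest.main()
```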
    else:
        return self._padded_greedy_until(docs)

    def _generate_fast(
Is `generate_fast` for continuous batching only? If yes, call it `generate_continuous` then, since the other is `generate_padded` and not `generate_slow` (for homogeneity).
    return batch_size

    def greedy_until(
    def _continuous_greedy_until(
Is there any way to factorize more between the continuous and padded `greedy_until`? (Otherwise, there's a risk we end up having different input management, for example, like we had in the past across generation models.)
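One possible direction, sketched here purely as an assumption rather than the PR's actual code: factor the shared input preparation into a single helper so the two greedy paths can only diverge in the backend call itself.

```python
# Hypothetical refactor sketch: both generation paths share the same
# input-preparation step, and only the backend call differs.
from dataclasses import dataclass


@dataclass
class PreparedBatch:
    prompts: list[str]
    stop_sequences: list[str]
    max_new_tokens: int


def _prepare_batch(docs, default_max_new_tokens: int = 256) -> PreparedBatch:
    # Single place where prompt formatting, stop sequences, and generation
    # length would be decided for both paths.
    return PreparedBatch(
        prompts=[d["query"] for d in docs],
        stop_sequences=[s for d in docs for s in d.get("stop", [])],
        max_new_tokens=default_max_new_tokens,
    )


def _padded_greedy_until(docs):
    batch = _prepare_batch(docs)
    return [f"padded({p})" for p in batch.prompts]      # placeholder backend call


def _continuous_greedy_until(docs):
    batch = _prepare_batch(docs)
    return [f"continuous({p})" for p in batch.prompts]  # placeholder backend call


docs = [{"query": "What is 2+2?", "stop": ["\n"]}]
print(_padded_greedy_until(docs), _continuous_greedy_until(docs))
```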
Content from original PR:

    from lighteval.logging.evaluation_tracker import EvaluationTracker
    from lighteval.pipeline import Pipeline, PipelineParameters, ParallelismManager
    from lighteval.models.endpoints.inference_providers_model import (
        InferenceProvidersModelConfig,
    )
    from lighteval.models.transformers.transformers_model import TransformersModel
    import torch
    from transformers import AutoModelForCausalLM, GenerationConfig

    MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
    PROVIDER = "hf-inference"
    BENCHMARKS = "lighteval|gsm8k|0|0"

    evaluation_tracker = EvaluationTracker(output_dir="./results")
    pipeline_params = PipelineParameters(
        use_chat_template=True, launcher_type=ParallelismManager.NONE, max_samples=None
    )

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-3b-Instruct", attn_implementation="sdpa_paged", torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Configure generation parameters
    generation_config = GenerationConfig(
        max_new_tokens=10,
        eos_token_id=model.config.eos_token_id,
        pad_token_id=model.config.pad_token_id,
        num_blocks=2048,
        block_size=256,
    )
    model.generation_config = generation_config

    model = TransformersModel.from_model(model)

    pipeline = Pipeline(
        model=model,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        tasks=BENCHMARKS,
    )

    pipeline.evaluate()
    results = pipeline.get_results()["results"]
    print(results)
Does not work on my side 😢 I might have done something wrong tho!
I'll debug this week.
Add necessary changes to call generate with CB

Linked PR: huggingface/transformers#38085

This works:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.pipeline import Pipeline, PipelineParameters, ParallelismManager
from lighteval.models.endpoints.inference_providers_model import (
    InferenceProvidersModelConfig,
)
from lighteval.models.transformers.transformers_model import TransformersModel
import torch
from transformers import AutoModelForCausalLM, GenerationConfig

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
PROVIDER = "hf-inference"
BENCHMARKS = "lighteval|gsm8k|0|0"

evaluation_tracker = EvaluationTracker(output_dir="./results")
pipeline_params = PipelineParameters(
    use_chat_template=True, launcher_type=ParallelismManager.NONE, max_samples=None
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3b-Instruct", attn_implementation="sdpa_paged", torch_dtype=torch.bfloat16, device_map="auto"
)

# Configure generation parameters
generation_config = GenerationConfig(
    max_new_tokens=10,
    eos_token_id=model.config.eos_token_id,
    pad_token_id=model.config.pad_token_id,
    num_blocks=2048,
    block_size=256,
)
model.generation_config = generation_config

model = TransformersModel.from_model(model)

pipeline = Pipeline(
    model=model,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    tasks=BENCHMARKS,
)

pipeline.evaluate()
results = pipeline.get_results()["results"]
print(results)
```

---------

Co-authored-by: Arthur Zucker <[email protected]>
Co-authored-by: Clémentine Fourrier <[email protected]>