[Misc][Benchmark] Add support for CustomDataset by ekagra-ranjan · Pull Request #18511 · vllm-project/vllm

ekagra-ranjan · 2025-05-21T23:00:21Z

Currently, the online serving benchmark can only load a fixed set of predefined datasets. This PR adds support for CustomDataset, which can load arbitrary datasets for benchmarking. It accepts datasets in JSONL format and can be extended to other formats, since load_data() converts the input into a standardized list of dictionaries that sample() then consumes.
An example is given in the function's comment.
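The load_data()/sample() split described above can be illustrated with a minimal sketch. It assumes every JSONL line is an object with a "prompt" key, as in the data.jsonl example below. The class name `JsonlPromptDataset` is hypothetical and is not vLLM's CustomDataset; the commit history mentions adding pandas, so the real loader may parse the file differently.

```python
import json
import random
from typing import Any


class JsonlPromptDataset:
    """Hypothetical stand-in for CustomDataset; not the vLLM class."""

    def __init__(self, dataset_path: str, seed: int = 0) -> None:
        self.dataset_path = dataset_path
        self.seed = seed
        self.data: list[dict[str, Any]] = []
        self.load_data()

    def load_data(self) -> None:
        # Normalize the input file into a list of dictionaries, one per JSONL line.
        with open(self.dataset_path, encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                record = json.loads(line)
                if "prompt" not in record:
                    raise ValueError(f"line {line_no}: missing 'prompt' field")
                self.data.append(record)

    def sample(self, num_requests: int) -> list[str]:
        # Consume only the standardized records, never the raw file.
        # A fresh seeded RNG makes repeated calls return the same selection.
        rng = random.Random(self.seed)
        records = self.data[:]
        rng.shuffle(records)
        # Returns fewer prompts than requested if the file is too small.
        return [r["prompt"] for r in records[:num_requests]]
```

Supporting another input format in this design would only require a different load_data(); sample() keeps working because it only sees the list of dictionaries.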

Usage

MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
ONLINE_CUSTOM_DATASET="data.jsonl"

time python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detailed \
  --backend vllm \
  --model "${MODEL_NAME}" \
  --endpoint /v1/completions \
  --dataset-name custom \
  --dataset-path ${ONLINE_CUSTOM_DATASET} \
  --custom-skip-chat-template \
  --num-prompts 80 \
  --max-concurrency 1 \
  --temperature=0.3 \
  --top-p=0.75 \
  --result-dir "./log/"

Example of data.jsonl:

{"prompt": "What is the capital of India?"}
{"prompt": "What is the capital of Iran?"}
{"prompt": "What is the capital of China?"}
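A file in this shape can be generated with a short script. This is a sketch; the helper name `write_prompts_jsonl` is hypothetical and not part of the benchmark code. Using json.dumps (rather than string formatting) escapes quotes and newlines inside a prompt, so each record stays on a single line as JSONL requires.

```python
import json


def write_prompts_jsonl(path: str, prompts: list[str]) -> None:
    """Hypothetical helper: write one {"prompt": ...} object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            # json.dumps escapes embedded quotes/newlines, keeping one record per line.
            f.write(json.dumps({"prompt": prompt}) + "\n")


if __name__ == "__main__":
    write_prompts_jsonl(
        "data.jsonl",
        [
            "What is the capital of India?",
            "What is the capital of Iran?",
            "What is the capital of China?",
        ],
    )
```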

cc: @ywang96

github-actions · 2025-05-21T23:00:30Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


ywang96

Overall LGTM and I left a comment! Can you also update the benchmark readme to include this dataset? Thanks!

benchmarks/benchmark_dataset.py

ekagra-ranjan · 2025-05-28T20:09:26Z

@ywang96 - all done! Pls have another look.

ywang96

🚢 @ekagra-ranjan Thank you for the contribution!

ekagra-ranjan · 2025-05-28T22:41:10Z

There is one failure in the V1 tests that is blocking the merge:


=========================== short test summary info ============================
[2025-05-28T21:23:02Z] FAILED v1/engine/test_async_llm.py::test_abort[engine_args0-Hello my name is Robert and-RequestOutputKind.DELTA] - assert not True
[2025-05-28T21:23:02Z]  +  where True = has_unfinished_requests()
[2025-05-28T21:23:02Z]  +    where has_unfinished_requests = <vllm.v1.engine.output_processor.OutputProcessor object at 0x7ff42ed8d4f0>.has_unfinished_requests
[2025-05-28T21:23:02Z]  +      where <vllm.v1.engine.output_processor.OutputProcessor object at 0x7ff42ed8d4f0> = <vllm.v1.engine.async_llm.AsyncLLM object at 0x7ff433c1ce60>.output_processor
[2025-05-28T21:23:02Z] ============ 1 failed, 50 passed, 21 warnings in 760.68s (0:12:40) =============
[2025-05-28T21:23:05Z] 🚨 Error: The command exited with status 1

I don't think it's caused by this PR. Could be related to this. @ywang96 - what are your thoughts?


ywang96 · 2025-05-30T17:01:52Z

@ekagra-ranjan I've merged main into this branch, so let's see if it helps!


ekagra-ranjan added 3 commits May 21, 2025 22:50

add custom dataset

ba0be9d

lint

b429fa0

remove todo

2ae6cbd

Merge branch 'main' of https://github.com/vllm-project/vllm into er-custom-dataset

f306c61

simon-mo requested a review from ywang96 May 27, 2025 19:03

ekagra-ranjan added 4 commits May 27, 2025 21:19

add to datasets.py

3e9e1e6

lint

9e4a621

add pandas

bd60081

Merge branch 'main' of https://github.com/vllm-project/vllm into er-custom-dataset

7445161

ekagra-ranjan mentioned this pull request May 28, 2025

[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets #18847

Merged

ywang96 approved these changes May 28, 2025


benchmarks/benchmark_dataset.py

fix bug save detailed

3e7c7da

ekagra-ranjan mentioned this pull request May 28, 2025

[Misc][Benchmark][Bugfix] Fix save detailed #18509

Closed

ekagra-ranjan added 4 commits May 28, 2025 19:36

add more check

4b0b40b

add doc

005523d

lint

40dfca3

fix save detail in serve.py

db5f71a

ekagra-ranjan requested a review from ywang96 May 28, 2025 20:08

ywang96 approved these changes May 28, 2025


ywang96 added the ready label May 28, 2025

ywang96 enabled auto-merge (squash) May 28, 2025 20:19

ekagra-ranjan requested a review from ywang96 May 29, 2025 15:36

ywang96 added 2 commits May 30, 2025 09:58

Merge remote-tracking branch 'upstream/main' into er-custom-dataset

1c2be9e

format

11d52d6

Signed-off-by: Roger Wang <ywang@roblox.com>

ywang96 merged commit bbfa0c6 into vllm-project:main May 31, 2025
62 of 63 checks passed

amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025

[Misc][Benchmark] Add support for CustomDataset (vllm-project#18511)

c28ca26

Signed-off-by: amit <amit.man@gmail.com>

amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025

[Misc][Benchmark] Add support for CustomDataset (vllm-project#18511)

8882440

Signed-off-by: amit <amit.man@gmail.com>

ZhangShuaiyi mentioned this pull request Jun 3, 2025

[Bugfix]: import pandas for benchmarks #19079

Closed

sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025

[Misc][Benchmark] Add support for CustomDataset (vllm-project#18511)

89e34a4

ywang96 merged 15 commits into vllm-project:main from ekagra-ranjan:er-custom-dataset
