[Bug]: Multistep with n>1 Fails #7968

Open
robertgshaw2-neuralmagic opened this issue Aug 28, 2024 · 9 comments · May be fixed by #8637

@robertgshaw2-neuralmagic
Collaborator

Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here

🐛 Describe the bug

Launched server with:

vllm serve $MODEL --num-scheduler-steps 8

Sent the following request:

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

# Completion API
stream = False
completion = client.completions.create(
    model=model,
    prompt="A robot may not injure a human being",
    echo=False,
    n=2,
    stream=stream)

print("Completion results:")
if stream:
    for c in completion:
        print(c)
else:
    print(completion)

Got the following output:

INFO:     Finished server process [1668044]
INFO 08-28 19:29:45 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=RuntimeError('shape mismatch: value tensor of shape [2] cannot be broadcast to indexing result of shape [1, 1]')>
Traceback (most recent call last):
  File "/home/rshaw/vllm/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
    async for request_output in results_generator:
  File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 1050, in generate
    async for output in await self.add_request(
  File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 110, in generator
    raise result
  File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 52, in _log_task_completion
    return_value = task.result()
                   ^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 916, in run_engine_loop
    result = task.result()
             ^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 859, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 346, in step_async
    output = await self.model_executor.execute_model_async(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/executor/gpu_executor.py", line 178, in execute_model_async
    output = await make_async(self.driver_worker.execute_model
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/.pyenv/versions/3.11.9/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/worker/worker_base.py", line 327, in execute_model
    output = self.model_runner.execute_model(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/worker/multi_step_model_runner.py", line 275, in execute_model
    output = self._base_model_runner.execute_model(frozen_model_input,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/worker/model_runner.py", line 1489, in execute_model
    output: SamplerOutput = self.model.sample(
                            ^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/model_executor/models/llama.py", line 447, in sample
    next_tokens = self.sampler(logits, sampling_metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/model_executor/layers/sampler.py", line 153, in forward
    sample_results, maybe_sampled_tokens_tensor = _sample(
                                                  ^^^^^^^^
  File "/home/rshaw/vllm/vllm/model_executor/layers/sampler.py", line 771, in _sample
    return _sample_with_torch(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/rshaw/vllm/vllm/model_executor/layers/sampler.py", line 633, in _sample_with_torch
    sampled_token_ids_tensor[long_sample_indices] = \
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [2] cannot be broadcast to indexing result of shape [1, 1]
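
For illustration, the failing assignment can be reproduced standalone. Below is a minimal PyTorch sketch (not vLLM code), under the assumption that the multi-step sampler has reserved a single output slot for the request while the n=2 request produced two sampled tokens:

import torch

# One output slot reserved for the sequence group (shape [1, 1]); the
# multi-step path appears to assume a single sequence per request.
sampled_token_ids_tensor = torch.zeros(1, 1, dtype=torch.long)
long_sample_indices = torch.tensor([0])

# An n=2 request produces two sampled tokens for the same group.
new_tokens = torch.tensor([101, 202])

# Raises: RuntimeError: shape mismatch: value tensor of shape [2] cannot be
# broadcast to indexing result of shape [1, 1]
sampled_token_ids_tensor[long_sample_indices] = new_tokens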

robertgshaw2-neuralmagic added the bug label on Aug 28, 2024
robertgshaw2-neuralmagic changed the title from "[Bug]: Multistep with n>1 Failes" to "[Bug]: Multistep with n>1 Fails" on Aug 28, 2024
@SolitaryThinker
Contributor

I will take a look later today

@tjohnson31415
Contributor

tjohnson31415 commented Sep 17, 2024

Looks like @tdoublep encountered this issue a while ago in the context of speculative decoding and has a PR with a fix (which would need to be rebased):

I also found a couple of other issues reporting the same crash:

@robertgshaw2-neuralmagic
Collaborator Author

cc @afeldman-nm

@m-harmonic

I'm running into the same issue. Does anyone know of a workaround? We don't need best_of or use_beam_search.

We can reproduce this using vLLM's provided benchmark_throughput.py:

This runs ok:

python benchmarks/benchmark_throughput.py --input-len=768 --output-len=256 --model=codellama/CodeLlama-7b-hf --max-model-len=1024 --num-prompts=1 --num-scheduler-steps=2 --n=1

This crashes:

python benchmarks/benchmark_throughput.py --input-len=768 --output-len=256 --model=codellama/CodeLlama-7b-hf --max-model-len=1024 --num-prompts=1 --num-scheduler-steps=2 --n=2

The error I'm getting is:

[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1633, in execute_model
[rank0]:     output: SamplerOutput = self.model.sample(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 466, in sample
[rank0]:     next_tokens = self.sampler(logits, sampling_metadata)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 274, in forward
[rank0]:     maybe_deferred_sample_results, maybe_sampled_tokens_tensor = _sample(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 879, in _sample
[rank0]:     return _sample_with_torch(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/sampler.py", line 826, in _sample_with_torch
[rank0]:     sampled_token_ids_tensor[long_sample_indices] = \
[rank0]: RuntimeError: shape mismatch: value tensor of shape [2] cannot be broadcast to indexing result of shape [1, 1]
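
Regarding the workaround question above: one possible client-side stopgap (a sketch only, assuming independent n=1 samples are acceptable for your use case) is to fan the request out into n separate completions with n=1, so the multi-step scheduler never sees a sequence group with more than one sequence:

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
model = client.models.list().data[0].id

# Send n independent n=1 requests instead of a single n=2 request.
n = 2
completions = [
    client.completions.create(
        model=model,
        prompt="A robot may not injure a human being",
        n=1,
    )
    for _ in range(n)
]

for c in completions:
    print(c.choices[0].text)

This trades some extra prompt processing for keeping --num-scheduler-steps > 1 enabled.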

@m-harmonic

m-harmonic commented Oct 2, 2024

@comaniac Hi, just wondering if someone working on vLLM can provide an update on this. We want to use the multi-step scheduler because the throughput is much better for our needs; however, we also need to set n > 1. Simply disabling multistep in that case won't work for us. Thanks!

@comaniac
Collaborator

comaniac commented Oct 2, 2024

Sorry, we're busy with the company event (Ray Summit) until this week. Will try to find some time after the event to look into it. @SolitaryThinker, could you also take a look if you get a chance?

@robertgshaw2-neuralmagic
Collaborator Author

@afeldman-nm has a WIP branch for this

@m-harmonic

> @afeldman-nm has a WIP branch for this

Thanks — are you referring to the branch linked above that disables the multi-step scheduler?

@robertgshaw2-neuralmagic
Collaborator Author

[Bugfix] Handle best_of>1 & use_beam_search by disabling multi-step scheduling. #8637

Yes - to avoid crashing the server.

We are not planning to support both multistep and beam search at the same time. Instead, we are working on rearchitecting vLLM to have asynchronous scheduling, which will accomplish the same throughput goal as multistep while making it easier to support the other features.

However, if you have an idea for how to do this with multistep, feel free to open up a PR.
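
For readers following along, here is a purely hypothetical sketch (not the actual change in #8637; the names below are invented for illustration) of what "disabling multi-step scheduling" for such requests could look like as an engine-side guard:

def requires_single_step(sampling_params, num_scheduler_steps: int) -> bool:
    """Return True if this request should fall back to single-step scheduling."""
    if num_scheduler_steps <= 1:
        return False
    # Requests that expand into multiple sequences per group trip the sampler
    # shape mismatch seen above, so route them to the single-step path.
    n = getattr(sampling_params, "n", 1) or 1
    best_of = getattr(sampling_params, "best_of", None) or n
    use_beam = getattr(sampling_params, "use_beam_search", False)
    return n > 1 or best_of > 1 or use_beam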
