[Bugfix][Logprobs] Fix logprobs op to support more backend#21591
vllm-bot merged 1 commit into vllm-project:main
Conversation
vllm/v1/sample/ops/logprobs.py
Outdated
Evaluating `current_platform.simple_compile_backend` at module import time makes the backend choice static for the lifetime of the process. Consider using lazy compilation to allow backend selection based on runtime parameters.[^1]
```python
def _batched_count_greater_than_impl(x: torch.Tensor,
                                     values: torch.Tensor) -> torch.Tensor:
    """Implementation of batched_count_greater_than."""
    return (x > values[..., None]).count_nonzero(dim=-1)


_cached_compiled_fn = None


def batched_count_greater_than(x: torch.Tensor,
                               values: torch.Tensor) -> torch.Tensor:
    """
    For each row in `x`, counts the number of elements that are greater than
    the corresponding value in `values`.

    Args:
        x: A 2D tensor of shape (num_rows, num_elements).
        values: A 1D tensor of shape (num_rows,).
    """
    global _cached_compiled_fn
    if _cached_compiled_fn is None:
        from vllm.platforms import current_platform
        _cached_compiled_fn = torch.compile(
            dynamic=True,
            backend=current_platform.simple_compile_backend
        )(_batched_count_greater_than_impl)
    return _cached_compiled_fn(x, values)
```

Style Guide References
Footnotes

[^1]: Use lazy compilation to allow backend selection based on runtime parameters.
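For reference, the semantics of the op under discussion can be sketched in plain Python (a hypothetical eager-mode equivalent for illustration, not the vLLM implementation):

```python
def batched_count_greater_than_ref(x, values):
    """Reference semantics: for each row x[i], count how many elements
    are strictly greater than values[i].

    This is the per-token rank computation used for logprobs: the number
    of vocabulary entries whose logprob exceeds the chosen token's logprob.
    """
    return [sum(1 for e in row if e > v) for row, v in zip(x, values)]


# Row 0 has two elements greater than 0.5; row 1 has none greater than 0.4.
print(batched_count_greater_than_ref([[0.1, 0.9, 0.7], [0.2, 0.1, 0.3]],
                                     [0.5, 0.4]))  # -> [2, 0]
```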
I think using a static compile backend is enough for a specific platform.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
@houseroad Would you mind taking a look? Many thanks.
@MengqingCao is doing an e2e test; we will paste the results after it completes. (later) Updated in the PR description. I also ran an e2e test on vllm-project/vllm-ascend#1927 and it works as expected.
Signed-off-by: MengqingCao <cmq0113@163.com>
Essential Elements of an Effective PR Description Checklist

Purpose
This PR fixes the logprobs op to support more backends. Currently it only supports the inductor backend, which may break some hardware platforms.
Closes: #21592
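The idea behind the fix: rather than hardcoding `backend="inductor"` in the `torch.compile` call, the backend comes from `current_platform.simple_compile_backend`, so platforms without inductor support (e.g. Ascend NPUs via vllm-ascend) can supply one they can actually lower. A minimal sketch of such a selection policy (hypothetical platform names and mapping for illustration; vLLM's real logic lives in the platform classes):

```python
# Hypothetical platform -> torch.compile backend mapping. The real value
# is exposed by each vLLM Platform as `simple_compile_backend`.
_BACKENDS = {
    "cuda": "inductor",  # inductor lowering is available on CUDA
    "cpu": "inductor",
    "npu": "eager",      # no inductor lowering -> fall back to eager
}


def simple_compile_backend(platform: str) -> str:
    """Pick a compile backend the given platform can lower."""
    return _BACKENDS.get(platform, "eager")


# torch.compile(backend=simple_compile_backend("npu"))(fn) would then avoid
# the inductor LoweringException seen in the traceback below.
print(simple_compile_backend("npu"))  # -> eager
```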
Test Plan
Tested with vllm-ascend using the following scripts:
Test Result
Before this PR:
Details
```
Traceback (most recent call last):
    outputs = self._run_engine(use_tqdm=use_tqdm)
  File "/home/xxx/code/vllm-cpu/vllm/vllm/entrypoints/llm.py", line 1701, in _run_engine
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 638, in run_engine_core
    raise e
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 627, in run_engine_core
    engine_core.run_busy_loop()
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 654, in run_busy_loop
    self._process_engine_step()
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 679, in _process_engine_step
    outputs, model_executed = self.step_fn()
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 268, in step
    model_output = self.execute_model_with_error_logging(
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 254, in execute_model_with_error_logging
    raise err
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core.py", line 245, in execute_model_with_error_logging
    return model_fn(scheduler_output)
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/executor/abstract.py", line 87, in execute_model
    output = self.collective_rpc("execute_model",
  File "/home/xxx/code/vllm-cpu/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/home/xxx/code/vllm-cpu/vllm/vllm/utils/__init__.py", line 2986, in run_method
    return func(*args, **kwargs)
  File "/home/xxx/code/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 190, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/xxx/code/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1523, in execute_model
    prompt_logprobs_dict = self._get_prompt_logprobs_dict(
  File "/home/xxx/code/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2387, in _get_prompt_logprobs_dict
    token_ids, logprobs, ranks = self.sampler.gather_logprobs(
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/sample/sampler.py", line 191, in gather_logprobs
    token_ranks = batched_count_greater_than(logprobs, token_logprobs)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1064, in __call__
    result = self._inner_convert(
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 219, in _fn
    return fn(*args, **kwargs)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 634, in transform
    tracer.run()
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2796, in run
    super().run()
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2987, in RETURN_VALUE
    self._return(inst)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2972, in _return
    self.output.compile_subgraph(
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1117, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1369, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1416, in call_user_compiler
    return self._call_user_compiler(gm)
  File "/home/xxx/miniconda3/envs/atb/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: TypeError: 'NoneType' object is not callable
  target: aten.sum.dim_IntList
  args[0]: TensorBox(StorageBox(
    Pointwise(
      'npu',
      torch.bool,
      def inner_fn(index):
          i0, i1 = index
          tmp0 = ops.load(arg2_1, i1 + i0 * s1)
          tmp1 = ops.load(arg3_1, i0)
          tmp2 = tmp0 >= tmp1
          return tmp2
      ,
      ranges=[s0, s1],
      origin_node=ge,
      origins=OrderedSet([ge])
    )
  ))
  args[1]: [-1]

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

    step_outputs = self.llm_engine.step()
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/llm_engine.py", line 237, in step
    outputs = self.engine_core.get_output()
  File "/home/xxx/code/vllm-cpu/vllm/vllm/v1/engine/core_client.py", line 582, in get_output
    raise self._format_exception(outputs) from None
```

After this PR:
Details
(Optional) Documentation Update