Fix #26037: Skip CUDA platform detection when displaying help by AbhiOnGithub · Pull Request #33550 · vllm-project/vllm

AbhiOnGithub · 2026-02-02T08:03:28Z

Description

This PR fixes issue #26037 by preventing CUDA/platform initialization when users run --help or -h flags, making help display instant instead of taking ~10 seconds.

When NEEDS_HELP is False on upstream with -h:
• pre_register_and_update(parser) runs — triggers full CUDA platform detection
• Bench command platform override runs — triggers platform detection again

With our fix, -h is properly detected, those heavy calls are skipped.
On this machine with fast Blackwell GPUs, CUDA init takes milliseconds so the wall-clock difference is tiny. But on machines where CUDA init is slow
(original issue reported ~10s), this fix eliminates that delay entirely for -h

Problem

When users run vllm serve --help or vllm --help, the command takes ~10 seconds because it triggers CUDA/platform initialization before displaying help text. This makes it difficult to quickly view command options and is especially problematic when CUDA initialization fails.

Solution

This PR implements a two-part fix:

1. Early Help Detection in `vllm/entrypoints/cli/main.py`

Check for --help or -h flags at the start of main() before any heavy initialization
Skip cli_env_setup() when displaying help
Skip platform detection for all help requests (including bench command)

2. Lazy Import in `vllm/entrypoints/utils.py`

Move from vllm.platforms import current_platform from module-level to function-level
Import only happens in get_max_tokens() where it's actually used
Prevents platform detection during CLI module imports

Changes

Modified vllm/entrypoints/cli/main.py: Added help flag detection and conditional initialization
Modified vllm/entrypoints/utils.py: Made platform import lazy
Added tests/entrypoints/test_cli_main.py: Tests for help flag behavior (5 tests)
Added tests/entrypoints/test_utils_lazy_import.py: Tests for lazy import (2 tests)

Benefits

⚡ Help commands now execute in milliseconds instead of ~10 seconds
🚫 No CUDA initialization when just viewing help
✅ Works even when CUDA fails to initialize
🔄 No breaking changes to normal command execution

Testing

All modified files compile without syntax errors
Help detection logic verified
Tests follow vLLM conventions and will run in CI

Fixes #26037

github-actions · 2026-02-02T08:03:36Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist

Code Review

This pull request effectively addresses the performance issue of running CLI commands with --help flags by deferring platform initialization. The changes in vllm/entrypoints/cli/main.py and vllm/entrypoints/utils.py are logical and well-supported by the new tests. My main feedback is to refactor a small piece of duplicated logic in main.py to improve code clarity and maintainability.

mergify · 2026-02-02T08:07:43Z

Hi @AbhiOnGithub, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

vadimkantorov · 2026-02-02T14:37:21Z

But does vllm in the help codepath import torch / attempts dlopen CUDA libraries (I think this happens as part of import torch) / attempt doing CUDA init (if any of CUDA methods are used, torch does this)?

If so, it will still take time to load them from disk, and these can be hundreds of megabytes or larger, it takes quite some time to import torch if it's not in OS disk cache

AbhiOnGithub · 2026-02-02T17:48:47Z

Hi @NickLucche , @chaunceyjiang , @aarnphm , @DarkLight1337 , @robertgshaw2-redhat
this is first fix in VLLM repo from my end , please have a look once you guys get some time :)

When users run vllm serve --help or vllm --help, the command takes ~10 seconds because it triggers CUDA/platform initialization before displaying help text.
This makes it difficult to quickly view command options and is especially problematic when CUDA initialization fails.

AbhiOnGithub · 2026-02-02T17:53:17Z

But does vllm in the help codepath import torch / attempts dlopen CUDA libraries (I think this happens as part of import torch) / attempt doing CUDA init (if any of CUDA methods are used, torch does this)?

If so, it will still take time to load them from disk, and these can be hundreds of megabytes or larger, it takes quite some time to import torch if it's not in OS disk cache

@vadimkantorov

Good point! You're absolutely right to ask about this. Let me clarify what this PR does and doesn't prevent:

What this PR prevents:

✅ Platform/CUDA initialization - The detection (which calls CUDA APIs) is now skipped for help
✅ cli_env_setup() - Environment configuration that can be slow is skipped
What might still happen:
You're correct that if any of the CLI submodule imports [vllm.entrypoints.cli.openai], etc.) transitively import torch, then import torch and its disk I/O would still occur during help display.

The impact:
The original issue #26037 specifically mentioned the ~10 second delay was due to CUDA initialization (platform detection), not torch import. The reporter noted it worked instantly when CUDA_VISIBLE_DEVICES="" was set, suggesting CUDA init was the bottleneck, not torch loading.

This PR addresses that specific bottleneck - the CUDA/platform initialization that happens during [vllm.platforms.current_platform] access.

Further optimization:
If torch import is also a significant bottleneck for help display, we could:

Lazy-load CLI subcommands (only import when that subcommand is actually invoked)
Move argument parser definitions to separate files that don't import torch
Would you like me to test the actual help execution time with this fix to verify torch import overhead isn't significant? I can measure before/after times to confirm the improvement.

AbhiOnGithub · 2026-02-02T23:49:48Z

Current Status (with this PR)

Looking at the code, the CLI module imports in main.py happen regardless of the --help flag:

These imports occur even for --help, which means:

YES - torch likely gets imported during help through the dependency chain (e.g., benchmark.main → EngineArgs → eventually torch)

What This PR Actually Fixes
This PR specifically prevents CUDA initialization (the ~10 second delay mentioned in #26037), not torch import overhead. The fix works because:

Platform detection is skipped - No current_platform access means no CUDA API calls
cli_env_setup() is skipped - Additional heavy initialization avoided
CUDA init doesn't happen - Even if torch is imported, CUDA initialization is lazy and won't trigger without actual GPU operations
The Original Issue
Issue #26037 specifically showed the delay was from CUDA initialization, not torch import, because:

Setting CUDA_VISIBLE_DEVICES="" made help instant

The 10-second delay matched CUDA init time, not torch import time (which is typically 1-3 seconds)

Further Optimization Possible
If torch import is also a bottleneck, we could make the CLI imports lazy too:

# Instead of importing at module level, defer until subcommand is selected
# This would require restructuring how subcommands are registered

vadimkantorov · 2026-02-03T13:14:26Z

I think it would be good to at least figure out if import torch is currently happening on the --help code path. If it's not happening - then no problem exists. If it is happening, then at least we know it and it can be fixed later indeed.

So if you figured it out - maybe would be great to create a new issue to track this and decide what to do...

Btw maybe a new command can be added for benchmark or "self-test" (to check import torch/flashinfer/backends etc) - or maybe it exists already (except that proper very-accurate benchmarking is hard as it needs to control for various throttling and resource sharing). So that the new installation can run this command - and maybe its structured output could be uploaded somewhere to get some real-world basic benchmark numbers from many users...

Seems benchmark command already exists as vllm bench (but doesn't seem to print json-structured/uploadable output by default) , but maybe a simpler smoke-command can be added like vllm selftest or sth similar which just verifies that import torch works, backends load and do not fail with any of the driver/shared libraries problems

AbhiOnGithub · 2026-02-03T23:10:17Z

@vadimkantorov
I investigated this and you're absolutely right - torch IS imported during --help.

Findings

The import chain during help display:

vllm/entrypoints/cli/main.py (line 28)
  → vllm/entrypoints/cli/benchmark/latency.py
    → vllm/benchmarks/latency.py  
      → vllm/engine/arg_utils.py (line 13)
        → import torch

So yes, users will experience torch import overhead (~1-3 seconds) even with this PR.

What This PR Actually Fixes

This PR specifically prevents CUDA initialization (the ~10 second delay from #26037), not torch import overhead. The original issue showed:

CUDA_VISIBLE_DEVICES="" made help instant → proved CUDA init was the bottleneck
The 10-second delay matched CUDA init time, not torch import (~1-3s)

This PR eliminates the CUDA initialization overhead by:

Skipping current_platform access (no CUDA API calls)
Skipping cli_env_setup()
Preventing CUDA initialization (even if torch is imported, CUDA init is lazy)

Next Steps

I've created issue #33741 to track optimizing the remaining torch import overhead. The solution would be to make CLI submodule imports lazy (only import when the subcommand is actually invoked, not during argument parsing).

`vllm selftest` Idea

Great suggestion! I've included it in #33741. A vllm selftest or vllm doctor command could:

Verify torch/backends load correctly
Test basic GPU/driver functionality
Output JSON diagnostics for bug reports
Provide basic benchmarking data

This would be super useful for debugging installation issues and collecting real-world performance data.

AbhiOnGithub · 2026-02-04T17:30:15Z

Hi @vadimkantorov @NickLucche @chaunceyjiang , @aarnphm , @DarkLight1337 , @robertgshaw2-redhat
Can you please review it.

mergify · 2026-02-08T07:27:48Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AbhiOnGithub.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-02-09T08:51:03Z

Hi @AbhiOnGithub, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

AbhiOnGithub · 2026-02-09T18:50:40Z

Hi @vadimkantorov
please suggest are we good to merge or are there any steps for me.

vadimkantorov · 2026-02-09T18:52:49Z

@AbhiOnGithub I'm not a maintainer, just the reporter of the original issue. For me all is good. Maybe after this is merged, worth creating a separate issue for discussing/tracking removing import torch from the help code path

AbhiOnGithub · 2026-02-10T18:53:58Z

@hmellor

✅ Done! I've made the following changes:

Updated NEEDS_HELP in arg_utils.py to include -h flag
Replaced all manual help checks with NEEDS_HELP throughout main.py
Refactored the code to use if not NEEDS_HELP: instead of if NEEDS_HELP: pass else:

The code is now cleaner and follows DRY principle. Thanks for the review!

hmellor · 2026-02-16T09:22:45Z

-    cli_env_setup()
+    # Only do environment setup if not showing help
+    if not needs_help:
+        cli_env_setup()


Are there any CLI args that get their default from env?

No — [cli_env_setup()] only sets VLLM_WORKER_MULTIPROC_METHOD, which isn't used as a default for any CLI argument. All CLI arg defaults come from static dataclass field values, not [os.environ.get()]

The only tangential case is [CompilationConfig.compile_cache_save_format] which reads VLLM_COMPILE_CACHE_SAVE_FORMAT via a [default_factory],

but that's a sub-field of a JSON argument ([--compilation-config], not a standalone CLI arg. Skipping [cli_env_setup()] during [--help] is safe and won't affect displayed defaults."

This isn't obvious to me. What is the conflict avoided by skipping this with showing_help is True?

Good question. The bench command block accesses platforms.current_platform.is_unspecified(), which triggers CUDA/platform detection. That detection can take ~10 seconds (or fail entirely without a GPU). When showing help, we just want to print usage info — we don't need to know the platform. Updated the comment in the latest commit to make this clearer.

This comment, and the code comment update, are not related to the code this comment thread is under.

Please provide a manual response based on your own understanding (don't use claude or whatever to respond).

Hi @russellb ,my earlier reply was actually about the bench block below, not this code. Looking at it again, theres honestly no good reason to skip cli_env_setup()

All it does is set VLLM_WORKER_MULTIPROC_METHOD=spawn in the env — its instant and dosent trigger any CUDA or platform initialization at all.

I was just being cautious and wrapped everthing in help guard without thinking about it. Removed it now so cli_env_setup() just runs unconditionally again and thanks for pointing this out.

Code was assisted by Claude not comments :P

hmellor

LGTM! Thanks for this change and for being responsive to feedback.

For future reference, if you can avoid force pushing, it would make the reviewing process easier because GitHub can show me only the changes I have not yet reviewed.

hmellor · 2026-02-17T10:48:00Z

You will need to merge from main to fix the docs build

hmellor · 2026-02-20T10:37:15Z

@@ -14,6 +14,14 @@


 def main():
+    # Check if help is requested before doing any heavy initialization


Import does not need to be lazy?

yes, here lazy import will not help

The actual cost is three heavy libraries imported eagerly:
torch ~1.3s (via vllm/init.py -> vllm.env_override)
transformers ~1.4s (via vllm.config.model -> transformers_utils.config)
fastapi+aio ~0.5s (via cli.serve -> api_server)

hmellor · 2026-02-20T10:37:24Z

+    # For 'vllm bench *': use CPU instead of UnspecifiedPlatform by default.
+    # When showing help, skip this to avoid triggering CUDA/platform init
+    # (which can take ~10s or fail without a GPU).
+    if len(sys.argv) > 1 and sys.argv[1] == "bench" and not showing_help:


Suggested change

if len(sys.argv) > 1 and sys.argv[1] == "bench" and not showing_help:

if len(sys.argv) > 1 and sys.argv[1] == "bench" and not needs_help():

Also, did you test vllm bench --help after this change? Does that still work OK?

yes , please refer this comment
#33550 (comment)

russellb · 2026-02-23T20:49:49Z

+def test_help_flag_skips_platform_detection(argv):
+    """Test that help flags don't trigger platform detection."""
+    import vllm.platforms
+
+    vllm.platforms._current_platform = None
+
+    with patch.object(sys, "argv", argv), patch.object(sys, "exit"):
+        from vllm.entrypoints.cli.main import main
+
+        with contextlib.suppress(SystemExit):
+            main()
+
+    assert vllm.platforms._current_platform is None, (
+        f"Platform should not be detected when showing help with {argv}"
+    )


I don't think this is safe. This could conflict with other tests in the same process by editing global state.

this is now replaced with 3 tests
test_needs_help_detects_help_flags
test_needs_help_returns_false_without_help_flags
test_bench_help_skips_platform_detection

having with statement

russellb · 2026-02-23T20:49:59Z

+    import vllm.platforms
+
+    # Reset platform detection state
+    vllm.platforms._current_platform = None


Same comment for this test: I don't think this is safe. This could conflict with other tests in the same process by editing global state.

yes mutating vllm.platforms._current_platform = None is unsafe global state modification.

Now uses patch.object to temporarily set _current_platform = None , the original value is automatically restored when the with block exits, so no global state leaks between tests.

using unittest.mock.patch is just as unsafe for the same reasons.

hi @russellb , what is your suggestion/recomendation , what should I use here to make it safe.

@russellb , IMHO the better approach is to run the import check in a subprocess.
This gives perfect isolation — no mutation of the current process's global state at all.

def test_utils_import_no_platform_detection(): """Importing vllm.entrypoints.utils must not trigger platform detection.""" # Run in a subprocess for full isolation — no mock / patch needed. result = subprocess.run( [ sys.executable, "-c", ";".join([ "import vllm.platforms", "import vllm.entrypoints.utils", "assert vllm.platforms._current_platform is None" ", 'Importing vllm.entrypoints.utils triggered detection'", ]), ], capture_output=True, text=True, ) assert result.returncode == 0, ( f"Subprocess failed (rc={result.returncode}):\n" f"stdout: {result.stdout}\nstderr: {result.stderr}" )

Replaced unittest.mock.patch with subprocess isolation in [test_utils_lazy_import.py]
because patch.object combined with importlib.reload was masking a real bug where importing vllm.entrypoints.utils triggered platform detection through a top-level EngineArgs import.

Fixed the root cause by making EngineArgs a lazy import inside function bodies in vllm/entrypoints/utils.py.

Kept patch.object for sys.argv in [test_cli_main.py] since patching a plain list attribute is safe.

Simplified the bench help test to a focused unit test of the guard condition instead of running the full main() entrypoint in a subprocess.
refer my latest commit

russellb · 2026-02-23T20:52:47Z

+    # For 'vllm bench *': use CPU instead of UnspecifiedPlatform by default.
+    # When showing help, skip this to avoid triggering CUDA/platform init
+    # (which can take ~10s or fail without a GPU).
+    if len(sys.argv) > 1 and sys.argv[1] == "bench" and not showing_help:


Also, did you test vllm bench --help after this change? Does that still work OK?

removing as I don't want to block this if I don't come back to it.

hmellor · 2026-02-27T07:59:29Z

 from vllm.utils.argparse_utils import FlexibleArgumentParser

 if TYPE_CHECKING:
+    from vllm.engine.arg_utils import EngineArgs


You can't do this because EngineArgs is used for more than just type checking

Hi @hmellor ,
thanks for the feedback! I've addressed all the review comments and rebased into a single clean commit. Here's what changed:

utils.py — EngineArgs is kept as a normal import (not TYPE_CHECKING), since it's used at runtime for isinstance() and constructor calls.
The only change here is moving current_platform from a module-level import to a lazy import inside get_max_tokens().

main.py — Using needs_help() inline as you suggested: if len(sys.argv) > 1 and sys.argv[1] == "bench" and not needs_help():

arg_utils.py — Converted NEEDS_HELP constant into a needs_help() function (reusable from main.py), added -h and --help=X support.

NEEDS_HELP isstill set at module level for backward compat.

otel.py — Dropped from this PR as you suggested.

Tests — Parametrized with @pytest.mark.parametrize, lazy import test uses subprocess isolation. All 10 tests pass locally, all pre-commit hooks

Scenario Upstream With this Fix

vllm serve -h → NEEDS_HELP False (broken) True (fixed)

vllm serve --help → NEEDS_HELP True True

vllm serve --help=ModelConfig → NEEDS_HELP True True

When NEEDS_HELP is False on upstream with -h:
• pre_register_and_update(parser) runs — triggers full CUDA platform detection
• Bench command platform override runs — triggers platform detection again

With this fix, -h is properly detected, those heavy calls are skipped.

mergify · 2026-03-03T15:59:48Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AbhiOnGithub.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-03-14T15:42:25Z

Hi @AbhiOnGithub, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: AbhiOnGithub <mail2abhishekgupta@gmail.com>

hmellor · 2026-03-16T11:27:03Z

+def test_bench_help_skips_platform_detection():
+    """Test that the bench guard in main() is skipped when --help is present.
+
+    The guard in main.py is:
+        if sys.argv[1] == "bench" and not showing_help
+    When showing_help is True, current_platform is never accessed for
+    the bench CPU-override, avoiding unnecessary platform detection.
+    """
+    from vllm.engine.arg_utils import needs_help
+
+    # Verify the guard: needs_help() == True means "not showing_help" is False,
+    # so the bench platform-override block is skipped.
+    with patch.object(sys, "argv", ["vllm", "bench", "--help"]):
+        assert needs_help(), "needs_help() should be True for bench --help"
+
+    # Without --help the guard would be entered
+    with patch.object(sys, "argv", ["vllm", "bench", "latency"]):
+        assert not needs_help(), "needs_help() should be False without --help"


This duplicates what is already done in the previous 2 tests?

hmellor · 2026-03-16T11:27:46Z

+@pytest.mark.parametrize(
+    "argv",
+    [
+        ["vllm", "--help"],
+        ["vllm", "serve", "--help"],
+        ["vllm", "-h"],
+        ["vllm", "bench", "--help"],
+        ["vllm", "serve", "--help=ModelConfig"],
+    ],
+)
+def test_needs_help_detects_help_flags(argv):
+    """Test that needs_help() correctly detects help flags in sys.argv."""
+    from vllm.engine.arg_utils import needs_help
+
+    # patch.object on sys.argv is safe — it's a simple list attribute
+    # with no lazy-init or side-effect machinery.
+    with patch.object(sys, "argv", argv):
+        assert needs_help(), f"needs_help() should return True for {argv}"
+
+
+@pytest.mark.parametrize(
+    "argv",
+    [
+        ["vllm", "serve", "--model", "test"],
+        ["vllm", "bench", "latency", "--model", "test"],
+        ["vllm", "collect-env"],
+    ],
+)
+def test_needs_help_returns_false_without_help_flags(argv):
+    """Test that needs_help() returns False when no help flag is present."""
+    from vllm.engine.arg_utils import needs_help
+
+    with patch.object(sys, "argv", argv):
+        assert not needs_help(), f"needs_help() should return False for {argv}"


These can be merged into

def test_needs_help(argv, expected): ...

hmellor · 2026-03-16T11:29:04Z

This test is strangely specific. Surely we just want to check that there's no vllm.platforms in the global scope, not that there is vllm.platforms in get_max_tokens?

AbhiOnGithub requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang and robertgshaw2-redhat as code owners February 2, 2026 08:03

mergify Bot added the frontend label Feb 2, 2026

gemini-code-assist Bot reviewed Feb 2, 2026

View reviewed changes

AbhiOnGithub force-pushed the fix-help-platform-detection branch from b0d760e to d18d65b Compare February 2, 2026 08:20

DarkLight1337 requested a review from mgoin February 3, 2026 05:49

AbhiOnGithub mentioned this pull request Feb 3, 2026

Optimize --help performance: Avoid torch import during help display #33741

Open

mergify Bot added needs-rebase and removed needs-rebase labels Feb 8, 2026

AbhiOnGithub force-pushed the fix-help-platform-detection branch from ad5f10f to ed2d51e Compare February 9, 2026 09:08

ProExpertProg approved these changes Feb 10, 2026

View reviewed changes

ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 10, 2026

hmellor requested changes Feb 10, 2026

View reviewed changes

Comment thread vllm/entrypoints/cli/main.py Outdated

Comment thread vllm/entrypoints/cli/main.py Outdated

Comment thread vllm/entrypoints/cli/main.py Outdated

mergify Bot added the nvidia label Feb 13, 2026

github-project-automation Bot added this to NVIDIA Feb 13, 2026

github-project-automation Bot moved this to In review in NVIDIA Feb 13, 2026

AbhiOnGithub requested a review from hmellor February 16, 2026 01:01

hmellor requested changes Feb 16, 2026

View reviewed changes

AbhiOnGithub force-pushed the fix-help-platform-detection branch 2 times, most recently from e044f2d to a4ad06f Compare February 17, 2026 08:41

hmellor approved these changes Feb 17, 2026

View reviewed changes

github-project-automation Bot moved this from In review to Ready in NVIDIA Feb 17, 2026

russellb reviewed Feb 17, 2026

View reviewed changes

Comment thread vllm/entrypoints/cli/main.py Outdated

russellb reviewed Feb 17, 2026

View reviewed changes

Comment thread tests/entrypoints/test_utils_lazy_import.py Outdated

russellb requested changes Feb 17, 2026

View reviewed changes

github-project-automation Bot moved this from Ready to In review in NVIDIA Feb 17, 2026

hmellor reviewed Feb 20, 2026

View reviewed changes

russellb previously requested changes Feb 23, 2026

View reviewed changes

hmellor reviewed Feb 25, 2026

View reviewed changes

Comment thread vllm/tracing/otel.py

hmellor requested changes Feb 27, 2026

View reviewed changes

mergify Bot added the needs-rebase label Mar 3, 2026

AbhiOnGithub force-pushed the fix-help-platform-detection branch from f975ab9 to 705e0cc Compare March 14, 2026 15:38

mergify Bot removed the needs-rebase label Mar 14, 2026

skip CUDA platform detection during --help display

e83d3a9

Signed-off-by: AbhiOnGithub <mail2abhishekgupta@gmail.com>

AbhiOnGithub force-pushed the fix-help-platform-detection branch from 705e0cc to e83d3a9 Compare March 14, 2026 16:22

hmellor requested changes Mar 16, 2026

View reviewed changes

Merge branch 'main' into fix-help-platform-detection

e341dcf

Sunt-ing mentioned this pull request Jun 1, 2026

[Bugfix] Detect driver-level CUDA init before fork #44252

Open

		@@ -14,6 +14,14 @@


		def main():
		# Check if help is requested before doing any heavy initialization

	if len(sys.argv) > 1 and sys.argv[1] == "bench" and not showing_help:
	if len(sys.argv) > 1 and sys.argv[1] == "bench" and not needs_help():

Scenario	Upstream	With this Fix
`vllm serve -h` → `NEEDS_HELP`	False (broken)	True (fixed)
`vllm serve --help` → `NEEDS_HELP`	True	True
`vllm serve --help=ModelConfig` → `NEEDS_HELP`	True	True

Uh oh!

Conversation

AbhiOnGithub commented Feb 2, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Problem

Solution

1. Early Help Detection in vllm/entrypoints/cli/main.py

2. Lazy Import in vllm/entrypoints/utils.py

Changes

Benefits

Testing

Uh oh!

github-actions Bot commented Feb 2, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

mergify Bot commented Feb 2, 2026

Uh oh!

vadimkantorov commented Feb 2, 2026

Uh oh!

AbhiOnGithub commented Feb 2, 2026

Uh oh!

AbhiOnGithub commented Feb 2, 2026

Uh oh!

AbhiOnGithub commented Feb 2, 2026

Uh oh!

vadimkantorov commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AbhiOnGithub commented Feb 3, 2026

Findings

What This PR Actually Fixes

Next Steps

vllm selftest Idea

Uh oh!

AbhiOnGithub commented Feb 4, 2026

Uh oh!

mergify Bot commented Feb 8, 2026

Uh oh!

mergify Bot commented Feb 9, 2026

Uh oh!

AbhiOnGithub commented Feb 9, 2026

Uh oh!

vadimkantorov commented Feb 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AbhiOnGithub commented Feb 10, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AbhiOnGithub Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hmellor left a comment

Choose a reason for hiding this comment

Uh oh!

hmellor commented Feb 17, 2026

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

AbhiOnGithub commented Feb 2, 2026 •

edited by github-actions Bot

Loading

1. Early Help Detection in `vllm/entrypoints/cli/main.py`

2. Lazy Import in `vllm/entrypoints/utils.py`

vadimkantorov commented Feb 3, 2026 •

edited

Loading

`vllm selftest` Idea

AbhiOnGithub Feb 20, 2026 •

edited

Loading

AbhiOnGithub Feb 24, 2026 •

edited

Loading

AbhiOnGithub Mar 14, 2026 •

edited

Loading