fix(jit): propagate -DNDEBUG to host-side cflags #3278
Conversation
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Code Review
This pull request ensures that the -DNDEBUG flag is propagated to host compiler flags (cflags) during JIT compilation when not in debug mode. It also adds unit tests to verify that the flag is correctly included in release builds and excluded in debug builds. I have no feedback to provide.
/bot run
@aleozlx, CI passed, should we merge this item then?
Added the 0.6.11 label for the post1 target.
## 📌 Description
`gen_jit_spec` adds `-DNDEBUG` only to `extra_cuda_cflags` (consumed by
`nvcc` for `.cu` files), not to `extra_cflags` (consumed by `g++` for
host-side `.cpp`). Several host-only translation units are part of
MoE/GEMM JIT specs — most notably
`csrc/nv_internal/cpp/common/logger.cpp` — and they end up compiled
without `NDEBUG` while the rest of the module is a release build.
For the TensorRT-LLM logger this matters because of:
```cpp
// csrc/nv_internal/include/tensorrt_llm/common/logger.h
#ifndef NDEBUG
Level const DEFAULT_LOG_LEVEL = DEBUG;
#else
Level const DEFAULT_LOG_LEVEL = INFO;
#endif
```
With `NDEBUG` missing on the host side, every prebuilt
`flashinfer-jit-cache` wheel ships with `Logger::level_ = DEBUG (10)`.
On Hopper this turns each MoE forward pass into a stream of
`[TensorRT-LLM][DEBUG] ... sm90_generic_mixed_moe_gemm_kernelLauncher
...` lines from the OSS CUTLASS kernel dispatcher. Verified by reading
the data-section initializer of `Logger::Logger()` in the released
`flashinfer-jit-cache==0.6.10+cu130`
`fused_moe_{90,100,103,120,trtllm_sm100}.so` — all five start `Logger`
with `DEFAULT_LOG_LEVEL=10` and `level_=10`, even though the same wheels
carry no `.debug_*` sections (i.e. they are otherwise release-built).
The fix is one line: also append `-DNDEBUG` to the host `cflags` when
not in debug mode. The `flashinfer-jit-cache` wheel build picks this up
automatically and the prebuilt logger flips back to `INFO`.
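For illustration, the shape of the change can be sketched as follows. This is a minimal, self-contained model of how `gen_jit_spec` assembles its flag lists, not the actual `flashinfer/jit/core.py` code; all flag values besides `-DNDEBUG` are illustrative.

```python
def assemble_flags(debug: bool) -> tuple[list[str], list[str]]:
    """Hypothetical model of gen_jit_spec's flag assembly (names illustrative)."""
    cflags = ["-O3", "-std=c++17"]                     # host g++ flags for .cpp TUs
    cuda_cflags = ["-O3", "--expt-relaxed-constexpr"]  # nvcc flags for .cu TUs
    if debug:
        cuda_cflags += ["-g", "-G"]     # illustrative debug flags
    else:
        cuda_cflags.append("-DNDEBUG")  # previously the only place NDEBUG landed
        cflags.append("-DNDEBUG")       # the fix: define it for host TUs too
    return cflags, cuda_cflags
```

With this in place, host-only translation units such as `logger.cpp` see the same `NDEBUG` definition as the device code, so the `#ifndef NDEBUG` branch in `logger.h` selects `INFO` in release builds.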
## 🔍 Related Issues
Initially this bug was observed during integration of FI v0.6.10 into
vLLM: [[CI/Build] Bump flashinfer to v0.6.10
#41711](vllm-project/vllm#41711).
A CI job log shows the failure caused by this issue:
[buildkite/ci/pr/distributed-tests-2-gpus-h100](https://buildkite.com/vllm/ci/builds/64532#019df966-e67d-4c27-af0e-76b00bc496e5).
Surfaced while debugging a downstream CI step that produced a 2.9 GB log
dominated by TRT-LLM debug prints from `fused_moe_90.so`. No FlashInfer
issue tracking this yet — happy to file one alongside this PR if useful.
## 🚀 Pull Request Checklist
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`.
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`pytest tests/test_jit_cpp_ext.py`).
Two regression tests added in `tests/test_jit_cpp_ext.py`, mirroring the
existing `test_debug_jit_uses_sccache_compatible_nvcc_device_debug_flag`
style:
```
pytest tests/test_jit_cpp_ext.py -v
```
```
test_release_jit_propagates_ndebug_to_host_cflags PASSED
test_debug_jit_does_not_propagate_ndebug PASSED
```
The first asserts that a release build
(`FLASHINFER_JIT_DEBUG`/`FLASHINFER_JIT_VERBOSE` unset) puts `-DNDEBUG`
in **both** `spec.extra_cflags` and `spec.extra_cuda_cflags`. The second
locks in symmetry: with `FLASHINFER_JIT_DEBUG=1` neither list contains
`-DNDEBUG`. Without the fix, the first test fails on `assert "-DNDEBUG"
in spec.extra_cflags`.
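The assertion logic of the two tests can be condensed into a self-contained sketch. The real tests build an actual spec via `gen_jit_spec`; here a stub stands in for it so the release/debug symmetry is runnable on its own, with the env-var name taken from the description.

```python
import os

def fake_gen_jit_spec() -> dict:
    """Stand-in for gen_jit_spec: returns only the two flag lists under test."""
    debug = os.environ.get("FLASHINFER_JIT_DEBUG") == "1"
    flags = {"extra_cflags": [], "extra_cuda_cflags": []}
    if not debug:
        flags["extra_cuda_cflags"].append("-DNDEBUG")  # pre-existing behavior
        flags["extra_cflags"].append("-DNDEBUG")       # behavior added by this PR
    return flags

# Release build: -DNDEBUG must appear in BOTH lists.
os.environ.pop("FLASHINFER_JIT_DEBUG", None)
spec = fake_gen_jit_spec()
assert "-DNDEBUG" in spec["extra_cflags"]
assert "-DNDEBUG" in spec["extra_cuda_cflags"]

# Debug build: -DNDEBUG must appear in NEITHER list.
os.environ["FLASHINFER_JIT_DEBUG"] = "1"
spec = fake_gen_jit_spec()
assert "-DNDEBUG" not in spec["extra_cflags"]
assert "-DNDEBUG" not in spec["extra_cuda_cflags"]
```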
## Reviewer Notes
Single-line behavior change in `flashinfer/jit/core.py`. No effect on
debug builds. Prebuilt wheels rebuilt from this commit will pick up the
change automatically — no schema/version bump needed.