fix(jit): propagate -DNDEBUG to host-side cflags #3278

Merged: aleozlx merged 1 commit into flashinfer-ai:main from arpera:fix/logger-ndebug on May 11, 2026

@arpera (Contributor) commented May 9, 2026

📌 Description

gen_jit_spec adds -DNDEBUG only to extra_cuda_cflags (consumed by nvcc for .cu files), not to extra_cflags (consumed by g++ for host-side .cpp). Several host-only translation units are part of MoE/GEMM JIT specs — most notably csrc/nv_internal/cpp/common/logger.cpp — and they end up compiled without NDEBUG while the rest of the module is a release build.

For the TensorRT-LLM logger this matters because of:

```cpp
// csrc/nv_internal/include/tensorrt_llm/common/logger.h
#ifndef NDEBUG
  Level const DEFAULT_LOG_LEVEL = DEBUG;
#else
  Level const DEFAULT_LOG_LEVEL = INFO;
#endif
```
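The small Python model below shows how a compiler flag list decides which branch of the `#ifndef NDEBUG` block is taken. The helpers `ndebug_defined` and `default_log_level` are hypothetical and not part of FlashInfer. The model assumes GCC-style `-D`/`-U` handling, where flags are processed in command-line order. It also assumes the TRT-LLM level values DEBUG = 10 and INFO = 20; only DEBUG = 10 is confirmed by this PR.

```python
from typing import Iterable

# Assumed TRT-LLM logger level values; only DEBUG = 10 is stated in the PR.
DEBUG = 10
INFO = 20


def ndebug_defined(flags: Iterable[str]) -> bool:
    """Return True if NDEBUG ends up defined for a GCC-style flag list.

    Handles "-DNDEBUG", "-DNDEBUG=<value>", the split form "-D NDEBUG",
    and "-UNDEBUG". The last matching flag wins, as with GCC.
    """
    defined = False
    args = list(flags)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-D", "-U") and i + 1 < len(args):
            # Split form: the macro name is the next argument.
            op, name = arg, args[i + 1]
            i += 2
        elif arg.startswith(("-D", "-U")):
            op, name = arg[:2], arg[2:]
            i += 1
        else:
            i += 1
            continue
        if name.split("=", 1)[0] == "NDEBUG":
            defined = op == "-D"
    return defined


def default_log_level(flags: Iterable[str]) -> int:
    # Mirrors logger.h: #ifndef NDEBUG -> DEBUG, #else -> INFO.
    return INFO if ndebug_defined(flags) else DEBUG


# Before the fix, host cflags lacked -DNDEBUG, so logger.cpp saw DEBUG.
print(default_log_level(["-O3"]))              # 10
print(default_log_level(["-O3", "-DNDEBUG"]))  # 20
```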

With NDEBUG missing on the host side, every prebuilt flashinfer-jit-cache wheel ships with Logger::level_ = DEBUG (10). On Hopper this turns each MoE forward pass into a stream of [TensorRT-LLM][DEBUG] ... sm90_generic_mixed_moe_gemm_kernelLauncher ... lines from the OSS CUTLASS kernel dispatcher. Verified by reading the data-section initializer of Logger::Logger() in the released flashinfer-jit-cache==0.6.10+cu130 fused_moe_{90,100,103,120,trtllm_sm100}.so — all five start Logger with DEFAULT_LOG_LEVEL=10 and level_=10, even though the same wheels carry no .debug_* sections (i.e. they are otherwise release-built).
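The "no `.debug_*` sections" half of that check can be reproduced with the Python standard library alone. The sketch below lists the section names of a 64-bit little-endian ELF file, such as an x86-64 `.so`, and keeps those that start with `.debug_`. The helpers `section_names` and `debug_sections` and the script name are hypothetical. The sketch ignores ELF extended section numbering and does not decode the `Logger::Logger()` initializer. `readelf -S` or `objdump -h` report the same section list.

```python
import struct
import sys


def section_names(data: bytes) -> list[str]:
    """Return section names from a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF64 header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def header(index: int) -> tuple[int, int, int]:
        # Returns (sh_name, sh_offset, sh_size) for one section header.
        base = e_shoff + index * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, base)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 0x18)
        return sh_name, sh_offset, sh_size

    # The section-name string table is itself a section (index e_shstrndx).
    _, str_off, str_size = header(e_shstrndx)
    strtab = data[str_off:str_off + str_size]

    names = []
    for i in range(e_shnum):
        name_off, _, _ = header(i)
        end = strtab.index(b"\0", name_off)
        names.append(strtab[name_off:end].decode())
    return names


def debug_sections(path: str) -> list[str]:
    with open(path, "rb") as f:
        data = f.read()
    return [n for n in section_names(data) if n.startswith(".debug_")]


if __name__ == "__main__":
    # Hypothetical usage: python check_debug.py fused_moe_90.so
    for so in sys.argv[1:]:
        found = debug_sections(so)
        print(so, "debug sections:", found or "none")
```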

The fix is one line: also append -DNDEBUG to the host cflags when not in debug mode. The flashinfer-jit-cache wheel build picks this up automatically and the prebuilt logger flips back to INFO.
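The sketch below shows the shape of the change. It is not the real `gen_jit_spec`, whose signature and surrounding logic are not shown in this PR. The helper `build_flags` and its `fixed` switch are hypothetical, and debug-mode flags are left out. Only the release-mode `-O3`/`-DNDEBUG` handling follows the description; `-O3` on the CUDA side is an assumption.

```python
def build_flags(debug: bool, fixed: bool = True) -> tuple[list[str], list[str]]:
    """Return (host cflags, CUDA cflags) for a hypothetical JIT spec."""
    cflags: list[str] = []
    cuda_cflags: list[str] = []
    if debug:
        # Debug-specific flags are omitted here; NDEBUG must stay undefined.
        return cflags, cuda_cflags
    # nvcc side: already received -DNDEBUG before the fix (-O3 assumed).
    cuda_cflags += ["-O3", "-DNDEBUG"]
    # g++ side: previously only -O3, so host .cpp files saw NDEBUG undefined.
    cflags.append("-O3")
    if fixed:
        cflags.append("-DNDEBUG")  # the one-line fix
    return cflags, cuda_cflags


before, _ = build_flags(debug=False, fixed=False)
after, _ = build_flags(debug=False)
print("-DNDEBUG" in before, "-DNDEBUG" in after)  # False True
```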

🔍 Related Issues

This bug was first observed while integrating FlashInfer v0.6.10 into vLLM: [CI/Build] Bump flashinfer to v0.6.10 (vllm-project/vllm#41711).
It also caused a CI job failure: buildkite/ci/pr/distributed-tests-2-gpus-h100 (https://buildkite.com/vllm/ci/builds/64532#019df966-e67d-4c27-af0e-76b00bc496e5).

Surfaced while debugging a downstream CI step that produced a 2.9 GB log dominated by TRT-LLM debug prints from fused_moe_90.so. No FlashInfer issue tracking this yet — happy to file one alongside this PR if useful.

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit.
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (pytest tests/test_jit_cpp_ext.py).

Two regression tests added in tests/test_jit_cpp_ext.py, mirroring the existing test_debug_jit_uses_sccache_compatible_nvcc_device_debug_flag style:

```
pytest tests/test_jit_cpp_ext.py -v
```

```
test_release_jit_propagates_ndebug_to_host_cflags PASSED
test_debug_jit_does_not_propagate_ndebug          PASSED
```

The first asserts that a release build (FLASHINFER_JIT_DEBUG/FLASHINFER_JIT_VERBOSE unset) puts -DNDEBUG in both spec.extra_cflags and spec.extra_cuda_cflags. The second locks in symmetry: with FLASHINFER_JIT_DEBUG=1 neither list contains -DNDEBUG. Without the fix, the first test fails on assert "-DNDEBUG" in spec.extra_cflags.
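The structure of the two regression tests can be sketched with only the standard library. The real test bodies are not reproduced in this PR, so the sketch substitutes a hypothetical `jit_flags_from_env()` for `gen_jit_spec`. It uses `unittest.mock.patch.dict` to control the environment. It treats only `FLASHINFER_JIT_DEBUG=1` as selecting debug mode, although the real logic may also consult `FLASHINFER_JIT_VERBOSE`. The functions reuse the PR's test names for readability only.

```python
import os
from unittest import mock


def jit_flags_from_env() -> tuple[list[str], list[str]]:
    """Hypothetical stand-in for gen_jit_spec's flag selection."""
    if os.environ.get("FLASHINFER_JIT_DEBUG", "0") == "1":
        return [], []  # debug-specific flags omitted; no -DNDEBUG
    return ["-O3", "-DNDEBUG"], ["-O3", "-DNDEBUG"]


def test_release_jit_propagates_ndebug_to_host_cflags() -> None:
    # Release build: both debug-related variables removed from the environment.
    env = {k: v for k, v in os.environ.items()
           if k not in ("FLASHINFER_JIT_DEBUG", "FLASHINFER_JIT_VERBOSE")}
    with mock.patch.dict(os.environ, env, clear=True):
        cflags, cuda_cflags = jit_flags_from_env()
    assert "-DNDEBUG" in cflags
    assert "-DNDEBUG" in cuda_cflags


def test_debug_jit_does_not_propagate_ndebug() -> None:
    with mock.patch.dict(os.environ, {"FLASHINFER_JIT_DEBUG": "1"}):
        cflags, cuda_cflags = jit_flags_from_env()
    assert "-DNDEBUG" not in cflags
    assert "-DNDEBUG" not in cuda_cflags


if __name__ == "__main__":
    test_release_jit_propagates_ndebug_to_host_cflags()
    test_debug_jit_does_not_propagate_ndebug()
    print("ok")
```

`patch.dict` restores `os.environ` when each `with` block exits, so the two tests do not leak state into each other.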

Reviewer Notes

Single-line behavior change in flashinfer/jit/core.py. No effect on debug builds. Prebuilt wheels rebuilt from this commit will pick up the change automatically — no schema/version bump needed.

Summary by CodeRabbit

  • New Features

    • JIT-compiled host-side (.cpp) code now receives -DNDEBUG in release mode, matching the CUDA flags, so the TensorRT-LLM logger defaults to INFO instead of DEBUG.
  • Tests

    • Added test coverage for proper compilation flag handling between debug and release build modes.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@coderabbitai (bot, Contributor) commented May 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97b47472-eefc-48da-b669-4da69544b382

📥 Commits

Reviewing files that changed from the base of the PR and between 0a128d1 and 00d9e08.

📒 Files selected for processing (2)
  • flashinfer/jit/core.py
  • tests/test_jit_cpp_ext.py

📝 Walkthrough

This PR adds the -DNDEBUG preprocessor flag to C++ compilation flags during release-mode JIT builds. The core change is a single-line modification to gen_jit_spec that appends -DNDEBUG alongside -O3. Two new test functions validate the flag is present in release builds and absent in debug builds.

Changes

JIT Release Build Optimization

| Layer / File(s) | Summary |
|---|---|
| Core JIT Flag Configuration (flashinfer/jit/core.py) | gen_jit_spec appends -DNDEBUG to cflags in non-debug mode alongside -O3 optimization. |
| Release/Debug Mode Test Coverage (tests/test_jit_cpp_ext.py) | Two pytest functions verify -DNDEBUG appears in both extra_cflags and extra_cuda_cflags during release builds (when FLASHINFER_JIT_DEBUG is unset), and is absent during debug builds (when FLASHINFER_JIT_DEBUG is set). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A compiler flag we gently place,
To optimize release's pace,
With NDEBUG's swift embrace,
Debug mode leaves not a trace—
Let the fluffy rabbits race! 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: propagating -DNDEBUG flag to host-side compiler flags in the JIT specification. |
| Description check | ✅ Passed | The description follows the template structure with detailed explanation of the problem, related issues, completed checklist items, tests added, and reviewer notes. All required sections are present and well-documented. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request ensures that the -DNDEBUG flag is propagated to host compiler flags (cflags) during JIT compilation when not in debug mode. It also adds unit tests to verify that the flag is correctly included in release builds and excluded in debug builds. I have no feedback to provide.

@aleozlx (Collaborator) left a comment

lgtm

@aleozlx added the run-ci label May 11, 2026
@aleozlx (Collaborator) commented May 11, 2026

/bot run

@flashinfer-bot (Collaborator)

GitLab MR !657 has been created, and the CI pipeline #50884694 is currently running. I'll report back once the pipeline job completes.

@arpera (Contributor, Author) commented May 11, 2026

@aleozlx, CI passed, should we merge this item then?

@aleozlx merged commit 6885e76 into flashinfer-ai:main May 11, 2026
43 of 44 checks passed
@aleozlx added the v0.6.11 label (release blocker label for 0.6.11) May 11, 2026
@aleozlx (Collaborator) commented May 11, 2026

added 0.6.11 label for post1 target

aleozlx pushed a commit that referenced this pull request May 12, 2026

Labels

run-ci, v0.6.11 (release blocker label for 0.6.11)

3 participants