feat: Enable API Logging for Better Debugging POC #2108
Conversation
Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and flexible API logging system designed to significantly enhance debugging capabilities within FlashInfer. The system allows developers to easily instrument functions with a decorator to log calls, arguments, and return values at various levels of detail, from simple function names to comprehensive tensor metadata and statistics. A key design principle is crash-safety, ensuring that input arguments are logged even if a function terminates unexpectedly, which is crucial for diagnosing complex issues like CUDA kernel crashes. Furthermore, the logging system is optimized for performance, offering zero overhead when disabled, and intelligently handles CUDA graph capture to avoid performance bottlenecks. This infrastructure will be invaluable for understanding MoE workloads and other performance-critical operations.
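To make that description concrete, here is a minimal usage sketch. The environment variable names follow the first review round in this thread (`FLASHINFER_APILOG_*`); they were renamed in later revisions, so treat them as illustrative rather than final:

```python
# Illustrative only: env var names follow the first review round and were
# later renamed; check the merged code for the final spelling.
import os

os.environ["FLASHINFER_APILOG_LEVEL"] = "3"      # 0 disables logging entirely
os.environ["FLASHINFER_APILOG_DEST"] = "stdout"  # or stderr, or a file path

# The module reads these variables once at import time, so set them first.
import flashinfer

# Decorated APIs now log the function name and arguments before the call
# (crash-safe) and, at higher levels, tensor metadata and statistics after it.
```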
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

A new API logging decorator framework is introduced to FlashInfer that enables configurable logging of API calls via environment variables.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant flashinfer_api
participant Logger
participant OriginalFunc
participant CUDA
User->>flashinfer_api: Call decorated function with args
activate flashinfer_api
Note over flashinfer_api: Check FLASHINFER_LOGLEVEL
alt Level == 0 (Disabled)
flashinfer_api->>OriginalFunc: Call directly (zero-overhead)
else Level >= 1 (Enabled)
flashinfer_api->>Logger: Log function name & parameters
activate Logger
Logger-->>flashinfer_api:
deactivate Logger
flashinfer_api->>OriginalFunc: Call original function
activate OriginalFunc
OriginalFunc->>CUDA: Execute kernels
CUDA-->>OriginalFunc: Return results
deactivate OriginalFunc
flashinfer_api->>Logger: Log return values & stats (level-dependent)
activate Logger
alt Level == 5 & CUDA Graph Active
Logger-->>flashinfer_api: Skip stats, use placeholder
else
Logger-->>flashinfer_api: Include tensor statistics
end
deactivate Logger
end
flashinfer_api-->>User: Return results
    deactivate flashinfer_api
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Code Review
This PR introduces a comprehensive and well-designed API logging system for FlashInfer. The use of a decorator (@flashinfer_api_log) makes it easy to apply, and the control via environment variables (FLASHINFER_APILOG_LEVEL, FLASHINFER_APILOG_DEST) is flexible. The zero-overhead design when disabled is a key feature and is implemented correctly. The different logging levels provide a good range of verbosity for debugging, and the crash-safe logging of inputs is a great feature for diagnosing kernel crashes. The inclusion of a benchmark script and a thorough test suite is excellent.
My review focuses on improving the robustness and maintainability of the new logging module. I've identified a few areas for improvement, such as making exception handling more specific and simplifying some conditional logic.
Overall, this is a great addition to the library that will significantly improve debuggability.
flashinfer/api_logging.py
Outdated
```python
# Read environment variables once at module load time
_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
```
The int() conversion for FLASHINFER_APILOG_LEVEL is not robust against invalid user input. If a user sets this environment variable to a non-integer string (e.g., "foo"), it will raise a ValueError and prevent the module from being imported. It would be more robust to handle this case gracefully, for example by defaulting to level 0 and perhaps logging a warning.
```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (ValueError, TypeError):
+    _API_LOG_LEVEL = 0
```
```python
    except Exception:
        # If we can't inspect the signature, return empty dict
        return {}
```
The except Exception: block is too broad. It will catch any exception, including system-exiting exceptions like KeyboardInterrupt, which can hide bugs and make debugging difficult. The inspect.signature() function is documented to raise ValueError or TypeError on failure. It's better to catch only these specific exceptions.
```diff
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+    except (ValueError, TypeError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```
```python
    if "Wrapper" in class_name or class_name in [
        "BatchMLAPagedAttentionWrapper"
    ]:
```
The condition class_name in ["BatchMLAPagedAttentionWrapper"] is redundant because the class name "BatchMLAPagedAttentionWrapper" already contains the substring "Wrapper", which is checked in the first part of the or condition. This can be simplified for better readability and maintainability.
```python
    if "Wrapper" in class_name:
```
Actionable comments posted: 0
🧹 Nitpick comments (8)
tests/utils/test_logging.py (1)
227-247: Consider quieting Ruff's unused-argument warnings in tests

Several test helpers intentionally don't use all parameters (e.g., `mode` in `test_default_parameters`, arguments to `crashing_function`, and the many typed parameters in `test_different_data_types`). If Ruff is run on tests, you may want to silence these via `_`-prefixed names or `# noqa: ARG001` on those definitions to keep the suite warning-free; behavior is otherwise fine.

Also applies to: 322-344, 355-383
flashinfer/fused_moe/core.py (1)
23-23: MoE public API logging is wired correctly

Wrapping the high-level MoE entrypoints (`cutlass_fused_moe`, `trtllm_bf16_moe`, all FP8/FP4 variants) with `flashinfer_api_log` gives good coverage of the performance-critical surface without touching the custom-op registration or autotuning internals. Signatures and control flow are preserved. Just be aware that at logging level 3, tensor statistics over large MoE activations/weights will be non-trivial in cost, so that mode should remain a targeted debugging tool rather than a default.

Also applies to: 688-906, 1862-1941, 1943-2015, 2017-2093, 2095-2223, 2225-2354
flashinfer/api_logging.py (2)
28-68: Env-driven logger setup matches goals; consider a tiny robustness tweak

Reading `FLASHINFER_APILOG_LEVEL`/`FLASHINFER_APILOG_DEST` once at import and configuring a dedicated `flashinfer.api` logger (with a `NullHandler` at level 0) cleanly achieves "zero-overhead when disabled" and isolates logs from the root logger. One minor robustness improvement would be to guard the `int()` conversion for `_API_LOG_LEVEL` so a malformed env var falls back to level 0 (or a safe default) instead of raising at import.
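A minimal sketch of that guard, mirroring the suggestion (placement at module level is assumed):

```python
import os

try:
    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
except (TypeError, ValueError):
    # Malformed env var: fall back to disabled logging rather than
    # raising at import time.
    _API_LOG_LEVEL = 0
```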
345-367: Use the `func_name` argument in `_log_function_outputs` for clearer logs

`_log_function_outputs` accepts `func_name` but currently ignores it, which also triggers a linter warning. You could make the logs more self-describing and fix the unused parameter by adding a small header including the function name. For example:

```diff
 def _log_function_outputs(func_name: str, result: Any, level: int) -> None:
@@
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    lines.append(f"FlashInfer API Return: {func_name}")
+    lines.append("Output value:")
     lines.append(_format_value(result, level, indent=1))
@@
     _logger.debug("\n".join(lines))
```

This preserves existing tests (which only assert on `"Output value:"`) while improving traceability.

benchmarks/bench_logging_overhead.py (1)
35-38: Align benchmark log destination with api_logging's default to avoid confusion

The script defaults `LOG_DEST` to `/tmp/flashinfer_benchmark_log.txt`, but `flashinfer.api_logging` defaults `FLASHINFER_APILOG_DEST` to `./flashinfer_log.txt` when the env var is unset. If users only set `FLASHINFER_APILOG_LEVEL` (as shown in the usage block), the benchmark will print and clean up `/tmp/...` while the actual logs go to `./flashinfer_log.txt`, so log-size reporting and cleanup can silently miss the real file.

To make behavior predictable, consider either:

- Matching the default with the library:

```diff
-LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "/tmp/flashinfer_benchmark_log.txt")
+LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "./flashinfer_log.txt")
```

and/or

- Explicitly propagating `LOG_DEST` into the environment before importing `flashinfer.api_logging` when the env var is not already set, as sketched below.

Either option will keep the benchmark's "LOG FILE INFO" and cleanup in sync with where the decorator actually writes.
Also applies to: 265-327
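For the second option, a sketch of the propagation idea (variable names follow the benchmark script; the exact call site is an assumption):

```python
import os

LOG_DEST = os.environ.get(
    "FLASHINFER_APILOG_DEST", "/tmp/flashinfer_benchmark_log.txt"
)
# If the env var was unset, publish the benchmark's default so the library
# sees the same destination; setdefault leaves an explicit setting untouched.
os.environ.setdefault("FLASHINFER_APILOG_DEST", LOG_DEST)

import flashinfer.api_logging  # import only after the environment is set
```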
flashinfer/gemm/gemm_base.py (3)
1842-2002: Decorator stacking on `mm_fp4` means logging happens after backend/shape checks

`mm_fp4` is now decorated as:

```python
@backend_requirement(...)
@flashinfer_api_log
def mm_fp4(...): ...
```

This order implies that backend and problem-size checks (and any heuristic backend selection) run first, and only if they pass does the call enter the logging wrapper and function body. That's a reasonable choice and doesn't affect correctness, but it does mean invalid/problematic calls rejected by `backend_requirement` won't appear in API logs.

If you'd prefer to log all attempted API calls—including ones failing backend requirements—you might want to reverse the order:

```python
@flashinfer_api_log
@backend_requirement(...)
def mm_fp4(...): ...
```

Please double-check the implementation of `backend_requirement` to ensure it doesn't rely on attributes on the original function that could be hidden by another wrapper (though `functools.wraps` in the logger should preserve most metadata).
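A toy illustration of why the order matters (stand-in decorators, not the real FlashInfer ones):

```python
import functools

def require_positive(fn):          # stand-in for backend_requirement
    @functools.wraps(fn)
    def wrapper(x):
        if x < 0:
            raise ValueError("requirement failed")
        return fn(x)
    return wrapper

def log_calls(fn):                 # stand-in for flashinfer_api_log
    @functools.wraps(fn)
    def wrapper(x):
        print(f"call {fn.__name__}({x})")
        return fn(x)
    return wrapper

@require_positive                  # outer decorator runs first
@log_calls
def f(x):
    return x + 1

f(1)       # printed: the call passed the requirement check
try:
    f(-1)  # nothing printed: rejected before reaching the logging wrapper
except ValueError:
    pass
```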
2100-2189: `bmm_fp8` logging is consistent; same decorator-order caveat as `mm_fp4`

`bmm_fp8` is also defined with:

```python
@backend_requirement(...)
@flashinfer_api_log
def bmm_fp8(...): ...
```

So backends/requirements are checked before logging, and only successful calls will be logged. The attributes added by `backend_requirement` (e.g., `suitable_auto_backends`) still attach to the outer wrapper, so the internal use of `bmm_fp8.suitable_auto_backends` remains valid.

If you want logs for calls that are rejected due to unsupported compute capability or shapes, consider swapping decorator order as suggested for `mm_fp4`, and verify that `backend_requirement` remains compatible with being under the logging wrapper.
2501-2527: Potential double-logging when calling `gemm_fp8_nt_blockscaled`

`gemm_fp8_nt_blockscaled` is a thin wrapper around `gemm_fp8_nt_groupwise`, and both are decorated with `@flashinfer_api_log`. A call to `gemm_fp8_nt_blockscaled` will therefore emit two log entries: one for `gemm_fp8_nt_blockscaled` and one for the inner `gemm_fp8_nt_groupwise` call. This might be desirable (showing both the high-level alias and the underlying primitive), but could also add noise to logs.

If you want only a single log entry per API call here, you could either:

- Drop the decorator from `gemm_fp8_nt_blockscaled`, relying on the inner `gemm_fp8_nt_groupwise` logging, or
- Keep only the alias decorated and undecorate `gemm_fp8_nt_groupwise` if it's considered an internal helper.

Please choose based on how you expect users to call these APIs directly versus via wrappers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- `benchmarks/bench_logging_overhead.py` (1 hunks)
- `flashinfer/api_logging.py` (1 hunks)
- `flashinfer/cudnn/decode.py` (2 hunks)
- `flashinfer/cudnn/prefill.py` (2 hunks)
- `flashinfer/decode.py` (10 hunks)
- `flashinfer/fused_moe/core.py` (7 hunks)
- `flashinfer/gemm/gemm_base.py` (12 hunks)
- `flashinfer/mla.py` (4 hunks)
- `flashinfer/prefill.py` (11 hunks)
- `tests/utils/test_logging.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/mla.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

tests/utils/test_logging.py (1)

flashinfer/api_logging.py (3)
- `flashinfer_api_log` (369-462)
- `decorator` (418-457)
- `wrapper` (420-455)

benchmarks/bench_logging_overhead.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/fused_moe/core.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/decode.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/prefill.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/cudnn/decode.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/cudnn/prefill.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/gemm/gemm_base.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
346-346: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
161-161: Do not catch blind exception: Exception
(BLE001)
225-225: Use explicit conversion flag
Replace with conversion flag
(RUF010)
236-236: Use explicit conversion flag
Replace with conversion flag
(RUF010)
240-240: Use explicit conversion flag
Replace with conversion flag
(RUF010)
241-241: Do not catch blind exception: Exception
(BLE001)
284-284: Consider moving this statement to an else block
(TRY300)
285-285: Do not catch blind exception: Exception
(BLE001)
345-345: Unused function argument: func_name
(ARG001)
369-369: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
430-431: try-except-pass detected, consider logging the exception
(S110)
430-430: Do not catch blind exception: Exception
(BLE001)
441-441: Do not catch blind exception: Exception
(BLE001)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
452-452: Do not catch blind exception: Exception
(BLE001)
453-453: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (36)
flashinfer/cudnn/prefill.py (1)
6-6: Decorator wiring for cuDNN prefill entrypoint looks correct

Importing `flashinfer_api_log` and wrapping `cudnn_batch_prefill_with_kv_cache` is consistent with the rest of the API surface; it doesn't touch the cudnn-graph/jit internals and preserves function signature and control flow. Logging remains opt-in via env level.

Also applies to: 387-543
flashinfer/mla.py (1)
22-22: MLA wrapper instrumentation is aligned with the logging design

Decorating `BatchMLAPagedAttentionWrapper.__init__`, `plan`, and `run` with `flashinfer_api_log` cleanly instruments the main public lifecycle without altering behavior. The class name pattern ensures logs are emitted as `BatchMLAPagedAttentionWrapper.<method>`, which is useful for traceability, and zero-overhead semantics hold when logging is disabled at import time.

Also applies to: 133-203, 204-305, 337-451
flashinfer/cudnn/decode.py (1)
6-6: cuDNN decode entrypoint is correctly wrapped for logging

The `flashinfer_api_log` import and decorator on `cudnn_batch_decode_with_kv_cache` match the prefill pattern, leave the cuDNN graph construction untouched, and preserve the API contract. This is a safe, minimal instrumentation point.

Also applies to: 256-350
tests/utils/test_logging.py (1)
43-82: Comprehensive logging decorator tests look solid

The autouse fixture plus `setup_logging` pattern gives each test a clean environment and forces `flashinfer.api_logging` to re-read env vars, which matches how the decorator is intended to be configured. The suite covers level semantics, enums, defaults vs explicit kwargs, crash-safety, CUDA tensors (incl. graph capture), class methods, and multiple invocation patterns; all expectations are consistent with the current implementation in `flashinfer/api_logging.py`.

Also applies to: 83-585
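A sketch of that fixture pattern for reference (helper names are assumptions based on the description, not the exact test code):

```python
import importlib
import pytest

@pytest.fixture(autouse=True)
def clean_logging_env(monkeypatch):
    # Each test starts with no logging configuration in the environment.
    monkeypatch.delenv("FLASHINFER_APILOG_LEVEL", raising=False)
    monkeypatch.delenv("FLASHINFER_APILOG_DEST", raising=False)
    yield

def setup_logging(monkeypatch, level, dest):
    monkeypatch.setenv("FLASHINFER_APILOG_LEVEL", str(level))
    monkeypatch.setenv("FLASHINFER_APILOG_DEST", dest)
    import flashinfer.api_logging as api_logging
    # The module caches env vars at import time, so force a re-read.
    return importlib.reload(api_logging)
```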
flashinfer/api_logging.py (2)
70-163: Value formatting and CUDA-graph-aware tensor stats look well thought out

`_format_value`'s handling of enums, tensors (with shape/stride/device metadata), FP4Tensor, and nested containers is consistent and log-friendly. The level-3 path correctly avoids statistics during CUDA graph capture via `torch.cuda.is_current_stream_capturing` and gracefully degrades to a "[statistics error: …]" line on failures. This matches the intended "rich when needed, safe when it fails" design.
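In sketch form, the guard described above looks roughly like this (a simplified approximation, not the actual `_format_value` code):

```python
import torch

def _tensor_stats(t: torch.Tensor) -> str:
    # Reductions would be recorded into an in-progress CUDA graph capture,
    # so skip statistics entirely in that case.
    if t.is_cuda and torch.cuda.is_current_stream_capturing():
        return "[stats skipped: CUDA graph capture in progress]"
    try:
        f = t.detach().float()
        return (f"min={f.min().item():.4g}, max={f.max().item():.4g}, "
                f"mean={f.mean().item():.4g}")
    except Exception as e:  # never let stats break the logged call
        return f"[statistics error: {e}]"
```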
369-462: Decorator implementation aligns with zero-overhead and crash-safety requirements

`flashinfer_api_log` correctly returns the original function when `_API_LOG_LEVEL == 0` at import/decorator time, avoiding any wrapper call-overhead in the disabled case. For enabled levels, the wrapper's pre-call logging (with class-qualified names for `*Wrapper` classes) and post-call logging, both protected by broad `try`/`except` blocks, achieve the "crash-safe, best-effort logging" goal without risking user code execution. The decorator also supports both `@flashinfer_api_log` and `@flashinfer_api_log()` usage, which matches the tests.
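The two properties called out here, returning the original function at level 0 and accepting both bare and called decorator forms, reduce to a pattern like this (a simplified sketch, not the actual implementation):

```python
import functools

_API_LOG_LEVEL = 1  # read once from the environment at import time

def flashinfer_api_log(func=None):
    def decorate(f):
        if _API_LOG_LEVEL == 0:
            return f  # zero overhead: no wrapper at all when disabled
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                # Inputs are logged before the call, so they survive a crash.
                print(f"FlashInfer API call: {f.__name__}")
            except Exception:
                pass  # logging must never break the wrapped call
            return f(*args, **kwargs)
        return wrapper
    # Bare form (@flashinfer_api_log) passes the function directly; called
    # form (@flashinfer_api_log()) passes None and returns the decorator.
    return decorate(func) if func is not None else decorate
```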
flashinfer/prefill.py (11)

25-25: API logger import is correctly scoped and consistent with other modules

The relative import of `flashinfer_api_log` mirrors how other utilities are imported and avoids circular dependencies; no issues here.
877-910: Decorating `single_prefill_with_kv_cache_with_jit_module` is safe and aligns with the logging design

Applying `@flashinfer_api_log` here cleanly instruments this top-level helper: it only wraps the JIT module's `run` call and does not interfere with caching or custom-op registration elsewhere.

If you expect this function to be called inside `torch.compile` or CUDA graphs, please sanity-check that logging levels ≥2 are disabled in those contexts to avoid unnecessary extra kernels for statistics.
912-985: Logging `single_prefill_with_kv_cache` covers the primary single-request prefill API (including the alias)

Instrumenting `single_prefill_with_kv_cache` is appropriate: it is the main single-prefill entry point, and the alias `single_prefill_with_kv_cache_return_lse` (via `functools.partial`) will also be logged because it refers to the decorated function.

Please confirm that your log level defaults (e.g., env `FLASHINFER_APILOG_LEVEL`) are set so that heavy workloads don't accidentally incur level-3 tensor statistics on this hot path in production.
1331-1498: Wrapper `__init__` logging for `BatchPrefillWithPagedKVCacheWrapper` looks reasonable

Adding `@flashinfer_api_log` on the constructor is useful for debugging misconfigurations (workspace size, CUDA-graph buffers, backend selection) and doesn't affect runtime behavior, since allocations and checks were already here.

It may be worth running a small CUDA-graph flow that constructs this wrapper in graph-enabled mode to ensure that level-2/3 logging remains acceptable during initialization (even if execution itself happens inside graphs).
1527-1925: Logging `BatchPrefillWithPagedKVCacheWrapper.plan` is helpful; watch for overhead at high log levels

Decorating `plan` aligns with the goal of capturing problem setup (indptrs, masks, backends) and should not affect correctness, since the body is unchanged and the decorator is pure Python around it.

Because `plan` copies indptrs to CPU and computes derived arrays, logging at level-3 (with statistics) on very large batches can add noticeable overhead. Ensure that in tight autotuning or repeated planning loops you keep `FLASHINFER_APILOG_LEVEL` at 0–1 unless you explicitly need detailed introspection.
1984-2220: Instrumenting `BatchPrefillWithPagedKVCacheWrapper.run` is consistent and appears safe

Applying `@flashinfer_api_log` to `run` gives visibility into the main prefill execution path (including PDL, backend selection, and output tensor shapes) while leaving the core kernel launch logic unchanged. The alias `run_return_lse` also goes through the decorated method, so both result variants are logged.

For very latency-sensitive use (e.g., high-QPS decode), you may want to benchmark with `FLASHINFER_APILOG_LEVEL=1/2/3` to validate that the added logging—especially level-3 statistics—meets your overhead budget.
2359-2474: Constructor logging for `BatchPrefillWithRaggedKVCacheWrapper` matches the paged wrapper behavior

Decorating `__init__` here mirrors the paged wrapper: creation of workspace buffers, CUDA-graph buffers, and backend selection are now log-visible without touching the execution logic.

As with the paged wrapper, it's worth verifying that constructing this wrapper inside any higher-level tooling (e.g., model factories) remains acceptable when logging is enabled, since constructor logs can be noisy if wrappers are created per-request.
2503-2794: Logging `BatchPrefillWithRaggedKVCacheWrapper.plan` is appropriate and symmetric with paged prefill

The `@flashinfer_api_log` decorator on `plan` for ragged KV behaves analogously to the paged variant, exposing shapes, dtypes, and backend choices. The computational path (host transfers, planning, cached module setup) remains unchanged.

Same suggestion as for the paged `plan`: if you run many small `plan` calls (e.g., in tuning or dynamic workloads), keep logging levels conservative to avoid overhead from repeated detailed summaries.
2848-2995: `BatchPrefillWithRaggedKVCacheWrapper.run` logging cleanly wraps the main ragged prefill execution

Instrumenting `run` makes sense: it's the hot execution entry point and logs will now include mask mode, backend, and output shapes, which are valuable for debugging MoE / ragged scenarios. No functional changes are introduced.

Consider adding this method to any logging tests you already have (similar to the decode wrappers) to confirm that level-2/3 input summaries behave as expected with ragged indptr layouts.
3205-3337: `trtllm_ragged_attention_deepseek` is a good candidate for API logging

The decorator around this Triton/TRT-LLM-style ragged attention wrapper will expose sequence length, scaling, and PDL settings without affecting the underlying kernel invocation. Given this is a specialized path, logging is particularly useful here.
Because this function is often used in brittle integration scenarios, you may want to add a small smoke test that calls it with logging level 2–3 to ensure tensor summaries do not assume contiguous layouts beyond what the implementation already guarantees.
3340-3553: Logging `trtllm_batch_context_with_kv_cache` is consistent with other TRTLLM interfaces

Adding `@flashinfer_api_log` here gives observability into FP4/FP8 output configuration, kv layout, and PDL usage. Since the decorator is a thin wrapper and all runtime behavior (including FP4 tensor handling) is unchanged, this looks safe.

One thing to double-check is that when `out` is an `FP4Tensor`, the logging layer's input summarization doesn't accidentally try to introspect internal fields in a way that could be overly verbose; if so, you might consider teaching the logger a lightweight summary for `FP4Tensor`.

flashinfer/gemm/gemm_base.py (9)
25-25: Importing `flashinfer_api_log` here is consistent with other modules

The relative import from `..api_logging` is correct for this package layout and keeps GEMM logging centralized alongside other instrumented APIs.
914-1084: SegmentGEMMWrapper.run instrumentation looks correct and non-intrusive

Decorating `SegmentGEMMWrapper.run` adds valuable logging around a complex segmented GEMM path (seg_lens/seg_indptr, backend choice, shapes) without altering the kernel launch logic. The `forward = run` alias will also be logged since it resolves to the decorated method.

Given this is a potentially hot path in MoE workloads, you may want to benchmark with logging enabled at levels 2–3 to measure overhead from argument summarization on large `x`/`weights`.
1573-1682: `mm_fp8` logging is well-placed on this primary FP8 GEMM entry point

The decorator wraps only the public Python API; core work is still delegated to `trtllm_low_latency_gemm`. No interactions with autotune or backend selection are altered.

Since `mm_fp8` can be called in tight loops, please confirm via the new `bench_logging_overhead.py` benchmark that your target usage still meets latency goals at different logging levels.
2192-2356: Logging `gemm_fp8_nt_groupwise` is appropriate; be mindful of level-3 cost on large tiles

This function is a central entry point for FP8 groupwise GEMM on Blackwell; adding `@flashinfer_api_log` is aligned with the overall API observability goal. The underlying CUTLASS/TRTLLM calls and device checks are unchanged.

Because this path typically operates on large (M,N,K) blocks and may be used in performance-critical contexts, it's worth evaluating level-3 logging separately—tensor statistics over large matrices can add non-trivial overhead even if the core compute kernels are unchanged.
2531-2690: `group_gemm_fp8_nt_groupwise` logging is consistent with other group GEMM entry points

Applying `@flashinfer_api_log` here exposes group sizes, scale layouts, and hardware constraints for grouped FP8 GEMM. The function's internal device checks and workspace handling are unchanged.

Since this path asserts several alignment constraints (e.g., `n % 8 == 0`, `k % 16 == 0`), logging invalid calls at level ≥1 should be useful in diagnosing misconfigurations—ensure your logging level in debug runs is at least 1 to capture these.
2693-2825: Logging `group_gemm_mxfp8_mxfp4_nt_groupwise` is appropriate for this specialized MXFP4 path

The decorator wraps a fairly specialized Blackwell-only path that mixes FP8 activations with MXFP4 weights. Logging call shapes, tile sizes, and `swap_ab` will be valuable in debugging, and the underlying SM100 module calls are preserved.

Given the stricter assertions (tile sizes, alignment, dtype constraints), consider adding a small unit test that calls this function with logging enabled to ensure the logger handles packed MXFP4 tensors (`uint8`) and scale tensors without excessive verbosity.
2861-2989: `group_deepgemm_fp8_nt_groupwise` logging makes DeepGEMM MoE flows more observable

Instrumenting this DeepGEMM-based grouped FP8 GEMM call will surface group assignments (`m_indices`), scale granularities, and output sizes without altering the call to `m_grouped_fp8_gemm_nt_contiguous`. The architecture check remains in place.

Because this function is likely used in MoE expert-routing paths, you may want to validate that logging large `m_indices` tensors at level ≥2 is summarized compactly enough (e.g., only shape/type) to keep logs readable.
2992-3125: `batch_deepgemm_fp8_nt_groupwise` logging mirrors the grouped DeepGEMM path

The `@flashinfer_api_log` decorator here complements `group_deepgemm_fp8_nt_groupwise`, covering the batched FP8 DeepGEMM case. The masking logic (`masked_m`, `expected_m`) and backend call remain unchanged.

As with the grouped variant, confirm that logging of `masked_m` and `expected_m` behaves as expected in your tests—these values are key for understanding performance characteristics and correctness in partially-filled batches.
568-641: Remove SM110 from the docstring—the underlying C++ implementation only supports SM100 and SM103

The runtime check is correct. Analysis of the C++ source reveals:

- `csrc/tgv_gemm.cu` includes `cutlass/gemm/collective/builders/sm100_common.inl` and uses SM100-specific `tcgen05.mma` instructions
- `gen_tgv_gemm_sm10x_module` is documented as "Generate TGV GEMM module for SM100 architecture"
- No SM110 support path exists in the TGV GEMM implementation
- The test file correctly restricts to `["100", "103"]`

The docstring at lines 573–580 incorrectly claims SM110 support. It should be corrected to:

    Requires SM100, SM103 architecture

Adding SM110 to the runtime check as suggested would enable an unsupported architecture.
Likely an incorrect or invalid review comment.
flashinfer/decode.py (10)
24-24: LGTM - API logging import added.

The import is correctly placed and follows the project's import organization.

316-346: API logging decorator correctly applied.

The decorator is properly applied to this public API function. As a POC, this demonstrates the instrumentation approach. Before production use, verify that the logging overhead at levels 1-3 is acceptable for your performance requirements.

393-410: Correct decorator placement on overloaded function.

The decorator is correctly applied to the implementation function rather than the overload signatures. This is the proper pattern for decorated overloaded functions.

652-664: API logging decorator correctly applied to constructor.

The decorator is properly applied to the class constructor. The decorator implementation correctly detects and includes the class name in logs.

816-840: API logging decorator correctly applied to plan method.

The decorator is properly applied. Note that at log level 2+, this method will generate verbose logs due to the many parameters. This is expected behavior for API logging and helps with debugging.

1170-1186: Correct decorator placement on overloaded method.

The decorator is correctly applied to the implementation method rather than the overload signatures. The decorator will properly capture the `*args` parameter for logging.

2068-2089: API logging decorator correctly applied.

The decorator is properly applied to this public API function. The extensive parameters and complex logic make this a good candidate for API logging to aid debugging.

2347-2364: API logging decorator correctly applied.

The decorator is properly applied to this public API function.

2539-2556: API logging decorator correctly applied.

The decorator is properly applied to this MLA-specific decode function.

2707-2724: API logging decorator correctly applied.

The decorator is properly applied to this function. Overall, the API logging instrumentation in this file is consistent and well-implemented.
POC Assessment: The decorator applications throughout this file demonstrate a clean, non-invasive approach to API logging. The implementation maintains zero overhead when disabled and preserves function signatures and behavior. Before moving beyond POC, consider:

- Verify benchmark results show acceptable overhead at log levels 1-3
- Consider adding configuration to selectively enable/disable logging for specific functions
- Document the logging format and levels for end users

Based on the PR description, benchmarks exist at `benchmarks/bench_logging_overhead.py`. Please verify the benchmark results demonstrate acceptable performance characteristics at different log levels, particularly for hot-path functions like `BatchDecodeWithPagedKVCacheWrapper.run`.
Actionable comments posted: 1
♻️ Duplicate comments (3)
flashinfer/api_logging.py (3)
338-362: Narrow `_get_default_params` exception handling to the documented cases

`inspect.signature` is documented to raise `TypeError`/`ValueError` on failure; catching every `Exception` here is broader than necessary and makes debugging real failures harder. The earlier review comment already pointed this out.

You can keep the fail-safe behavior but tighten the handler:

```diff
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+    except (TypeError, ValueError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```

This still preserves the "logging must not crash user code" guarantee.
493-528: Simplify class-name check and improve logging of internal logging errors

Two small points around the decorator:

1. The condition `if "Wrapper" in class_name or class_name in ["BatchMLAPagedAttentionWrapper"]:` is redundant because `"BatchMLAPagedAttentionWrapper"` already matches the `"Wrapper"` substring. This was noted in a previous review.
2. In the pre/post logging `try` blocks you catch `Exception` and log via `_logger.error(...)`, which drops the traceback. Given this is a debugging/logger subsystem, having the traceback via `logging.exception` is usually more helpful and you're already isolating it from user code.

Possible refactor:

```diff
-                class_name = args[0].__class__.__name__
-                if "Wrapper" in class_name or class_name in [
-                    "BatchMLAPagedAttentionWrapper"
-                ]:
-                    func_name = f"{class_name}.{func_name}"
+                class_name = args[0].__class__.__name__
+                if "Wrapper" in class_name:
+                    func_name = f"{class_name}.{func_name}"
             except Exception:
                 pass
@@
             try:
                 if _API_LOG_LEVEL == 1:
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
             try:
                 if _API_LOG_LEVEL >= 2:
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```

This keeps user-visible behavior the same while making logger failures easier to diagnose.
29-31: Make `FLASHINFER_APILOG_LEVEL` parsing robust to invalid env values

Right now `int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))` will raise `ValueError` (and break import) if the env var is set to a non-integer string. It's safer to treat invalid values as "0" (disabled) so logging config can never prevent FlashInfer from importing.

You can harden this by catching parse errors like this:

```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (TypeError, ValueError):
+    # Invalid user input; fall back to level 0 (disabled)
+    _API_LOG_LEVEL = 0
```

Please rerun the existing logging tests (and any import-time tests) after this change.
🧹 Nitpick comments (2)
flashinfer/api_logging.py (2)
145-304: Optional: guard against recursive structures in `_format_value`

`_format_value` walks lists/tuples/dicts recursively and will recurse indefinitely on cyclic structures (e.g., a list containing itself), which can appear in complex call graphs. In a debugging logger this is usually rare but when it happens it'll raise `RecursionError` from the logging path.

If you want to harden this, consider adding a `seen: set[int]` parameter (or a max-depth cut-off) and short-circuit when `id(value)` is already in `seen`. No need to change now if your call sites never pass cyclic objects, but it's something to keep in mind.
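A sketch of the `seen`-set idea (illustrative only, not the actual `_format_value` signature):

```python
from typing import Any, Optional, Set

def format_value(value: Any, seen: Optional[Set[int]] = None) -> str:
    seen = set() if seen is None else seen
    if isinstance(value, (list, tuple, dict)):
        if id(value) in seen:
            return "[cyclic reference]"  # short-circuit instead of recursing forever
        seen.add(id(value))
        items = value.values() if isinstance(value, dict) else value
        return "[" + ", ".join(format_value(v, seen) for v in items) + "]"
    return repr(value)

x = [1, 2]
x.append(x)             # cyclic list
print(format_value(x))  # [1, 2, [cyclic reference]]
```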
420-441: Use `func_name` in `_log_function_outputs` (or drop it) to fix the unused arg and improve logs

`func_name` is currently unused in `_log_function_outputs`, which triggers the linter and also makes the output block a bit less self-describing when skimming logs.

A small tweak makes the output clearer and fixes `ARG001`:

```diff
-    # Log outputs
-    lines.append("Output value:")
+    # Log outputs
+    lines.append(f"Output value for {func_name}:")
```

This keeps the existing separators and should be low-risk, but please re-run the logging tests because they may assert on exact strings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `flashinfer/api_logging.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
- `logging` (31-41)
🪛 Ruff (0.14.5)
flashinfer/api_logging.py
86-86: Do not catch blind exception: Exception
(BLE001)
106-106: Do not catch blind exception: Exception
(BLE001)
124-124: Do not catch blind exception: Exception
(BLE001)
132-132: Do not catch blind exception: Exception
(BLE001)
236-236: Do not catch blind exception: Exception
(BLE001)
300-300: Use explicit conversion flag
Replace with conversion flag
(RUF010)
311-311: Use explicit conversion flag
Replace with conversion flag
(RUF010)
315-315: Use explicit conversion flag
Replace with conversion flag
(RUF010)
316-316: Do not catch blind exception: Exception
(BLE001)
359-359: Consider moving this statement to an else block
(TRY300)
360-360: Do not catch blind exception: Exception
(BLE001)
420-420: Unused function argument: func_name
(ARG001)
444-444: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
505-506: try-except-pass detected, consider logging the exception
(S110)
505-505: Do not catch blind exception: Exception
(BLE001)
516-516: Do not catch blind exception: Exception
(BLE001)
517-517: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
527-527: Do not catch blind exception: Exception
(BLE001)
528-528: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
```python
def _setup_logger():
    """Set up the logger based on environment variables."""
    if _API_LOG_LEVEL == 0:
        # Completely disable logging for zero overhead
        _logger.addHandler(logging.NullHandler())
        _logger.setLevel(logging.CRITICAL + 1)  # Higher than any level
        return

    # All enabled levels use logging.DEBUG; verbosity is controlled by
    # FLASHINFER_APILOG_LEVEL instead
    _logger.setLevel(logging.DEBUG)

    # Remove any existing handlers
    _logger.handlers.clear()

    # Create handler based on destination
    if _API_LOG_DEST == "stdout":
        handler = logging.StreamHandler(sys.stdout)
    elif _API_LOG_DEST == "stderr":
        handler = logging.StreamHandler(sys.stderr)
    else:
        handler = logging.FileHandler(_API_LOG_DEST, mode="a")

    # Use a simple formatter (we'll format the detailed content ourselves)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)

    _logger.addHandler(handler)
    _logger.propagate = False  # Don't propagate to root logger
```
🧩 Analysis chain
Avoid import-time failures when FLASHINFER_APILOG_DEST is invalid

`logging.FileHandler(_API_LOG_DEST, mode="a")` will raise (e.g., bad path, permission) at import time and prevent FlashInfer from loading when logging is enabled. For a "debugging only" facility this is a bit too fragile; better to degrade to stderr (or a `NullHandler`) instead of failing imports.

A minimal way to harden this while keeping semantics:
```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

Afterward, it'd be good to verify behavior by setting FLASHINFER_APILOG_DEST to an unwritable path and importing flashinfer.api_logging in a small repro.
🏁 Script executed:
```bash
#!/bin/bash
# Find and examine the full flashinfer/api_logging.py file
fd -t f "api_logging.py" --exec cat {} \;
```

Length of output: 19059
🏁 Script executed:
```bash
#!/bin/bash
# Check when _setup_logger() is called (module level vs function level)
rg -n "_setup_logger" -A2 -B2
```

Length of output: 462
🏁 Script executed:
```bash
#!/bin/bash
# Search for any existing error handling around logging setup
rg -n "except|try|OSError" flashinfer/api_logging.py
```

Length of output: 594
Add error handling to prevent import-time failures from invalid logging paths
The current code will break module imports if `FLASHINFER_APILOG_DEST` points to an invalid or unwritable path. Since `_setup_logger()` is called at module load time (line 67), any `OSError` from `logging.FileHandler()` will prevent FlashInfer from loading entirely. For a debugging facility, this is too fragile.
Add error handling to gracefully degrade to stderr:
```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

🤖 Prompt for AI Agents
```text
In flashinfer/api_logging.py around lines 36 to 64, creating a FileHandler from
an invalid or unwritable FLASHINFER_APILOG_DEST can raise OSError at import time
and crash the module; wrap the handler creation in a try/except that catches
OSError (and its subclasses), and on exception fall back to a
logging.StreamHandler(sys.stderr) so the module still loads; also emit a
non-raising warning message via the selected handler or a temporary NullHandler
so the failure is recorded without propagating the exception, and ensure the
rest of the logger setup (formatter, addHandler, propagate) runs regardless of
whether the file handler succeeded.
```
yzh119
left a comment
This is a great feature to have, and thanks @bkryu for bringing this up! One general question: do you think we should log kernel behavior on the Python side or the C++ side?
Thanks @yzh119. This is a question I considered, more from a "should we log what happens inside each function call, because the current design only logs inputs and outputs" point of view. My initial sentiment is that logging kernel behavior will be tricky because we have multiple kernel providers -- some of which are closed source and others are OSS, and some that already come with logging:
@xslingcn @Ubospica @YiyanZhai @zanderjiang this PR should greatly reduce the overhead of extracting workloads for the flashinfer-bench project.
@yzh119 Thanks for sharing! The low-overhead design looks very useful. We definitely have plans to support low-overhead dump and CUDA graph support on our roadmap. I will leverage this PR as a key reference.
Actionable comments posted: 1
♻️ Duplicate comments (2)
flashinfer/api_logging.py (2)
43-47: Make `FLASHINFER_LOGLEVEL_DBG` parsing robust to invalid values

As written, `int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))` will raise `ValueError` at import time if a user sets the env var to a non-integer (e.g. `"foo"`), preventing `flashinfer.api_logging` (and any API importing it) from loading. For a debug-only facility this is too fragile; it should gracefully fall back to level 0.

You can harden this by catching parse errors and defaulting to 0 (optionally emitting a warning to stderr):

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
-_API_LOG_DEST = _substitute_process_id(
-    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
-)
+# Read environment variables once at module load time
+_raw_log_level = os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_log_level)
+except (TypeError, ValueError):
+    # Fall back to level 0 on invalid input instead of failing import
+    _API_LOG_LEVEL = 0
+    print(
+        f"[flashinfer.api_logging] Invalid FLASHINFER_LOGLEVEL_DBG={_raw_log_level!r}, "
+        "falling back to 0",
+        file=sys.stderr,
+    )
+_API_LOG_DEST = _substitute_process_id(
+    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
+)
```

This keeps existing semantics for valid values while avoiding import-time crashes on bad configuration.
85-112: Prevent import-time failures when `FLASHINFER_LOGDEST_DBG` points to an invalid path

`_setup_logger()` calls `logging.FileHandler(_API_LOG_DEST, mode="a")` unguarded whenever `_API_LOG_DEST` is neither `"stdout"` nor `"stderr"`. If the env var points to an invalid or unwritable location, this raises `OSError` at import time and breaks all FlashInfer imports when logging is enabled.

For a debugging feature, it's safer to degrade to stderr instead of failing import. For example:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves current behavior for valid paths but avoids hard failures when the destination is misconfigured.
🧹 Nitpick comments (3)
tests/utils/test_logging.py (1)
227-228: Address Ruff `ARG001` unused-argument warnings in test helpers (optional)

Several test helper functions intentionally ignore some parameters (`mode` in `test_default_parameters`, `x`/`y` in `crashing_function`, and the various `*_val` parameters in `test_different_data_types`), which Ruff flags as `ARG001`. If your CI runs Ruff on tests, you can silence these by either:

- Prefixing unused parameters with `_` (e.g., `_mode`, `_int_val`, …), or
- Adding `# noqa: ARG001` on the relevant function definitions.

This keeps the tests behaviorally identical while satisfying the linter.
Also applies to: 322-323, 355-365
flashinfer/api_logging.py (2)
498-519: `_log_function_outputs`'s `func_name` is unused — consider wiring it into the log line

`_log_function_outputs` accepts `func_name` but never uses it, which is slightly confusing and flagged by linters as `ARG001`. You can either drop the parameter, or (more usefully) include it in the header line:

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    # Log outputs
+    lines.append(f"Output value: {func_name}")
```

This keeps the existing `"Output value:"` prefix (so current tests continue to pass) while making the per-call output section self-describing.
596-607: Improve diagnostics for logging failures with `logging.exception`

The decorator intentionally guards pre- and post-execution logging with broad `except Exception` blocks to avoid interfering with the wrapped function. That makes sense, but using `_logger.error(...)` drops the traceback, which can make debugging logging issues tricky.

You can keep the broad guard while emitting richer diagnostics via `logging.exception`:

```diff
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```

This preserves crash-safety while giving you a stack trace when something goes wrong inside the logging path.
Also applies to: 613-618
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `benchmarks/bench_logging_overhead.py` (1 hunks)
- `flashinfer/api_logging.py` (1 hunks)
- `tests/utils/test_logging.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
- `flashinfer_api_log` (522-627)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
- `flashinfer_api_log` (522-627)
- `decorator` (581-622)
- `wrapper` (583-620)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
- `logging` (31-41)
🪛 Ruff (0.14.5)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
346-346: Do not catch blind exception: Exception
(BLE001)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
flashinfer/api_logging.py
142-142: Do not catch blind exception: Exception
(BLE001)
162-162: Do not catch blind exception: Exception
(BLE001)
180-180: Do not catch blind exception: Exception
(BLE001)
210-210: Do not catch blind exception: Exception
(BLE001)
314-314: Do not catch blind exception: Exception
(BLE001)
378-378: Use explicit conversion flag
Replace with conversion flag
(RUF010)
389-389: Use explicit conversion flag
Replace with conversion flag
(RUF010)
393-393: Use explicit conversion flag
Replace with conversion flag
(RUF010)
394-394: Do not catch blind exception: Exception
(BLE001)
437-437: Consider moving this statement to an else block
(TRY300)
438-438: Do not catch blind exception: Exception
(BLE001)
498-498: Unused function argument: func_name
(ARG001)
522-522: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
593-594: try-except-pass detected, consider logging the exception
(S110)
593-593: Do not catch blind exception: Exception
(BLE001)
606-606: Do not catch blind exception: Exception
(BLE001)
607-607: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
617-617: Do not catch blind exception: Exception
(BLE001)
618-618: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (2)
tests/utils/test_logging.py (1)
27-585: Comprehensive coverage of decorator behavior

This test suite does a good job exercising the decorator across levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, tensor metadata, nested structures, and CUDA graph scenarios; I don't see functional issues here.
benchmarks/bench_logging_overhead.py (1)
56-219: Benchmark structure and CUDA timing look solid

The undecorated vs decorated matmul comparison, CUDA synchronization, warmup phase, and summary statistics all look correct for measuring logging overhead.
Also applies to: 221-339
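The timing structure being praised here follows the standard warmup-then-synchronized-measurement pattern; a condensed sketch (not the benchmark's actual code):

```python
import torch

def time_op_ms(fn, *args, iters=100, warmup=10):
    for _ in range(warmup):       # warm up kernels/caches before timing
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()      # wait for all timed work to finish
    return start.elapsed_time(end) / iters  # average milliseconds per call
```

Comparing this number for the decorated and undecorated variant of the same matmul isolates the logging overhead.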
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/gemm/gemm_base.py (1)
568-640: Fix SM110 support check and consider basic shape validation in `tgv_gemm_sm100`

Two minor issues here:

- The docstring says "Requires SM100, SM103, or SM110 architecture" (Lines 593–595), but the runtime check only allows `"100"` and `"103"`:

```python
if not _match_sm_version(a.device, ["100", "103"]):
    raise ValueError("TGV GEMM requires SM100, SM103 architecture")
```

If SM110 is actually supported by the TGV kernel (as the name `gen_tgv_gemm_sm10x_module` and note about SM100f suggest), this should include `"110"`; otherwise the note in the docstring should be adjusted to avoid misleading users.

- There's no explicit validation of tensor ranks/shapes or bias shape. Given the comment "Computes: A @ B + bias" and the expectation `(M, K) @ (K, N) + (N,)`, adding a fast check for `a.ndim == 2`, `b.ndim == 2`, `bias.ndim == 1`, `a.shape[1] == b.shape[0]`, and `bias.shape[0] == b.shape[1]` would fail early with clear error messages instead of deferring to the underlying kernel (see the sketch after this list).

Both changes would make this wrapper more robust without affecting the autotuning flow.
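The suggested fast-fail checks, written out (a sketch; the real function signature and helper name are assumptions):

```python
import torch

def _validate_tgv_gemm_inputs(a: torch.Tensor, b: torch.Tensor, bias: torch.Tensor) -> None:
    # Computes A @ B + bias, expecting (M, K) @ (K, N) + (N,).
    if a.ndim != 2 or b.ndim != 2 or bias.ndim != 1:
        raise ValueError("expected a: (M, K), b: (K, N), bias: (N,)")
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"inner dimensions mismatch: {a.shape[1]} vs {b.shape[0]}")
    if bias.shape[0] != b.shape[1]:
        raise ValueError(f"bias length {bias.shape[0]} != N = {b.shape[1]}")
```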
♻️ Duplicate comments (2)
flashinfer/api_logging.py (1)
28-45: Harden env parsing and handler setup to avoid import-time failures

Right now, two failure modes in this module can prevent `flashinfer` from importing at all:

- Line 42: `int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))` raises `ValueError`/`TypeError` if the env var is non-numeric (e.g. `"foo"`), aborting import.
- Lines 65–71: `logging.FileHandler(_API_LOG_DEST, mode="a")` will raise `OSError` on invalid/unwritable paths, also at import time.

For a debug-only feature, this is too fragile; misconfigured env vars should disable logging or fall back safely, not make the library unusable.

A minimal hardening would be:

```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+except (TypeError, ValueError):
+    # Fall back to disabled logging on invalid level
+    _API_LOG_LEVEL = 0
```

and in `_setup_logger`:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This keeps existing semantics in the happy path but makes misconfiguration non-fatal.
Also applies to: 51-82
benchmarks/bench_logging_overhead.py (1)
30-36: Align benchmark's `LOG_DEST` with the decorator's actual destination

Here `LOG_DEST` defaults to `/tmp/flashinfer_benchmark_log.txt` when `FLASHINFER_LOGDEST_DBG` is unset, but `flashinfer.api_logging` still defaults its destination to `"stdout"`. That means:

- The decorator logs to stdout.
- The benchmark later looks for a file at `LOG_DEST` (Lines 248–251, 292–301) and usually finds nothing, so the "LOG FILE INFO" section never prints.

To keep the benchmark's view of the log destination consistent with the decorator, set the env var before importing `flashinfer_log`:

```diff
 LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
 LOG_DEST = os.environ.get(
     "FLASHINFER_LOGDEST_DBG", "/tmp/flashinfer_benchmark_log.txt"
 )
+# Ensure the decorator sees the same destination default
+os.environ.setdefault("FLASHINFER_LOGDEST_DBG", LOG_DEST)

 # Import the decorator
 from flashinfer.api_logging import flashinfer_log
```

This preserves the current default path but guarantees both components use the same destination.
Also applies to: 248-251, 292-301
🧹 Nitpick comments (2)
flashinfer/cudnn/prefill.py (1)
6-6: Decorator integration and new backend flags look safe; consider documenting `backend`.

- Importing and applying `@flashinfer_log` to `cudnn_batch_prefill_with_kv_cache` is semantically safe because the decorator is a no-op at log level 0 and preserves the function signature.
- Adding `is_cuda_graph_compatible` and `backend` is backward-compatible; they are defaulted and only change behavior when explicitly used (e.g., `backend == "cubin"` forces the fmha-gen path).
- Minor: the docstring mentions `is_cuda_graph_compatible` but not `backend`. It may be worth adding a short description for `backend` for consistency and to clarify the `"cubin"` override behavior.

Also applies to: 387-411
tests/utils/test_logging.py (1)
43-585: Logging tests are comprehensive; Ruff ARG001 hints are low-priority in this context

This suite does a good job validating `flashinfer_log` behavior (levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, CUDA graphs, nested structures, etc.) and the fixture correctly resets env state and reloads the module between tests. The Ruff `ARG001` warnings about unused parameters in test functions (e.g., `mode`, `int_val`, etc.) are expected for this style of tests; if you care about a clean lint run you can prefix those parameters with `_` or reference them trivially, but functionally the current code is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- flashinfer/cudnn/decode.py (2 hunks)
- flashinfer/cudnn/prefill.py (2 hunks)
- flashinfer/decode.py (10 hunks)
- flashinfer/fused_moe/core.py (7 hunks)
- flashinfer/gemm/gemm_base.py (12 hunks)
- flashinfer/mla.py (4 hunks)
- flashinfer/prefill.py (11 hunks)
- tests/utils/test_logging.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flashinfer/mla.py
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_log (466-564), decorator (519-560), wrapper (521-558)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/fused_moe/core.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
flashinfer/api_logging.py
108-108: Do not catch blind exception: Exception
(BLE001)
128-128: Do not catch blind exception: Exception
(BLE001)
146-146: Do not catch blind exception: Exception
(BLE001)
154-154: Do not catch blind exception: Exception
(BLE001)
258-258: Do not catch blind exception: Exception
(BLE001)
322-322: Use explicit conversion flag
Replace with conversion flag
(RUF010)
333-333: Use explicit conversion flag
Replace with conversion flag
(RUF010)
337-337: Use explicit conversion flag
Replace with conversion flag
(RUF010)
338-338: Do not catch blind exception: Exception
(BLE001)
381-381: Consider moving this statement to an else block
(TRY300)
382-382: Do not catch blind exception: Exception
(BLE001)
442-442: Unused function argument: func_name
(ARG001)
466-466: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
531-532: try-except-pass detected, consider logging the exception
(S110)
531-531: Do not catch blind exception: Exception
(BLE001)
544-544: Do not catch blind exception: Exception
(BLE001)
545-545: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
555-555: Do not catch blind exception: Exception
(BLE001)
556-556: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (5)
flashinfer/decode.py (1)
24-24: API logging decoration across decode entry points looks correct and non-intrusive.
- Importing `flashinfer_log` and applying it to all user-facing decode APIs (single, batch, wrappers, and TRT-LLM/XQA helpers) preserves signatures and overload contracts while adding centralized logging.
- The decorator's no-op behavior at log level 0 avoids extra overhead in production, and `functools.wraps` keeps introspection and `partialmethod`/aliasing (`begin_forward`, `run_return_lse`) working as before.
- No behavioral changes to argument defaults, CUDA-graph handling, or backend routing are introduced by these annotations.
Also applies to: 316-345, 393-410, 652-664, 816-840, 1170-1186, 2068-2089, 2342-2359, 2527-2544, 2689-2705
flashinfer/prefill.py (1)
25-25: Prefill API logging wiring is consistent and preserves existing behavior.
- Importing and applying `@flashinfer_log` to single-prefill helpers, both prefill wrappers (paged and ragged), and the TRT-LLM prefill/context functions cleanly instruments all public prefill entry points without altering their logic.
- Overloads remain untouched; only concrete implementations are wrapped, and `begin_forward`/`run_return_lse` aliases continue to work through the decorated methods.
- No changes to backend selection, CUDA-graph handling, or tensor shape/device contracts are introduced in these hunks.
Also applies to: 877-888, 962-985, 1331-1346, 1527-1563, 1984-2000, 2359-2372, 2503-2531, 2848-2859, 3205-3227, 3340-3362
flashinfer/cudnn/decode.py (1)
6-6: cudnn decode logging and CUDA‑graph flag propagation look correct.
- Importing `flashinfer_log` and decorating `cudnn_batch_decode_with_kv_cache` integrates this entry point into the new logging system without touching its core logic.
- The `is_cuda_graph_compatible` argument is defaulted and only matters in the fmha-gen fallback path; CUDNN proper continues to behave as before.
- Signature and docstring remain aligned, so existing callers are unaffected while gaining optional logging and an explicit CUDA-graph compatibility hint.
Also applies to: 256-273
flashinfer/fused_moe/core.py (1)
23-23: API-level logging instrumentation for MoE wrappers looks correct

Importing `flashinfer_log` and decorating the public MoE entrypoints here cleanly extends logging without touching the underlying custom ops. Because `flashinfer_log` returns the original function when `FLASHINFER_LOGLEVEL_DBG=0`, these changes are behavior-preserving in the default/off configuration, and the wrappers' signatures remain unchanged. No issues from the decorator stacking are apparent.

Also applies to: 688-719, 1862-1881, 1943-1965, 2017-2040, 2095-2127, 2225-2257
flashinfer/gemm/gemm_base.py (1)
914-1083: Decorator-based logging on GEMM and segment-GEMM entrypoints looks consistent

Importing `flashinfer_log` here and applying it to `SegmentGEMMWrapper.run`, `tgv_gemm_sm100`, and the various FP8/FP4 GEMM entrypoints (`mm_fp8`, `mm_fp4`, `bmm_fp8`, `gemm_fp8_nt_groupwise`, blockscaled/group variants, DeepGEMM group/batch wrappers) is consistent with the rest of the PR:

- Signatures and return types are unchanged.
- At `FLASHINFER_LOGLEVEL_DBG=0` the decorator returns the original function, so hot paths (GEMM kernels) pay no additional cost when logging is disabled.
- Where combined with `@backend_requirement`, the backend checks still wrap the logged function cleanly.

From a call-site perspective, this is a non-breaking way to surface API-level logs for GEMM operations.
Also applies to: 1573-1682, 1851-2002, 2101-2190, 2193-2356, 2502-2527, 2531-2690, 2694-2825, 2862-2989, 2993-3125
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- LOGGING.md (1 hunks)
- README.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- README.md
🧰 Additional context used
🪛 GitHub Actions: pre-commit
LOGGING.md
[error] 1-1: pre-commit end-of-file-fixer hook failed. The hook modified LOGGING.md (added a newline at EOF).
🪛 LanguageTool
LOGGING.md
[grammar] ~24-~24: Ensure spelling is correct
Context: ...| Disabled (Default) | No logging (zero overhad) | Production | | 1 | Function Name...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (1)
LOGGING.md (1)
5-82: Well-structured documentation with clear guidance.

The logging documentation is clearly organized with practical examples, environment variable reference, and real-world use cases (CUDA graph compatibility, multi-GPU process ID substitution). The Quick Start and logging levels table make it easy for users to get started.
LOGGING.md
Outdated
```
@@ -0,0 +1,83 @@
# FlashInfer Logging
```
do we wanna organize this file in docs/ etc?
cc @yzh119
Or it might be even better to convert this to an .rst so that it appears in the documentation.
@yzh119 do you have suggestions on how I might convert and where I can place the file?
I think placing it under docs/logging.rst and indexing it in docs/index.rst should be great.
> how I might convert
I have tried pandoc before but I believe any existing llm (gemini/claude/gpt) could do a better job.
@yzh119, makes sense. I converted in the latest commit and built the sphinx documentation locally
Here is how the index page looks; see the red marks for where I placed it:
[screenshot of the rendered docs index page]
…ing level above 3 now automatically include cudnn and cublas API logging
Actionable comments posted: 1
♻️ Duplicate comments (3)
benchmarks/bench_logging_overhead.py (1)
30-36: Ensure benchmark and decorator share the same log destination

When `FLASHINFER_APIDEST` is unset, `LOG_DEST` defaults to `/tmp/flashinfer_benchmark_log.txt`, but `flashinfer.api_logging` still defaults its destination to `"stdout"`. In that case the decorator logs to stdout while the benchmark expects a file at `LOG_DEST`, so the "LOG FILE INFO" block is skipped because the file is never created.

You can keep the current default path and guarantee consistency by seeding the env var before importing the decorator:

```diff
-# Get logging level BEFORE importing flashinfer
-LOGGING_LEVEL = int(os.environ.get("FLASHINFER_APILEVEL", "0"))
-LOG_DEST = os.environ.get("FLASHINFER_APIDEST", "/tmp/flashinfer_benchmark_log.txt")
-
-# Import the decorator
-from flashinfer.api_logging import flashinfer_api
+# Get logging level BEFORE importing flashinfer
+LOGGING_LEVEL = int(os.environ.get("FLASHINFER_APILEVEL", "0"))
+LOG_DEST = os.environ.get("FLASHINFER_APIDEST", "/tmp/flashinfer_benchmark_log.txt")
+
+# Ensure the decorator sees the same destination
+os.environ.setdefault("FLASHINFER_APIDEST", LOG_DEST)
+
+# Import the decorator
+from flashinfer.api_logging import flashinfer_api
```

This keeps benchmark reporting aligned with where logs actually go.
Also applies to: 248-251, 293-301
flashinfer/api_logging.py (2)
41-44: Guard `FLASHINFER_APILEVEL` parsing to avoid import-time crashes

`_API_LOG_LEVEL` is computed with a bare `int(os.environ.get("FLASHINFER_APILEVEL", "0"))`. If a user sets `FLASHINFER_APILEVEL` to a non-integer (e.g. `"debug"`), importing `flashinfer.api_logging` will raise `ValueError` and prevent FlashInfer from loading.

For a debugging facility this should degrade gracefully instead of breaking imports. You can harden this with a small try/except:

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILEVEL", "0"))
+_raw_level = os.environ.get("FLASHINFER_APILEVEL", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_level)
+except (TypeError, ValueError):
+    # Fall back to no logging on invalid config
+    _API_LOG_LEVEL = 0
+    # Optional: print a one-shot warning to stderr or log once logger is ready.
```

This keeps the default semantics while making misconfiguration non-fatal.
63-76: Make FileHandler creation resilient to bad `FLASHINFER_APIDEST` paths

`_setup_logger()` creates a `logging.FileHandler` for non-stdout/stderr destinations:

```python
if _API_LOG_DEST == "stdout":
    handler = logging.StreamHandler(sys.stdout)
elif _API_LOG_DEST == "stderr":
    handler = logging.StreamHandler(sys.stderr)
else:
    handler = logging.FileHandler(_API_LOG_DEST, mode="a")
```

If `FLASHINFER_APIDEST` points to an invalid or unwritable path, `logging.FileHandler` will raise `OSError` at import time (since `_setup_logger()` runs at module load), breaking all imports of `flashinfer.api_logging`.

To keep the library usable under bad logging config, wrap handler creation and fall back to stderr:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APIDEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves intended behavior while making logging configuration failure non-fatal.
🧹 Nitpick comments (3)
flashinfer/cudnn/prefill.py (1)
6-6: Logging decoration and API extension look correct; backend arg is undocumented

Applying `@flashinfer_api` to `cudnn_batch_prefill_with_kv_cache` and importing it here is consistent with the rest of the PR and should not change behavior when logging is disabled (`APILEVEL=0`). The new `is_cuda_graph_compatible` and `backend` parameters are threaded correctly into the cudnn-fallback path (`get_cudnn_fmha_gen_module().prefill`) and are no-ops for the main cuDNN-graph path.

One small gap: the docstring documents `is_cuda_graph_compatible` but not `backend`, even though `backend="cubin"` selects the non-graph cudnn-fmha path. Consider adding a short description of valid backend values and their effect for clarity.

Also applies to: 387-411
flashinfer/gemm/gemm_base.py (2)
543-615: tgv_gemm_sm100 implementation is sound; consider tighter validation and doc/arch alignment

The `tgv_gemm_sm100` wrapper correctly:

- Restricts execution to SM100/103 via `_match_sm_version`.
- Enforces `a` to be bf16/fp16 and dtype-matches `b`.
- Uses `AutoTuner` with a dedicated `TGVGemmRunner` that swaps `(A, B)` into the layout expected by the underlying kernel and returns an `(M, N)` output.

Two optional improvements:

- The docstring says "Requires SM100, SM103, or SM110 architecture" but the check only allows `"100"` and `"103"`. Either extend `_match_sm_version` here to include `"110"` (if supported) or update the note to avoid mentioning SM110.
- Add explicit checks on shapes and bias/device for clearer Python-side errors, e.g. `a.ndim == 2`, `b.ndim == 2`, `a.shape[1] == b.shape[0]`, `bias.shape == (b.shape[1],)`, and `bias.device == a.device`.

These would fail fast with actionable messages instead of relying on lower-level kernel assertions.
1558-1665: mm_fp8 wrapper is minimal and correct; behavior constraints are clear

The `mm_fp8` helper is a thin, well-scoped wrapper over `trtllm_low_latency_gemm`:

- It restricts `backend` to `"trtllm_low_latency"` and enforces this up front.
- It only allows `torch.bfloat16` outputs (either via `out_dtype` when allocating, or via `out.dtype` when an output tensor is provided), matching `supported_out_dtypes`.
- It validates the output shape and device when `out` is supplied and ensures consistency with `out_dtype`.

If you plan to support additional backends or output dtypes in the future, the current guards will need relaxing, but as written the function is coherent and safe. You may optionally add light checks on `a`/`b` dtypes and K-dimension compatibility to give earlier, Python-level error messages; otherwise, leaving these to the underlying kernel is acceptable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- LOGGING.md (1 hunks)
- README.md (1 hunks)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- flashinfer/cudnn/decode.py (2 hunks)
- flashinfer/cudnn/prefill.py (2 hunks)
- flashinfer/decode.py (10 hunks)
- flashinfer/fused_moe/core.py (7 hunks)
- flashinfer/gemm/gemm_base.py (12 hunks)
- flashinfer/mla.py (4 hunks)
- flashinfer/prefill.py (11 hunks)
- tests/utils/test_logging.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- LOGGING.md
- flashinfer/mla.py
- flashinfer/fused_moe/core.py
🧰 Additional context used
🧬 Code graph analysis (8)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_api (464-562), decorator (517-558), wrapper (519-556)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
🪛 Ruff (0.14.5)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
106-106: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
144-144: Do not catch blind exception: Exception
(BLE001)
152-152: Do not catch blind exception: Exception
(BLE001)
256-256: Do not catch blind exception: Exception
(BLE001)
320-320: Use explicit conversion flag
Replace with conversion flag
(RUF010)
331-331: Use explicit conversion flag
Replace with conversion flag
(RUF010)
335-335: Use explicit conversion flag
Replace with conversion flag
(RUF010)
336-336: Do not catch blind exception: Exception
(BLE001)
379-379: Consider moving this statement to an else block
(TRY300)
380-380: Do not catch blind exception: Exception
(BLE001)
440-440: Unused function argument: func_name
(ARG001)
464-464: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
529-530: try-except-pass detected, consider logging the exception
(S110)
529-529: Do not catch blind exception: Exception
(BLE001)
542-542: Do not catch blind exception: Exception
(BLE001)
543-543: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
553-553: Do not catch blind exception: Exception
(BLE001)
554-554: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
🔇 Additional comments (18)
flashinfer/cudnn/decode.py (1)
6-7: Decorator application on `cudnn_batch_decode_with_kv_cache` looks good

Importing `flashinfer_api` and decorating only the public entry point keeps the cuDNN decode path consistent with the rest of the API while relying on the zero-overhead behavior at level 0. No signature or control-flow changes here, so this instrumentation is safe.

Also applies to: 256-350
tests/utils/test_logging.py (1)
27-585: Comprehensive, well-isolated test coverage for the logging decorator

The test suite does a good job covering all intended behaviors (levels 0/1/3/5, enums, defaults vs explicit args, kwargs, nested structures, crash safety, and CUDA/CUDA-graph semantics), with per-test env reset and module reload to avoid cross-test contamination. This provides strong confidence in the decorator's behavior.
flashinfer/decode.py (3)
24-24: Single-decode API logging wiring is consistent

Importing `flashinfer_api` and decorating `single_decode_with_kv_cache_with_jit_module` and `single_decode_with_kv_cache` introduces pre/post call logging at the user-visible decode entrypoints without touching their logic or overload contracts. The decorator placement and signatures look correct.

Also applies to: 316-345, 393-584
652-783: BatchDecodeWithPagedKVCacheWrapper instrumentation is well-placed

Decorating `__init__`, `plan`, and `run` on `BatchDecodeWithPagedKVCacheWrapper` cleanly exposes wrapper lifecycle and execution to the logging framework. Because `flashinfer_api` is class-aware, log records will carry `BatchDecodeWithPagedKVCacheWrapper.<method>` names, and aliases like `begin_forward = plan` and `run_return_lse = functools.partialmethod(run, ...)` correctly reuse the already-decorated methods. No behavioral regressions are evident.

Also applies to: 816-1112, 1170-1390
2068-2337: Decode backend entrypoints uniformly covered by API logging

Adding `@flashinfer_api` to `trtllm_batch_decode_with_kv_cache`, `xqa_batch_decode_with_kv_cache`, and the MLA variants (`trtllm_batch_decode_with_kv_cache_mla`, `xqa_batch_decode_with_kv_cache_mla`) gives consistent logging across all decode backends and specializations. The decorators sit directly on the public wrappers that already enforce shape/dtype/device contracts, so logging will see the normalized arguments and final outputs without altering existing backend selection or error handling.

Also applies to: 2343-2482, 2528-2687, 2690-2799
flashinfer/gemm/gemm_base.py (2)
25-25: Importing flashinfer_api here is appropriate

Bringing `flashinfer_api` into `gemm_base` aligns this module with the new API-logging infrastructure and enables decoration of GEMM entrypoints without touching their core logic.
2032-2192: Decorator ordering and logging on FP4/FP8 GEMM paths look correct

For the FP4/FP8 GEMM helpers:

- `mm_fp4` and `bmm_fp8` use `@backend_requirement(...)` outside `@flashinfer_api`. This ordering ensures:
  - `backend_requirement` continues to attach attributes like `mm_fp4.suitable_auto_backends` and `bmm_fp8.suitable_auto_backends` to the exported function objects.
  - The logging wrapper sees the fully normalized call arguments after backend requirements are satisfied, while still logging inputs before the heavy kernels run.
- The various group/DeepGEMM helpers (`gemm_fp8_nt_groupwise`, `gemm_fp8_nt_blockscaled`, `group_gemm_fp8_nt_groupwise`, `group_gemm_mxfp8_mxfp4_nt_groupwise`, `group_deepgemm_fp8_nt_groupwise`, `batch_deepgemm_fp8_nt_groupwise`) are decorated directly, which:
  - Preserves their existing shape/dtype/device validations and backend selection logic.
  - Provides uniform pre/post-call logging across SM100/120/121 CUTLASS and DeepGEMM paths.

Given `_API_LOG_LEVEL == 0` short-circuits the decorator to return the original functions, these changes keep the hot GEMM paths zero-overhead in the default configuration while enabling rich diagnostics when explicitly enabled.

Also applies to: 2289-2379, 2382-2546, 2634-2660, 2826-2957, 2994-3122, 3125-3257
flashinfer/prefill.py (11)
25-25: LGTM! Clean import of the API logging decorator.

The import is correctly placed and necessary for instrumenting the public APIs throughout this file.

877-877: LGTM! Appropriate instrumentation of JIT module entry point.

Decorating this function enables tracking of JIT-based prefill operations, which is valuable for debugging and performance analysis.

962-962: LGTM! Key public API correctly instrumented.

The `single_prefill_with_kv_cache` function is a primary user-facing API, making it an ideal candidate for API logging. The decorator will help users debug issues and understand performance characteristics.

1331-1331: LGTM! Wrapper initialization tracking is valuable.

Instrumenting `__init__` enables tracking of configuration parameters (workspace buffer size, layout, CUDA graph settings, backend choice), which is useful for diagnosing setup issues.

1527-1527: LGTM! Planning phase instrumentation is appropriate.

The `plan` method sets up critical auxiliary data structures. Logging here captures the problem specification (batch size, head dimensions, mask configuration, etc.), which is essential for understanding execution behavior.

1984-1984: LGTM! Execution instrumentation completes the lifecycle coverage.

Decorating `run` captures the actual attention computation with tensors and runtime parameters, providing the final piece for end-to-end debugging of paged KV cache workflows.

2359-2359: LGTM! Consistent instrumentation pattern.

Applying the decorator to `BatchPrefillWithRaggedKVCacheWrapper.__init__` maintains consistency with the paged wrapper and provides parallel observability for ragged KV cache workflows.

2503-2503: LGTM! Planning phase tracking for ragged workflows.

The decorator here enables debugging of ragged KV cache planning, capturing the indptr arrays and problem dimensions that drive kernel selection.

2848-2848: LGTM! Execution tracking completes ragged workflow coverage.

Instrumenting the `run` method provides visibility into ragged attention computation, completing end-to-end observability for this code path.

3205-3205: LGTM! TRT-LLM backend instrumentation is valuable.

Decorating `trtllm_ragged_attention_deepseek` ensures observability for DeepSeek-specific workloads using the TRT-LLM backend, enabling debugging of this specialized code path.

3340-3340: LGTM! Comprehensive backend coverage.

Instrumenting `trtllm_batch_context_with_kv_cache` completes coverage of TRT-LLM backend entry points, ensuring that all backend options benefit from the logging infrastructure.
Actionable comments posted: 0
♻️ Duplicate comments (5)
flashinfer/api_logging.py (4)
358-382: Narrow `_get_default_params` exception handling to the errors `inspect.signature` actually raises

`_get_default_params` currently does:

```python
try:
    sig = inspect.signature(func)
    ...
    return default_params
except Exception:
    # If we can't inspect the signature, return empty dict
    return {}
```

Catching all `Exception` here is broader than needed and can hide unexpected bugs. `inspect.signature` is documented to raise `TypeError` or `ValueError` when it can't produce a signature.

You can make this safer without changing behavior:

```diff
     try:
         sig = inspect.signature(func)
         default_params = {}
@@
         return default_params
-    except Exception:
+    except (TypeError, ValueError):
         # If we can't inspect the signature, return empty dict
         return {}
```

This keeps the "best-effort" semantics while avoiding masking unrelated exceptions.
523-529: Simplify redundant `BatchMLAPagedAttentionWrapper` check in class-name handling

The class-name logic currently special-cases `BatchMLAPagedAttentionWrapper` even though its name already contains `"Wrapper"`:

```python
class_name = args[0].__class__.__name__
if "Wrapper" in class_name or class_name in [
    "BatchMLAPagedAttentionWrapper"
]:
    func_name = f"{class_name}.{func_name}"
```

The explicit `"BatchMLAPagedAttentionWrapper"` entry is redundant. You can simplify to:

```diff
-        class_name = args[0].__class__.__name__
-        if "Wrapper" in class_name or class_name in [
-            "BatchMLAPagedAttentionWrapper"
-        ]:
-            func_name = f"{class_name}.{func_name}"
+        class_name = args[0].__class__.__name__
+        if "Wrapper" in class_name:
+            func_name = f"{class_name}.{func_name}"
```

This keeps behavior identical while making the intent clearer.
41-44: Make `FLASHINFER_LOGLEVEL` parsing robust to invalid values

`_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))` will raise `ValueError` if a user sets `FLASHINFER_LOGLEVEL` to a non-integer (e.g., `"debug"`), preventing `flashinfer.api_logging` (and potentially FlashInfer) from importing.

A small hardening tweak keeps imports safe and defaults to level 0 on bad input:

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
-_API_LOG_DEST = _substitute_process_id(os.environ.get("FLASHINFER_LOGDEST", "stdout"))
+# Read environment variables once at module load time
+_raw_level = os.environ.get("FLASHINFER_LOGLEVEL", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_level)
+except (TypeError, ValueError):
+    _API_LOG_LEVEL = 0
+
+_API_LOG_DEST = _substitute_process_id(os.environ.get("FLASHINFER_LOGDEST", "stdout"))
```

This matches the "debugging-only" nature of the feature while avoiding import-time failures on misconfiguration.
63-76: Prevent invalid `FLASHINFER_LOGDEST` from breaking imports by handling `FileHandler` errors

Because `_setup_logger()` runs at import time, an unwritable or malformed `FLASHINFER_LOGDEST` will cause `logging.FileHandler(_API_LOG_DEST, ...)` to raise `OSError` and break FlashInfer imports. For a debugging facility, that's too fragile.

Wrap handler creation and fall back to stderr on failure:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break imports
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST %r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves behavior when the path is valid but avoids taking down the whole module when it isn't.
benchmarks/bench_logging_overhead.py (1)
30-36: Align `LOG_DEST` with `FLASHINFER_LOGDEST` so the benchmark sees the actual log file

Right now, if `FLASHINFER_LOGDEST` is unset, the decorator defaults to `"stdout"` while the benchmark assumes logs go to `LOG_DEST` (`/tmp/flashinfer_benchmark_log.txt`). In that case no file is created and the "LOG FILE INFO" block is silently skipped.

Set the environment variable before importing the decorator so both use the same destination:

```diff
-# Get logging level BEFORE importing flashinfer
-LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
-LOG_DEST = os.environ.get("FLASHINFER_LOGDEST", "/tmp/flashinfer_benchmark_log.txt")
-
-# Import the decorator
-from flashinfer.api_logging import flashinfer_api
+# Get logging level BEFORE importing flashinfer
+LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
+LOG_DEST = os.environ.get("FLASHINFER_LOGDEST", "/tmp/flashinfer_benchmark_log.txt")
+
+# Ensure the decorator sees the same destination (don't override if user set it)
+os.environ.setdefault("FLASHINFER_LOGDEST", LOG_DEST)
+
+# Import the decorator
+from flashinfer.api_logging import flashinfer_api
```

This keeps the current default path while guaranteeing the decorator writes to the file that `main()` inspects.
🧹 Nitpick comments (5)
benchmarks/bench_logging_overhead.py (2)
32-32: Consider avoiding a hard-coded/tmplog path in shared environmentsUsing a fixed path like
"/tmp/flashinfer_benchmark_log.txt"is fine for local, single-user benchmarking, but in shared or security-sensitive environments it can trip linters (S108) and create race/symlink issues.If you expect this benchmark to be run more broadly, consider:
- deriving the default from
tempfile.gettempdir()and the PID, or- requiring callers to set
FLASHINFER_LOGDESTexplicitly and only using/tmp/...in docs/examples.No change is strictly required if this is dev-only tooling.
Also applies to: 248-251, 293-301
324-333: Catching bareExceptioninmain()is acceptable here but could be narrowedFor a benchmark CLI, catching
KeyboardInterruptexplicitly and then using a bareExceptionto print a traceback is workable. If you want stricter hygiene (and to appease BLE001), consider:
- keeping the explicit
KeyboardInterrupthandler, and- either letting other exceptions propagate (so pytest / callers see failures cleanly) or narrowing the catch to expected runtime errors.
Not critical for a standalone benchmark, so this can be deferred.
tests/utils/test_logging.py (1)
43-81: Test harness and coverage forflashinfer_apilook strong; only minor lint nits remainThe combination of:
- the autouse
setup_and_teardownfixture (resetting env + module cache), and- the
setup_logginghelper (per-test level/dest + reload)gives you clean, isolated tests that accurately reflect how the decorator reads environment variables at import time. The behavioral checks across levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash-safety, tensor metadata, nested structures, and CUDA graph capture all align well with
flashinfer/api_logging.py.The only minor follow-ups you may want, mainly to keep Ruff quiet:
- For tests that intentionally don’t use parameters (e.g.,
modeintest_default_parameters, the many*_valargs intest_different_data_types), consider prefixing them with_or adding# noqa: ARG001on the function line.- For the
RuntimeError("Simulated crash")intest_crash_safety_inputs_logged_before_execution, you can keep it as-is in tests; TRY003 is purely stylistic here.Functionally, this suite looks good to go.
Also applies to: 83-187, 218-285, 346-383
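For the underscore option, the lint-quiet form looks like this (hypothetical test shown; Ruff's ARG rules skip underscore-prefixed parameters by default):

```python
def test_default_parameters(_mode="default"):
    # _mode is intentionally unused; the leading underscore silences ARG001.
    ...
```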
flashinfer/api_logging.py (2)
165-181: Update `_format_value` docstring to reflect actual levels (1, 3, 5)

The `_format_value` docstring says:

```
level : int
    The logging level (1, 2, or 3)
```

But the implementation and public API use levels 1, 3, and 5, with 5 enabling statistics. To avoid confusion for maintainers, consider tightening the wording, e.g.:

```diff
-    level : int
-        The logging level (1, 2, or 3)
+    level : int
+        The API logging level (1, 3, or 5); higher levels include more detail.
```

Purely a documentation fix; runtime behavior is already consistent.
440-461: Either use `func_name` in `_log_function_outputs` or remove it to avoid confusion

`_log_function_outputs` accepts `func_name` but doesn't currently include it in the log:

```python
def _log_function_outputs(func_name: str, result: Any, level: int) -> None:
    ...
    lines.append("Output value:")
    lines.append(_format_value(result, level, indent=1))
```

To make logs easier to scan and satisfy linters about the unused argument, you could:

- include the function name in the header, e.g.:

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    # Log outputs
+    lines.append(f"Output value for {func_name}:")
```

or

- drop `func_name` from the signature and call sites if you prefer the current output format.

Either direction is fine; the goal is just to keep the API surface and log format consistent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- tests/utils/test_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_api (464-562), decorator (517-558), wrapper (519-556)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
106-106: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
144-144: Do not catch blind exception: Exception
(BLE001)
152-152: Do not catch blind exception: Exception
(BLE001)
256-256: Do not catch blind exception: Exception
(BLE001)
320-320: Use explicit conversion flag
Replace with conversion flag
(RUF010)
331-331: Use explicit conversion flag
Replace with conversion flag
(RUF010)
335-335: Use explicit conversion flag
Replace with conversion flag
(RUF010)
336-336: Do not catch blind exception: Exception
(BLE001)
379-379: Consider moving this statement to an else block
(TRY300)
380-380: Do not catch blind exception: Exception
(BLE001)
440-440: Unused function argument: func_name
(ARG001)
464-464: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
529-530: try-except-pass detected, consider logging the exception
(S110)
529-529: Do not catch blind exception: Exception
(BLE001)
542-542: Do not catch blind exception: Exception
(BLE001)
543-543: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
553-553: Do not catch blind exception: Exception
(BLE001)
554-554: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (2)
benchmarks/bench_logging_overhead.py (2)
152-201: Benchmark timing logic looks solid for CUDA workloads

The warmup and `benchmark_function` implementations correctly:

- run a warmup phase,
- synchronize the CUDA device before and after each timed call,
- and use `time.perf_counter()` with per-iteration timing and summary stats.

This is an appropriate pattern for measuring decorator overhead on GPU kernels; a condensed sketch of the pattern follows.
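A condensed version of that measurement loop, for reference (the function and parameter names are illustrative, not the benchmark's actual API):

```python
import time
import torch

def benchmark(fn, warmup: int = 10, iters: int = 100) -> float:
    """Return mean latency in milliseconds for a CUDA-bound callable."""
    for _ in range(warmup):  # warmup: JIT compilation, allocator, clock ramp-up
        fn()
    times = []
    for _ in range(iters):
        torch.cuda.synchronize()  # drain pending work before starting the clock
        start = time.perf_counter()
        fn()
        torch.cuda.synchronize()  # wait for the kernel before stopping the clock
        times.append(time.perf_counter() - start)
    return sum(times) / len(times) * 1e3
```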
229-232: The review comment is based on incorrect assumptions about the PyTorch API.

The official PyTorch documentation explicitly shows that `torch.cuda.get_device_name()` accepts `torch.device or int or str` as parameters, and you can pass a device string like `"cuda:0"`. The original code at lines 229-232, which passes `device = "cuda:0"` to `torch.cuda.get_device_name(device)`, is valid and follows the documented API.

The claim that this would raise a `TypeError` is incorrect. No changes are needed.

Likely an incorrect or invalid review comment.
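For reference, both documented call forms are accepted:

```python
import torch

if torch.cuda.is_available():
    # torch.cuda.get_device_name takes an int index, a torch.device, or a string.
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_name("cuda:0"))
```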
Hi @xslingcn @Ubospica @YiyanZhai @zanderjiang, it would be useful to hear any suggestions on logging behavior that could benefit flashinfer-bench. I am leaving a logging level for dumping tensors as future work, but please feel free to provide any input.
```python
handler = logging.FileHandler(_API_LOG_DEST, mode="a")
...
# Use a simple formatter (we'll add timestamps manually to key lines)
formatter = logging.Formatter("%(message)s")
```
should we expose the formatting as an env var to allow pid time stamps etc overridable by the user?
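If such a knob were added, it might look like the following sketch; `FLASHINFER_LOGFMT` is a hypothetical variable name, not something this PR defines:

```python
import logging
import os

# Hypothetical: let users override the record format, e.g.
#   FLASHINFER_LOGFMT="%(asctime)s [pid %(process)d] %(message)s"
_fmt = os.environ.get("FLASHINFER_LOGFMT", "%(message)s")
formatter = logging.Formatter(_fmt)
```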
```python
This is useful for multi-process/multi-GPU environments where each process
needs its own log file.
"""
if "%i" in path:
```
i think the standard library has
%(process)d
https://docs.python.org/3/library/logging.html#logrecord-attributes
we don't have to reinvent new things contrary to established standards
oh this is about file name, i was probably mistaken...
Yes, I think `%(process)d` only works in the `logging` module at the moment.
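For context, the file-name substitution under discussion can be as small as this sketch, mirroring the `if "%i" in path:` check quoted above:

```python
import os

def _substitute_process_id(path: str) -> str:
    # Replace a literal "%i" in the log destination with this process's PID so
    # each process in a multi-GPU run writes its own file. %(process)d only
    # applies to log-record formatting, not to file names, hence the custom token.
    if "%i" in path:
        return path.replace("%i", str(os.getpid()))
    return path
```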
Actionable comments posted: 0
♻️ Duplicate comments (4)
flashinfer/api_logging.py (4)
340-382: Narrow exception handling in `_get_default_params` to documented failures

Catching bare `Exception` here hides unexpected bugs; `inspect.signature` is documented to raise `ValueError` or `TypeError` on failure. Restricting the `except` makes this safer while still returning `{}` when signature inspection isn't supported.

```diff
-    try:
-        sig = inspect.signature(func)
-        default_params = {}
+    try:
+        sig = inspect.signature(func)
+        default_params: dict[str, Any] = {}
@@
-        return default_params
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+        return default_params
+    except (TypeError, ValueError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```
525-531: Simplify the class-name condition for wrappers

`"BatchMLAPagedAttentionWrapper"` already contains `"Wrapper"`, so the explicit membership check is redundant. Dropping the special case slightly simplifies the condition without changing behavior.

```diff
-            class_name = args[0].__class__.__name__
-            if "Wrapper" in class_name or class_name in [
-                "BatchMLAPagedAttentionWrapper"
-            ]:
+            class_name = args[0].__class__.__name__
+            if "Wrapper" in class_name:
                 func_name = f"{class_name}.{func_name}"
```
41-44: Handle invalid `FLASHINFER_LOGLEVEL` values to avoid import-time crashes

`int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))` will raise `ValueError` if the env var is a non-integer string, breaking module import for a pure debugging feature. Parse defensively and fall back to level 0 instead.

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
+_API_LOG_LEVEL_RAW = os.environ.get("FLASHINFER_LOGLEVEL", "0")
+try:
+    _API_LOG_LEVEL = int(_API_LOG_LEVEL_RAW)
+except (TypeError, ValueError):
+    # Fall back to disabled logging on invalid value
+    _API_LOG_LEVEL = 0
+
 _API_LOG_DEST = _substitute_process_id(os.environ.get("FLASHINFER_LOGDEST", "stdout"))
```
63-76: Prevent invalid `FLASHINFER_LOGDEST` from breaking imports

`logging.FileHandler(_API_LOG_DEST, mode="a")` will raise `OSError` on bad/unwritable paths at import time, preventing FlashInfer from loading when logging is enabled. This is too fragile for a debugging-only facility; gracefully degrade to stderr instead.

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```
🧹 Nitpick comments (3)
flashinfer/api_logging.py (3)
280-323: Consider bounding list/tuple/dict logging to avoid pathological log volume

For large or deeply nested containers, recursively logging every element can explode log size and latency at higher levels (3/5). Consider truncating after some N elements with a summary to keep logs usable while preserving signal.

```diff
     # Handle lists
     if isinstance(value, list):
         if len(value) == 0:
             return f"{indent_str}[]"
         if level == 1:
             return f"{indent_str}[list with {len(value)} items]"
-        lines = [f"{indent_str}["]
-        for i, item in enumerate(value):
-            lines.append(
-                f"{indent_str}  [{i}]: {_format_value(item, level, indent + 1)}"
-            )
+        max_items = 16
+        lines = [f"{indent_str}["]
+        for i, item in enumerate(value[:max_items]):
+            lines.append(
+                f"{indent_str}  [{i}]: {_format_value(item, level, indent + 1)}"
+            )
+        if len(value) > max_items:
+            lines.append(
+                f"{indent_str}  ... ({len(value) - max_items} more items truncated)"
+            )
@@
     # Handle tuples
     if isinstance(value, tuple):
@@
-        lines = [f"{indent_str}("]
-        for i, item in enumerate(value):
+        max_items = 16
+        lines = [f"{indent_str}("]
+        for i, item in enumerate(value[:max_items]):
@@
-        lines.append(f"{indent_str})")
+        if len(value) > max_items:
+            lines.append(
+                f"{indent_str}  ... ({len(value) - max_items} more items truncated)"
+            )
+        lines.append(f"{indent_str})")
@@
     # Handle dictionaries
     if isinstance(value, dict):
@@
-        lines = [f"{indent_str}{{"]
-        for key, val in value.items():
+        max_items = 16
+        lines = [f"{indent_str}{{"]
+        for i, (key, val) in enumerate(list(value.items())[:max_items]):
@@
-        lines.append(f"{indent_str}}}")
+        if len(value) > max_items:
+            lines.append(
+                f"{indent_str}  ... ({len(value) - max_items} more keys truncated)"
+            )
+        lines.append(f"{indent_str}}}")
```
440-462: Include function name in output logs and use `func_name`

`func_name` is currently unused, and the output block doesn't identify which API produced the value, which can be ambiguous with interleaved logs. Reusing the timestamped header format here improves attribution and removes the unused-arg warning.

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
-    lines.append(_format_value(result, level, indent=1))
-
-    lines.append("=" * 80)
-    lines.append("")  # Empty line for readability
+    lines = []
+    lines.append(f"{_get_timestamp()} FlashInfer API Output: {func_name}")
+    lines.append("-" * 80)
+    lines.append("Output value:")
+    lines.append(_format_value(result, level, indent=1))
+    lines.append("=" * 80)
+    lines.append("")  # Empty line for readability
```
520-557: Tighten decorator exception paths and avoid silent `except ...: pass`

Two small robustness/observability nits:

- The `try/except Exception: pass` around `class_name` silently hides unexpected errors.
- Pre/post logging paths use `logging.error` instead of `logging.exception`, losing the stack trace when logging itself fails.

Switching to `logging.exception` and logging a debug on class-name failures improves diagnosability without impacting crash-safety.

```diff
-        if args and hasattr(args[0], "__class__"):
-            try:
-                class_name = args[0].__class__.__name__
-                if "Wrapper" in class_name or class_name in [
-                    "BatchMLAPagedAttentionWrapper"
-                ]:
-                    func_name = f"{class_name}.{func_name}"
-            except Exception:
-                pass
+        if args and hasattr(args[0], "__class__"):
+            try:
+                class_name = args[0].__class__.__name__
+                if "Wrapper" in class_name or class_name in [
+                    "BatchMLAPagedAttentionWrapper"
+                ]:
+                    func_name = f"{class_name}.{func_name}"
+            except Exception:
+                _logger.debug(
+                    "Failed to derive class name for %r in flashinfer_api", f
+                )
@@
-            try:
-                if _API_LOG_LEVEL == 1:
@@
-                _log_function_inputs(f, func_name, args, kwargs, _API_LOG_LEVEL)
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            try:
+                if _API_LOG_LEVEL == 1:
@@
+                _log_function_inputs(f, func_name, args, kwargs, _API_LOG_LEVEL)
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            try:
-                if _API_LOG_LEVEL >= 3:
@@
-                _log_function_outputs(func_name, result, _API_LOG_LEVEL)
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            try:
+                if _API_LOG_LEVEL >= 3:
@@
+                _log_function_outputs(func_name, result, _API_LOG_LEVEL)
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- flashinfer/api_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/api_logging.py (2)
include/flashinfer/logging.h (1)
logging (31-41); csrc/nv_internal/include/tensorrt_llm/common/logger.h (1): setLevel (75-78)
setLevel(75-78)
🪛 Ruff (0.14.5)
flashinfer/api_logging.py
106-106: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
144-144: Do not catch blind exception: Exception
(BLE001)
152-152: Do not catch blind exception: Exception
(BLE001)
256-256: Do not catch blind exception: Exception
(BLE001)
320-320: Use explicit conversion flag
Replace with conversion flag
(RUF010)
331-331: Use explicit conversion flag
Replace with conversion flag
(RUF010)
335-335: Use explicit conversion flag
Replace with conversion flag
(RUF010)
336-336: Do not catch blind exception: Exception
(BLE001)
379-379: Consider moving this statement to an else block
(TRY300)
380-380: Do not catch blind exception: Exception
(BLE001)
440-440: Unused function argument: func_name
(ARG001)
464-464: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
532-533: try-except-pass detected, consider logging the exception
(S110)
532-532: Do not catch blind exception: Exception
(BLE001)
545-545: Do not catch blind exception: Exception
(BLE001)
546-546: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
556-556: Do not catch blind exception: Exception
(BLE001)
557-557: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
(non-blocking) we can keep
aleozlx
left a comment
giving my approval as a vote up
yzh119
left a comment
LGTM overall, thanks @bkryu for the great work!
/bot run

[FAILED] Pipeline #38971561: 15/18 passed
📌 Description
tl;dr: The current PR adds a logging system for input/output tracking to aid debugging FlashInfer APIs via a `@flashinfer_api` decorator.

This PR does not apply `@flashinfer_api` to every FlashInfer API -- many operations are missing labels. Further labeling is left for subsequent work.

This PR introduces a production-ready API logging infrastructure that tracks function calls, arguments, and return values via a simple one-line decorator. Any function can be decorated to track its input/output values in the API logger.
Key Features:
- Logging level configurable via `FLASHINFER_LOGLEVEL`
- Log destination configurable via `FLASHINFER_LOGDEST`; defaults to `stdout`
- Overhead benchmark in `benchmarks/bench_logging_overhead.py`

Example usage produces a log (the captured outputs are elided in this description; a sketch follows).
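A minimal sketch of the decorator in use; the wrapped function is illustrative, and the env vars must be set before `flashinfer` is imported:

```python
import os

os.environ["FLASHINFER_LOGLEVEL"] = "3"      # 0 (default) disables logging entirely
os.environ["FLASHINFER_LOGDEST"] = "stdout"  # or a file path such as "/tmp/fi_%i.log"

from flashinfer.api_logging import flashinfer_api

@flashinfer_api
def my_op(x, scale=1.0):
    return x * scale

my_op(2.0)  # inputs are logged before execution; outputs after the call returns
```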
Running with `export FLASHINFER_LOGLEVEL=3` and `export FLASHINFER_LOGLEVEL=5` produces progressively more detailed logs (example outputs elided).

🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- Tests have been added or updated as needed. All tests are passing (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit
Release Notes
New Features
Tests
Documentation