feat: Enable API Logging for Better Debugging POC #2108
Conversation
Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and flexible API logging system designed to significantly enhance debugging capabilities within FlashInfer. The system allows developers to easily instrument functions with a decorator to log calls, arguments, and return values at various levels of detail, from simple function names to comprehensive tensor metadata and statistics. A key design principle is crash-safety, ensuring that input arguments are logged even if a function terminates unexpectedly, which is crucial for diagnosing complex issues like CUDA kernel crashes. Furthermore, the logging system is optimized for performance, offering zero overhead when disabled, and intelligently handles CUDA graph capture to avoid performance bottlenecks. This infrastructure will be invaluable for understanding MoE workloads and other performance-critical operations.
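To make that description concrete, here is a minimal usage sketch. The environment variable names follow the first review round in this thread (`FLASHINFER_APILOG_*`); they were renamed in later revisions, so treat them as illustrative rather than final:

```python
# Illustrative only: env var names follow the first review round and were
# later renamed; check the merged code for the final spelling.
import os

os.environ["FLASHINFER_APILOG_LEVEL"] = "3"      # 0 disables logging entirely
os.environ["FLASHINFER_APILOG_DEST"] = "stdout"  # or stderr, or a file path

# The module reads these variables once at import time, so set them first.
import flashinfer

# Decorated APIs now log the function name and arguments before the call
# (crash-safe) and, at higher levels, tensor metadata and statistics after it.
```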
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

A new API logging decorator framework is introduced to FlashInfer that enables configurable logging of API calls via environment variables.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant flashinfer_api
participant Logger
participant OriginalFunc
participant CUDA
User->>flashinfer_api: Call decorated function with args
activate flashinfer_api
Note over flashinfer_api: Check FLASHINFER_LOGLEVEL
alt Level == 0 (Disabled)
flashinfer_api->>OriginalFunc: Call directly (zero-overhead)
else Level >= 1 (Enabled)
flashinfer_api->>Logger: Log function name & parameters
activate Logger
Logger-->>flashinfer_api:
deactivate Logger
flashinfer_api->>OriginalFunc: Call original function
activate OriginalFunc
OriginalFunc->>CUDA: Execute kernels
CUDA-->>OriginalFunc: Return results
deactivate OriginalFunc
flashinfer_api->>Logger: Log return values & stats (level-dependent)
activate Logger
alt Level == 5 & CUDA Graph Active
Logger-->>flashinfer_api: Skip stats, use placeholder
else
Logger-->>flashinfer_api: Include tensor statistics
end
deactivate Logger
end
flashinfer_api-->>User: Return results
    deactivate flashinfer_api
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Code Review
This PR introduces a comprehensive and well-designed API logging system for FlashInfer. The use of a decorator (@flashinfer_api_log) makes it easy to apply, and the control via environment variables (FLASHINFER_APILOG_LEVEL, FLASHINFER_APILOG_DEST) is flexible. The zero-overhead design when disabled is a key feature and is implemented correctly. The different logging levels provide a good range of verbosity for debugging, and the crash-safe logging of inputs is a great feature for diagnosing kernel crashes. The inclusion of a benchmark script and a thorough test suite is excellent.
My review focuses on improving the robustness and maintainability of the new logging module. I've identified a few areas for improvement, such as making exception handling more specific and simplifying some conditional logic.
Overall, this is a great addition to the library that will significantly improve debuggability.
flashinfer/api_logging.py
Outdated
```python
# Read environment variables once at module load time
_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
```
The int() conversion for FLASHINFER_APILOG_LEVEL is not robust against invalid user input. If a user sets this environment variable to a non-integer string (e.g., "foo"), it will raise a ValueError and prevent the module from being imported. It would be more robust to handle this case gracefully, for example by defaulting to level 0 and perhaps logging a warning.
```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (ValueError, TypeError):
+    _API_LOG_LEVEL = 0
```
```python
    except Exception:
        # If we can't inspect the signature, return empty dict
        return {}
```
The except Exception: block is too broad. It will catch any exception, including system-exiting exceptions like KeyboardInterrupt, which can hide bugs and make debugging difficult. The inspect.signature() function is documented to raise ValueError or TypeError on failure. It's better to catch only these specific exceptions.
```diff
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+    except (ValueError, TypeError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```
```python
    if "Wrapper" in class_name or class_name in [
        "BatchMLAPagedAttentionWrapper"
    ]:
```
The condition class_name in ["BatchMLAPagedAttentionWrapper"] is redundant because the class name "BatchMLAPagedAttentionWrapper" already contains the substring "Wrapper", which is checked in the first part of the or condition. This can be simplified for better readability and maintainability.
```python
    if "Wrapper" in class_name:
```
Actionable comments posted: 0
🧹 Nitpick comments (8)
tests/utils/test_logging.py (1)
227-247: Consider quieting Ruff's unused-argument warnings in tests

Several test helpers intentionally don't use all parameters (e.g., `mode` in `test_default_parameters`, arguments to `crashing_function`, and the many typed parameters in `test_different_data_types`). If Ruff is run on tests, you may want to silence these via `_`-prefixed names or `# noqa: ARG001` on those definitions to keep the suite warning-free; behavior is otherwise fine.

Also applies to: 322-344, 355-383
flashinfer/fused_moe/core.py (1)
23-23: MoE public API logging is wired correctly

Wrapping the high-level MoE entrypoints (`cutlass_fused_moe`, `trtllm_bf16_moe`, all FP8/FP4 variants) with `flashinfer_api_log` gives good coverage of the performance-critical surface without touching the custom-op registration or autotuning internals. Signatures and control flow are preserved. Just be aware that at logging level 3, tensor statistics over large MoE activations/weights will be non-trivial in cost, so that mode should remain a targeted debugging tool rather than a default.

Also applies to: 688-906, 1862-1941, 1943-2015, 2017-2093, 2095-2223, 2225-2354
flashinfer/api_logging.py (2)
28-68: Env-driven logger setup matches goals; consider a tiny robustness tweak

Reading `FLASHINFER_APILOG_LEVEL`/`FLASHINFER_APILOG_DEST` once at import and configuring a dedicated `flashinfer.api` logger (with a `NullHandler` at level 0) cleanly achieves "zero-overhead when disabled" and isolates logs from the root logger. One minor robustness improvement would be to guard the `int()` conversion for `_API_LOG_LEVEL` so a malformed env var falls back to level 0 (or a safe default) instead of raising at import.
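A minimal sketch of that guard, mirroring the suggestion (placement at module level is assumed):

```python
import os

try:
    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
except (TypeError, ValueError):
    # Malformed env var: fall back to disabled logging rather than
    # raising at import time.
    _API_LOG_LEVEL = 0
```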
345-367: Use the `func_name` argument in `_log_function_outputs` for clearer logs

`_log_function_outputs` accepts `func_name` but currently ignores it, which also triggers a linter warning. You could make the logs more self-describing and fix the unused parameter by adding a small header including the function name. For example:

```diff
 def _log_function_outputs(func_name: str, result: Any, level: int) -> None:
@@
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    lines.append(f"FlashInfer API Return: {func_name}")
+    lines.append("Output value:")
     lines.append(_format_value(result, level, indent=1))
@@
     _logger.debug("\n".join(lines))
```

This preserves existing tests (which only assert on `"Output value:"`) while improving traceability.

benchmarks/bench_logging_overhead.py (1)
35-38: Align benchmark log destination with api_logging's default to avoid confusion

The script defaults `LOG_DEST` to `/tmp/flashinfer_benchmark_log.txt`, but `flashinfer.api_logging` defaults `FLASHINFER_APILOG_DEST` to `./flashinfer_log.txt` when the env var is unset. If users only set `FLASHINFER_APILOG_LEVEL` (as shown in the usage block), the benchmark will print and clean up `/tmp/...` while the actual logs go to `./flashinfer_log.txt`, so log-size reporting and cleanup can silently miss the real file.

To make behavior predictable, consider either:

- Matching the default with the library:

```diff
-LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "/tmp/flashinfer_benchmark_log.txt")
+LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "./flashinfer_log.txt")
```

and/or

- Explicitly propagating `LOG_DEST` into the environment before importing `flashinfer.api_logging` when the env var is not already set, as sketched below.

Either option will keep the benchmark's "LOG FILE INFO" and cleanup in sync with where the decorator actually writes.
Also applies to: 265-327
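For the second option, a sketch of the propagation idea (variable names follow the benchmark script; the exact call site is an assumption):

```python
import os

LOG_DEST = os.environ.get(
    "FLASHINFER_APILOG_DEST", "/tmp/flashinfer_benchmark_log.txt"
)
# If the env var was unset, publish the benchmark's default so the library
# sees the same destination; setdefault leaves an explicit setting untouched.
os.environ.setdefault("FLASHINFER_APILOG_DEST", LOG_DEST)

import flashinfer.api_logging  # import only after the environment is set
```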
flashinfer/gemm/gemm_base.py (3)
1842-2002: Decorator stacking on `mm_fp4` means logging happens after backend/shape checks

`mm_fp4` is now decorated as:

```python
@backend_requirement(...)
@flashinfer_api_log
def mm_fp4(...): ...
```

This order implies that backend and problem-size checks (and any heuristic backend selection) run first, and only if they pass does the call enter the logging wrapper and function body. That's a reasonable choice and doesn't affect correctness, but it does mean invalid/problematic calls rejected by `backend_requirement` won't appear in API logs.

If you'd prefer to log all attempted API calls—including ones failing backend requirements—you might want to reverse the order:

```python
@flashinfer_api_log
@backend_requirement(...)
def mm_fp4(...): ...
```

Please double-check the implementation of `backend_requirement` to ensure it doesn't rely on attributes on the original function that could be hidden by another wrapper (though `functools.wraps` in the logger should preserve most metadata).
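A toy illustration of why the order matters (stand-in decorators, not the real FlashInfer ones):

```python
import functools

def require_positive(fn):          # stand-in for backend_requirement
    @functools.wraps(fn)
    def wrapper(x):
        if x < 0:
            raise ValueError("requirement failed")
        return fn(x)
    return wrapper

def log_calls(fn):                 # stand-in for flashinfer_api_log
    @functools.wraps(fn)
    def wrapper(x):
        print(f"call {fn.__name__}({x})")
        return fn(x)
    return wrapper

@require_positive                  # outer decorator runs first
@log_calls
def f(x):
    return x + 1

f(1)       # printed: the call passed the requirement check
try:
    f(-1)  # nothing printed: rejected before reaching the logging wrapper
except ValueError:
    pass
```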
2100-2189: `bmm_fp8` logging is consistent; same decorator-order caveat as `mm_fp4`

`bmm_fp8` is also defined with:

```python
@backend_requirement(...)
@flashinfer_api_log
def bmm_fp8(...): ...
```

So backends/requirements are checked before logging, and only successful calls will be logged. The attributes added by `backend_requirement` (e.g., `suitable_auto_backends`) still attach to the outer wrapper, so the internal use of `bmm_fp8.suitable_auto_backends` remains valid.

If you want logs for calls that are rejected due to unsupported compute capability or shapes, consider swapping decorator order as suggested for `mm_fp4`, and verify that `backend_requirement` remains compatible with being under the logging wrapper.
2501-2527: Potential double-logging when calling `gemm_fp8_nt_blockscaled`

`gemm_fp8_nt_blockscaled` is a thin wrapper around `gemm_fp8_nt_groupwise`, and both are decorated with `@flashinfer_api_log`. A call to `gemm_fp8_nt_blockscaled` will therefore emit two log entries: one for `gemm_fp8_nt_blockscaled` and one for the inner `gemm_fp8_nt_groupwise` call. This might be desirable (showing both the high-level alias and the underlying primitive), but could also add noise to logs.

If you want only a single log entry per API call here, you could either:

- Drop the decorator from `gemm_fp8_nt_blockscaled`, relying on the inner `gemm_fp8_nt_groupwise` logging, or
- Keep only the alias decorated and undecorate `gemm_fp8_nt_groupwise` if it's considered an internal helper.

Please choose based on how you expect users to call these APIs directly versus via wrappers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- `benchmarks/bench_logging_overhead.py` (1 hunks)
- `flashinfer/api_logging.py` (1 hunks)
- `flashinfer/cudnn/decode.py` (2 hunks)
- `flashinfer/cudnn/prefill.py` (2 hunks)
- `flashinfer/decode.py` (10 hunks)
- `flashinfer/fused_moe/core.py` (7 hunks)
- `flashinfer/gemm/gemm_base.py` (12 hunks)
- `flashinfer/mla.py` (4 hunks)
- `flashinfer/prefill.py` (11 hunks)
- `tests/utils/test_logging.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/mla.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

tests/utils/test_logging.py (1)

flashinfer/api_logging.py (3)
- `flashinfer_api_log` (369-462)
- `decorator` (418-457)
- `wrapper` (420-455)

benchmarks/bench_logging_overhead.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/fused_moe/core.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/decode.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/prefill.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/cudnn/decode.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/cudnn/prefill.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)

flashinfer/gemm/gemm_base.py (1)

flashinfer/api_logging.py (1)
- `flashinfer_api_log` (369-462)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
346-346: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
161-161: Do not catch blind exception: Exception
(BLE001)
225-225: Use explicit conversion flag
Replace with conversion flag
(RUF010)
236-236: Use explicit conversion flag
Replace with conversion flag
(RUF010)
240-240: Use explicit conversion flag
Replace with conversion flag
(RUF010)
241-241: Do not catch blind exception: Exception
(BLE001)
284-284: Consider moving this statement to an else block
(TRY300)
285-285: Do not catch blind exception: Exception
(BLE001)
345-345: Unused function argument: func_name
(ARG001)
369-369: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
430-431: try-except-pass detected, consider logging the exception
(S110)
430-430: Do not catch blind exception: Exception
(BLE001)
441-441: Do not catch blind exception: Exception
(BLE001)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
452-452: Do not catch blind exception: Exception
(BLE001)
453-453: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (36)
flashinfer/cudnn/prefill.py (1)
6-6: Decorator wiring for cuDNN prefill entrypoint looks correct

Importing `flashinfer_api_log` and wrapping `cudnn_batch_prefill_with_kv_cache` is consistent with the rest of the API surface; it doesn't touch the cudnn-graph/jit internals and preserves function signature and control flow. Logging remains opt-in via env level.

Also applies to: 387-543
flashinfer/mla.py (1)
22-22: MLA wrapper instrumentation is aligned with the logging design

Decorating `BatchMLAPagedAttentionWrapper.__init__`, `plan`, and `run` with `flashinfer_api_log` cleanly instruments the main public lifecycle without altering behavior. The class name pattern ensures logs are emitted as `BatchMLAPagedAttentionWrapper.<method>`, which is useful for traceability, and zero-overhead semantics hold when logging is disabled at import time.

Also applies to: 133-203, 204-305, 337-451
flashinfer/cudnn/decode.py (1)
6-6: cuDNN decode entrypoint is correctly wrapped for logging

The `flashinfer_api_log` import and decorator on `cudnn_batch_decode_with_kv_cache` match the prefill pattern, leave the cuDNN graph construction untouched, and preserve the API contract. This is a safe, minimal instrumentation point.

Also applies to: 256-350
tests/utils/test_logging.py (1)
43-82: Comprehensive logging decorator tests look solid

The autouse fixture plus `setup_logging` pattern gives each test a clean environment and forces `flashinfer.api_logging` to re-read env vars, which matches how the decorator is intended to be configured. The suite covers level semantics, enums, defaults vs explicit kwargs, crash-safety, CUDA tensors (incl. graph capture), class methods, and multiple invocation patterns; all expectations are consistent with the current implementation in `flashinfer/api_logging.py`.

Also applies to: 83-585
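A sketch of that fixture pattern for reference (helper names are assumptions based on the description, not the exact test code):

```python
import importlib
import pytest

@pytest.fixture(autouse=True)
def clean_logging_env(monkeypatch):
    # Each test starts with no logging configuration in the environment.
    monkeypatch.delenv("FLASHINFER_APILOG_LEVEL", raising=False)
    monkeypatch.delenv("FLASHINFER_APILOG_DEST", raising=False)
    yield

def setup_logging(monkeypatch, level, dest):
    monkeypatch.setenv("FLASHINFER_APILOG_LEVEL", str(level))
    monkeypatch.setenv("FLASHINFER_APILOG_DEST", dest)
    import flashinfer.api_logging as api_logging
    # The module caches env vars at import time, so force a re-read.
    return importlib.reload(api_logging)
```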
flashinfer/api_logging.py (2)
70-163: Value formatting and CUDA-graph-aware tensor stats look well thought out

`_format_value`'s handling of enums, tensors (with shape/stride/device metadata), FP4Tensor, and nested containers is consistent and log-friendly. The level-3 path correctly avoids statistics during CUDA graph capture via `torch.cuda.is_current_stream_capturing` and gracefully degrades to a "[statistics error: …]" line on failures. This matches the intended "rich when needed, safe when it fails" design.
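In sketch form, the guard described above looks roughly like this (a simplified approximation, not the actual `_format_value` code):

```python
import torch

def _tensor_stats(t: torch.Tensor) -> str:
    # Reductions would be recorded into an in-progress CUDA graph capture,
    # so skip statistics entirely in that case.
    if t.is_cuda and torch.cuda.is_current_stream_capturing():
        return "[stats skipped: CUDA graph capture in progress]"
    try:
        f = t.detach().float()
        return (f"min={f.min().item():.4g}, max={f.max().item():.4g}, "
                f"mean={f.mean().item():.4g}")
    except Exception as e:  # never let stats break the logged call
        return f"[statistics error: {e}]"
```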
369-462: Decorator implementation aligns with zero-overhead and crash-safety requirements

`flashinfer_api_log` correctly returns the original function when `_API_LOG_LEVEL == 0` at import/decorator time, avoiding any wrapper call-overhead in the disabled case. For enabled levels, the wrapper's pre-call logging (with class-qualified names for `*Wrapper` classes) and post-call logging, both protected by broad `try`/`except` blocks, achieve the "crash-safe, best-effort logging" goal without risking user code execution. The decorator also supports both `@flashinfer_api_log` and `@flashinfer_api_log()` usage, which matches the tests.
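The two properties called out here, returning the original function at level 0 and accepting both bare and called decorator forms, reduce to a pattern like this (a simplified sketch, not the actual implementation):

```python
import functools

_API_LOG_LEVEL = 1  # read once from the environment at import time

def flashinfer_api_log(func=None):
    def decorate(f):
        if _API_LOG_LEVEL == 0:
            return f  # zero overhead: no wrapper at all when disabled
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                # Inputs are logged before the call, so they survive a crash.
                print(f"FlashInfer API call: {f.__name__}")
            except Exception:
                pass  # logging must never break the wrapped call
            return f(*args, **kwargs)
        return wrapper
    # Bare form (@flashinfer_api_log) passes the function directly; called
    # form (@flashinfer_api_log()) passes None and returns the decorator.
    return decorate(func) if func is not None else decorate
```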
flashinfer/prefill.py (11)

25-25: API logger import is correctly scoped and consistent with other modules

The relative import of `flashinfer_api_log` mirrors how other utilities are imported and avoids circular dependencies; no issues here.
877-910: Decorating `single_prefill_with_kv_cache_with_jit_module` is safe and aligns with the logging design

Applying `@flashinfer_api_log` here cleanly instruments this top-level helper: it only wraps the JIT module's `run` call and does not interfere with caching or custom-op registration elsewhere.

If you expect this function to be called inside `torch.compile` or CUDA graphs, please sanity-check that logging levels ≥2 are disabled in those contexts to avoid unnecessary extra kernels for statistics.
912-985: Logging `single_prefill_with_kv_cache` covers the primary single-request prefill API (including the alias)

Instrumenting `single_prefill_with_kv_cache` is appropriate: it is the main single-prefill entry point, and the alias `single_prefill_with_kv_cache_return_lse` (via `functools.partial`) will also be logged because it refers to the decorated function.

Please confirm that your log level defaults (e.g., env `FLASHINFER_APILOG_LEVEL`) are set so that heavy workloads don't accidentally incur level-3 tensor statistics on this hot path in production.
1331-1498: Wrapper `__init__` logging for `BatchPrefillWithPagedKVCacheWrapper` looks reasonable

Adding `@flashinfer_api_log` on the constructor is useful for debugging misconfigurations (workspace size, CUDA-graph buffers, backend selection) and doesn't affect runtime behavior, since allocations and checks were already here.

It may be worth running a small CUDA-graph flow that constructs this wrapper in graph-enabled mode to ensure that level-2/3 logging remains acceptable during initialization (even if execution itself happens inside graphs).
1527-1925: Logging `BatchPrefillWithPagedKVCacheWrapper.plan` is helpful; watch for overhead at high log levels

Decorating `plan` aligns with the goal of capturing problem setup (indptrs, masks, backends) and should not affect correctness, since the body is unchanged and the decorator is pure Python around it.

Because `plan` copies indptrs to CPU and computes derived arrays, logging at level-3 (with statistics) on very large batches can add noticeable overhead. Ensure that in tight autotuning or repeated planning loops you keep `FLASHINFER_APILOG_LEVEL` at 0–1 unless you explicitly need detailed introspection.
1984-2220: Instrumenting `BatchPrefillWithPagedKVCacheWrapper.run` is consistent and appears safe

Applying `@flashinfer_api_log` to `run` gives visibility into the main prefill execution path (including PDL, backend selection, and output tensor shapes) while leaving the core kernel launch logic unchanged. The alias `run_return_lse` also goes through the decorated method, so both result variants are logged.

For very latency-sensitive use (e.g., high-QPS decode), you may want to benchmark with `FLASHINFER_APILOG_LEVEL=1/2/3` to validate that the added logging—especially level-3 statistics—meets your overhead budget.
2359-2474: Constructor logging for `BatchPrefillWithRaggedKVCacheWrapper` matches the paged wrapper behavior

Decorating `__init__` here mirrors the paged wrapper: creation of workspace buffers, CUDA-graph buffers, and backend selection are now log-visible without touching the execution logic.

As with the paged wrapper, it's worth verifying that constructing this wrapper inside any higher-level tooling (e.g., model factories) remains acceptable when logging is enabled, since constructor logs can be noisy if wrappers are created per-request.
2503-2794: Logging `BatchPrefillWithRaggedKVCacheWrapper.plan` is appropriate and symmetric with paged prefill

The `@flashinfer_api_log` decorator on `plan` for ragged KV behaves analogously to the paged variant, exposing shapes, dtypes, and backend choices. The computational path (host transfers, planning, cached module setup) remains unchanged.

Same suggestion as for the paged `plan`: if you run many small `plan` calls (e.g., in tuning or dynamic workloads), keep logging levels conservative to avoid overhead from repeated detailed summaries.
2848-2995: `BatchPrefillWithRaggedKVCacheWrapper.run` logging cleanly wraps the main ragged prefill execution

Instrumenting `run` makes sense: it's the hot execution entry point and logs will now include mask mode, backend, and output shapes, which are valuable for debugging MoE / ragged scenarios. No functional changes are introduced.

Consider adding this method to any logging tests you already have (similar to the decode wrappers) to confirm that level-2/3 input summaries behave as expected with ragged indptr layouts.
3205-3337: `trtllm_ragged_attention_deepseek` is a good candidate for API logging

The decorator around this Triton/TRT-LLM-style ragged attention wrapper will expose sequence length, scaling, and PDL settings without affecting the underlying kernel invocation. Given this is a specialized path, logging is particularly useful here.
Because this function is often used in brittle integration scenarios, you may want to add a small smoke test that calls it with logging level 2–3 to ensure tensor summaries do not assume contiguous layouts beyond what the implementation already guarantees.
3340-3553: Logging `trtllm_batch_context_with_kv_cache` is consistent with other TRTLLM interfaces

Adding `@flashinfer_api_log` here gives observability into FP4/FP8 output configuration, kv layout, and PDL usage. Since the decorator is a thin wrapper and all runtime behavior (including FP4 tensor handling) is unchanged, this looks safe.

One thing to double-check is that when `out` is an `FP4Tensor`, the logging layer's input summarization doesn't accidentally try to introspect internal fields in a way that could be overly verbose; if so, you might consider teaching the logger a lightweight summary for `FP4Tensor`.

flashinfer/gemm/gemm_base.py (9)
25-25: Importing `flashinfer_api_log` here is consistent with other modules

The relative import from `..api_logging` is correct for this package layout and keeps GEMM logging centralized alongside other instrumented APIs.
914-1084: SegmentGEMMWrapper.run instrumentation looks correct and non-intrusive

Decorating `SegmentGEMMWrapper.run` adds valuable logging around a complex segmented GEMM path (seg_lens/seg_indptr, backend choice, shapes) without altering the kernel launch logic. The `forward = run` alias will also be logged since it resolves to the decorated method.

Given this is a potentially hot path in MoE workloads, you may want to benchmark with logging enabled at levels 2–3 to measure overhead from argument summarization on large `x`/`weights`.
1573-1682: `mm_fp8` logging is well-placed on this primary FP8 GEMM entry point

The decorator wraps only the public Python API; core work is still delegated to `trtllm_low_latency_gemm`. No interactions with autotune or backend selection are altered.

Since `mm_fp8` can be called in tight loops, please confirm via the new `bench_logging_overhead.py` benchmark that your target usage still meets latency goals at different logging levels.
2192-2356: Logging `gemm_fp8_nt_groupwise` is appropriate; be mindful of level-3 cost on large tiles

This function is a central entry point for FP8 groupwise GEMM on Blackwell; adding `@flashinfer_api_log` is aligned with the overall API observability goal. The underlying CUTLASS/TRTLLM calls and device checks are unchanged.

Because this path typically operates on large (M,N,K) blocks and may be used in performance-critical contexts, it's worth evaluating level-3 logging separately—tensor statistics over large matrices can add non-trivial overhead even if the core compute kernels are unchanged.
2531-2690: `group_gemm_fp8_nt_groupwise` logging is consistent with other group GEMM entry points

Applying `@flashinfer_api_log` here exposes group sizes, scale layouts, and hardware constraints for grouped FP8 GEMM. The function's internal device checks and workspace handling are unchanged.

Since this path asserts several alignment constraints (e.g., `n % 8 == 0`, `k % 16 == 0`), logging invalid calls at level ≥1 should be useful in diagnosing misconfigurations—ensure your logging level in debug runs is at least 1 to capture these.
2693-2825: Logging `group_gemm_mxfp8_mxfp4_nt_groupwise` is appropriate for this specialized MXFP4 path

The decorator wraps a fairly specialized Blackwell-only path that mixes FP8 activations with MXFP4 weights. Logging call shapes, tile sizes, and `swap_ab` will be valuable in debugging, and the underlying SM100 module calls are preserved.

Given the stricter assertions (tile sizes, alignment, dtype constraints), consider adding a small unit test that calls this function with logging enabled to ensure the logger handles packed MXFP4 tensors (`uint8`) and scale tensors without excessive verbosity.
2861-2989: `group_deepgemm_fp8_nt_groupwise` logging makes DeepGEMM MoE flows more observable

Instrumenting this DeepGEMM-based grouped FP8 GEMM call will surface group assignments (`m_indices`), scale granularities, and output sizes without altering the call to `m_grouped_fp8_gemm_nt_contiguous`. The architecture check remains in place.

Because this function is likely used in MoE expert-routing paths, you may want to validate that logging large `m_indices` tensors at level ≥2 is summarized compactly enough (e.g., only shape/type) to keep logs readable.
2992-3125: `batch_deepgemm_fp8_nt_groupwise` logging mirrors the grouped DeepGEMM path

The `@flashinfer_api_log` decorator here complements `group_deepgemm_fp8_nt_groupwise`, covering the batched FP8 DeepGEMM case. The masking logic (`masked_m`, `expected_m`) and backend call remain unchanged.

As with the grouped variant, confirm that logging of `masked_m` and `expected_m` behaves as expected in your tests—these values are key for understanding performance characteristics and correctness in partially-filled batches.
568-641: Remove SM110 from the docstring—the underlying C++ implementation only supports SM100 and SM103

The runtime check is correct. Analysis of the C++ source reveals:

- `csrc/tgv_gemm.cu` includes `cutlass/gemm/collective/builders/sm100_common.inl` and uses SM100-specific `tcgen05.mma` instructions
- `gen_tgv_gemm_sm10x_module` is documented as "Generate TGV GEMM module for SM100 architecture"
- No SM110 support path exists in the TGV GEMM implementation
- The test file correctly restricts to `["100", "103"]`

The docstring at lines 573–580 incorrectly claims SM110 support. It should be corrected to:

    Requires SM100, SM103 architecture

Adding SM110 to the runtime check as suggested would enable an unsupported architecture.
Likely an incorrect or invalid review comment.
flashinfer/decode.py (10)
24-24: LGTM - API logging import added.

The import is correctly placed and follows the project's import organization.

316-346: API logging decorator correctly applied.

The decorator is properly applied to this public API function. As a POC, this demonstrates the instrumentation approach. Before production use, verify that the logging overhead at levels 1-3 is acceptable for your performance requirements.

393-410: Correct decorator placement on overloaded function.

The decorator is correctly applied to the implementation function rather than the overload signatures. This is the proper pattern for decorated overloaded functions.

652-664: API logging decorator correctly applied to constructor.

The decorator is properly applied to the class constructor. The decorator implementation correctly detects and includes the class name in logs.

816-840: API logging decorator correctly applied to plan method.

The decorator is properly applied. Note that at log level 2+, this method will generate verbose logs due to the many parameters. This is expected behavior for API logging and helps with debugging.

1170-1186: Correct decorator placement on overloaded method.

The decorator is correctly applied to the implementation method rather than the overload signatures. The decorator will properly capture the `*args` parameter for logging.

2068-2089: API logging decorator correctly applied.

The decorator is properly applied to this public API function. The extensive parameters and complex logic make this a good candidate for API logging to aid debugging.

2347-2364: API logging decorator correctly applied.

The decorator is properly applied to this public API function.

2539-2556: API logging decorator correctly applied.

The decorator is properly applied to this MLA-specific decode function.

2707-2724: API logging decorator correctly applied.

The decorator is properly applied to this function. Overall, the API logging instrumentation in this file is consistent and well-implemented.
POC Assessment: The decorator applications throughout this file demonstrate a clean, non-invasive approach to API logging. The implementation maintains zero overhead when disabled and preserves function signatures and behavior. Before moving beyond POC, consider:

- Verify benchmark results show acceptable overhead at log levels 1-3
- Consider adding configuration to selectively enable/disable logging for specific functions
- Document the logging format and levels for end users

Based on the PR description, benchmarks exist at `benchmarks/bench_logging_overhead.py`. Please verify the benchmark results demonstrate acceptable performance characteristics at different log levels, particularly for hot-path functions like `BatchDecodeWithPagedKVCacheWrapper.run`.
Actionable comments posted: 1
♻️ Duplicate comments (3)
flashinfer/api_logging.py (3)
338-362: Narrow `_get_default_params` exception handling to the documented cases

`inspect.signature` is documented to raise `TypeError`/`ValueError` on failure; catching every `Exception` here is broader than necessary and makes debugging real failures harder. The earlier review comment already pointed this out.

You can keep the fail-safe behavior but tighten the handler:

```diff
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+    except (TypeError, ValueError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```

This still preserves the "logging must not crash user code" guarantee.
493-528: Simplify class-name check and improve logging of internal logging errors

Two small points around the decorator:

1. The condition `if "Wrapper" in class_name or class_name in ["BatchMLAPagedAttentionWrapper"]:` is redundant because `"BatchMLAPagedAttentionWrapper"` already matches the `"Wrapper"` substring. This was noted in a previous review.
2. In the pre/post logging `try` blocks you catch `Exception` and log via `_logger.error(...)`, which drops the traceback. Given this is a debugging/logger subsystem, having the traceback via `logging.exception` is usually more helpful and you're already isolating it from user code.

Possible refactor:

```diff
-                class_name = args[0].__class__.__name__
-                if "Wrapper" in class_name or class_name in [
-                    "BatchMLAPagedAttentionWrapper"
-                ]:
-                    func_name = f"{class_name}.{func_name}"
+                class_name = args[0].__class__.__name__
+                if "Wrapper" in class_name:
+                    func_name = f"{class_name}.{func_name}"
             except Exception:
                 pass
@@
             try:
                 if _API_LOG_LEVEL == 1:
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
             try:
                 if _API_LOG_LEVEL >= 2:
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```

This keeps user-visible behavior the same while making logger failures easier to diagnose.
29-31: Make `FLASHINFER_APILOG_LEVEL` parsing robust to invalid env values

Right now `int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))` will raise `ValueError` (and break import) if the env var is set to a non-integer string. It's safer to treat invalid values as "0" (disabled) so logging config can never prevent FlashInfer from importing.

You can harden this by catching parse errors like this:

```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (TypeError, ValueError):
+    # Invalid user input; fall back to level 0 (disabled)
+    _API_LOG_LEVEL = 0
```

Please rerun the existing logging tests (and any import-time tests) after this change.
🧹 Nitpick comments (2)
flashinfer/api_logging.py (2)
145-304: Optional: guard against recursive structures in `_format_value`

`_format_value` walks lists/tuples/dicts recursively and will recurse indefinitely on cyclic structures (e.g., a list containing itself), which can appear in complex call graphs. In a debugging logger this is usually rare but when it happens it'll raise `RecursionError` from the logging path.

If you want to harden this, consider adding a `seen: set[int]` parameter (or a max-depth cut-off) and short-circuit when `id(value)` is already in `seen`. No need to change now if your call sites never pass cyclic objects, but it's something to keep in mind.
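A sketch of the `seen`-set idea (illustrative only, not the actual `_format_value` signature):

```python
from typing import Any, Optional, Set

def format_value(value: Any, seen: Optional[Set[int]] = None) -> str:
    seen = set() if seen is None else seen
    if isinstance(value, (list, tuple, dict)):
        if id(value) in seen:
            return "[cyclic reference]"  # short-circuit instead of recursing forever
        seen.add(id(value))
        items = value.values() if isinstance(value, dict) else value
        return "[" + ", ".join(format_value(v, seen) for v in items) + "]"
    return repr(value)

x = [1, 2]
x.append(x)             # cyclic list
print(format_value(x))  # [1, 2, [cyclic reference]]
```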
420-441: Use `func_name` in `_log_function_outputs` (or drop it) to fix the unused arg and improve logs

`func_name` is currently unused in `_log_function_outputs`, which triggers the linter and also makes the output block a bit less self-describing when skimming logs.

A small tweak makes the output clearer and fixes `ARG001`:

```diff
-    # Log outputs
-    lines.append("Output value:")
+    # Log outputs
+    lines.append(f"Output value for {func_name}:")
```

This keeps the existing separators and should be low-risk, but please re-run the logging tests because they may assert on exact strings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `flashinfer/api_logging.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
- `logging` (31-41)
🪛 Ruff (0.14.5)
flashinfer/api_logging.py
86-86: Do not catch blind exception: Exception
(BLE001)
106-106: Do not catch blind exception: Exception
(BLE001)
124-124: Do not catch blind exception: Exception
(BLE001)
132-132: Do not catch blind exception: Exception
(BLE001)
236-236: Do not catch blind exception: Exception
(BLE001)
300-300: Use explicit conversion flag
Replace with conversion flag
(RUF010)
311-311: Use explicit conversion flag
Replace with conversion flag
(RUF010)
315-315: Use explicit conversion flag
Replace with conversion flag
(RUF010)
316-316: Do not catch blind exception: Exception
(BLE001)
359-359: Consider moving this statement to an else block
(TRY300)
360-360: Do not catch blind exception: Exception
(BLE001)
420-420: Unused function argument: func_name
(ARG001)
444-444: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
505-506: try-except-pass detected, consider logging the exception
(S110)
505-505: Do not catch blind exception: Exception
(BLE001)
516-516: Do not catch blind exception: Exception
(BLE001)
517-517: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
527-527: Do not catch blind exception: Exception
(BLE001)
528-528: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
```python
def _setup_logger():
    """Set up the logger based on environment variables."""
    if _API_LOG_LEVEL == 0:
        # Completely disable logging for zero overhead
        _logger.addHandler(logging.NullHandler())
        _logger.setLevel(logging.CRITICAL + 1)  # Higher than any level
        return

    # All enabled levels use logging.DEBUG; verbosity is controlled by
    # FLASHINFER_APILOG_LEVEL instead
    _logger.setLevel(logging.DEBUG)

    # Remove any existing handlers
    _logger.handlers.clear()

    # Create handler based on destination
    if _API_LOG_DEST == "stdout":
        handler = logging.StreamHandler(sys.stdout)
    elif _API_LOG_DEST == "stderr":
        handler = logging.StreamHandler(sys.stderr)
    else:
        handler = logging.FileHandler(_API_LOG_DEST, mode="a")

    # Use a simple formatter (we'll format the detailed content ourselves)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)

    _logger.addHandler(handler)
    _logger.propagate = False  # Don't propagate to root logger
```
🧩 Analysis chain
Avoid import-time failures when FLASHINFER_APILOG_DEST is invalid

`logging.FileHandler(_API_LOG_DEST, mode="a")` will raise (e.g., bad path, permission) at import time and prevent FlashInfer from loading when logging is enabled. For a "debugging only" facility this is a bit too fragile; better to degrade to stderr (or a `NullHandler`) instead of failing imports.

A minimal way to harden this while keeping semantics:
```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

Afterward, it'd be good to verify behavior by setting FLASHINFER_APILOG_DEST to an unwritable path and importing flashinfer.api_logging in a small repro.
🏁 Script executed:
```bash
#!/bin/bash
# Find and examine the full flashinfer/api_logging.py file
fd -t f "api_logging.py" --exec cat {} \;
```

Length of output: 19059
🏁 Script executed:
```bash
#!/bin/bash
# Check when _setup_logger() is called (module level vs function level)
rg -n "_setup_logger" -A2 -B2
```

Length of output: 462
🏁 Script executed:
```bash
#!/bin/bash
# Search for any existing error handling around logging setup
rg -n "except|try|OSError" flashinfer/api_logging.py
```

Length of output: 594
Add error handling to prevent import-time failures from invalid logging paths
The current code will break module imports if `FLASHINFER_APILOG_DEST` points to an invalid or unwritable path. Since `_setup_logger()` is called at module load time (line 67), any `OSError` from `logging.FileHandler()` will prevent FlashInfer from loading entirely. For a debugging facility, this is too fragile.
Add error handling to gracefully degrade to stderr:
```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

🤖 Prompt for AI Agents
```text
In flashinfer/api_logging.py around lines 36 to 64, creating a FileHandler from
an invalid or unwritable FLASHINFER_APILOG_DEST can raise OSError at import time
and crash the module; wrap the handler creation in a try/except that catches
OSError (and its subclasses), and on exception fall back to a
logging.StreamHandler(sys.stderr) so the module still loads; also emit a
non-raising warning message via the selected handler or a temporary NullHandler
so the failure is recorded without propagating the exception, and ensure the
rest of the logger setup (formatter, addHandler, propagate) runs regardless of
whether the file handler succeeded.
```
yzh119
left a comment
This is a great feature to have, and thanks @bkryu for bringing this up! One general question: do you think we should log kernel behavior on the Python side or the C++ side?
Thanks @yzh119. This is a question I considered, more from a "should we log what happens inside each function call, because the current design only logs inputs and outputs" point of view. My initial sentiment is that logging kernel behavior will be tricky because we have multiple kernel providers -- some of which are closed source and others are OSS, and some that already come with logging:
@xslingcn @Ubospica @YiyanZhai @zanderjiang this PR should greatly reduce the overhead of extracting workloads for the flashinfer-bench project.
@yzh119 Thanks for sharing! The low-overhead design looks very useful. We definitely have plans to support low-overhead dump and CUDA graph support on our roadmap. I will leverage this PR as a key reference.
Actionable comments posted: 1
♻️ Duplicate comments (2)
flashinfer/api_logging.py (2)
43-47: Make `FLASHINFER_LOGLEVEL_DBG` parsing robust to invalid values

As written, `int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))` will raise `ValueError` at import time if a user sets the env var to a non-integer (e.g. `"foo"`), preventing `flashinfer.api_logging` (and any API importing it) from loading. For a debug-only facility this is too fragile; it should gracefully fall back to level 0.

You can harden this by catching parse errors and defaulting to 0 (optionally emitting a warning to stderr):

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
-_API_LOG_DEST = _substitute_process_id(
-    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
-)
+# Read environment variables once at module load time
+_raw_log_level = os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_log_level)
+except (TypeError, ValueError):
+    # Fall back to level 0 on invalid input instead of failing import
+    _API_LOG_LEVEL = 0
+    print(
+        f"[flashinfer.api_logging] Invalid FLASHINFER_LOGLEVEL_DBG={_raw_log_level!r}, "
+        "falling back to 0",
+        file=sys.stderr,
+    )
+_API_LOG_DEST = _substitute_process_id(
+    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
+)
```

This keeps existing semantics for valid values while avoiding import-time crashes on bad configuration.
85-112: Prevent import-time failures when `FLASHINFER_LOGDEST_DBG` points to an invalid path

`_setup_logger()` calls `logging.FileHandler(_API_LOG_DEST, mode="a")` unguarded whenever `_API_LOG_DEST` is neither `"stdout"` nor `"stderr"`. If the env var points to an invalid or unwritable location, this raises `OSError` at import time and breaks all FlashInfer imports when logging is enabled.

For a debugging feature, it's safer to degrade to stderr instead of failing import. For example:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves current behavior for valid paths but avoids hard failures when the destination is misconfigured.
🧹 Nitpick comments (3)
tests/utils/test_logging.py (1)
227-228: Address Ruff `ARG001` unused-argument warnings in test helpers (optional)

Several test helper functions intentionally ignore some parameters (`mode` in `test_default_parameters`, `x`/`y` in `crashing_function`, and the various `*_val` parameters in `test_different_data_types`), which Ruff flags as `ARG001`. If your CI runs Ruff on tests, you can silence these by either:

- Prefixing unused parameters with `_` (e.g., `_mode`, `_int_val`, …), or
- Adding `# noqa: ARG001` on the relevant function definitions.

This keeps the tests behaviorally identical while satisfying the linter.
Also applies to: 322-323, 355-365
flashinfer/api_logging.py (2)
498-519: `_log_function_outputs`'s `func_name` is unused — consider wiring it into the log line

`_log_function_outputs` accepts `func_name` but never uses it, which is slightly confusing and flagged by linters as `ARG001`. You can either drop the parameter, or (more usefully) include it in the header line:

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    # Log outputs
+    lines.append(f"Output value: {func_name}")
```

This keeps the existing `"Output value:"` prefix (so current tests continue to pass) while making the per-call output section self-describing.
596-607: Improve diagnostics for logging failures with `logging.exception`

The decorator intentionally guards pre- and post-execution logging with broad `except Exception` blocks to avoid interfering with the wrapped function. That makes sense, but using `_logger.error(...)` drops the traceback, which can make debugging logging issues tricky.

You can keep the broad guard while emitting richer diagnostics via `logging.exception`:

```diff
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```

This preserves crash-safety while giving you a stack trace when something goes wrong inside the logging path.
Also applies to: 613-618
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `benchmarks/bench_logging_overhead.py` (1 hunks)
- `flashinfer/api_logging.py` (1 hunks)
- `tests/utils/test_logging.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
- `flashinfer_api_log` (522-627)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
- `flashinfer_api_log` (522-627)
- `decorator` (581-622)
- `wrapper` (583-620)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
- `logging` (31-41)
🪛 Ruff (0.14.5)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
346-346: Do not catch blind exception: Exception
(BLE001)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
flashinfer/api_logging.py
142-142: Do not catch blind exception: Exception
(BLE001)
162-162: Do not catch blind exception: Exception
(BLE001)
180-180: Do not catch blind exception: Exception
(BLE001)
210-210: Do not catch blind exception: Exception
(BLE001)
314-314: Do not catch blind exception: Exception
(BLE001)
378-378: Use explicit conversion flag
Replace with conversion flag
(RUF010)
389-389: Use explicit conversion flag
Replace with conversion flag
(RUF010)
393-393: Use explicit conversion flag
Replace with conversion flag
(RUF010)
394-394: Do not catch blind exception: Exception
(BLE001)
437-437: Consider moving this statement to an else block
(TRY300)
438-438: Do not catch blind exception: Exception
(BLE001)
498-498: Unused function argument: func_name
(ARG001)
522-522: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
593-594: try-except-pass detected, consider logging the exception
(S110)
593-593: Do not catch blind exception: Exception
(BLE001)
606-606: Do not catch blind exception: Exception
(BLE001)
607-607: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
617-617: Do not catch blind exception: Exception
(BLE001)
618-618: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (2)
tests/utils/test_logging.py (1)
27-585: Comprehensive coverage of decorator behavior

This test suite does a good job exercising the decorator across levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, tensor metadata, nested structures, and CUDA graph scenarios; I don't see functional issues here.
benchmarks/bench_logging_overhead.py (1)
56-219: Benchmark structure and CUDA timing look solid

The undecorated vs decorated matmul comparison, CUDA synchronization, warmup phase, and summary statistics all look correct for measuring logging overhead.
Also applies to: 221-339
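The timing structure being praised here follows the standard warmup-then-synchronized-measurement pattern; a condensed sketch (not the benchmark's actual code):

```python
import torch

def time_op_ms(fn, *args, iters=100, warmup=10):
    for _ in range(warmup):       # warm up kernels/caches before timing
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()      # wait for all timed work to finish
    return start.elapsed_time(end) / iters  # average milliseconds per call
```

Comparing this number for the decorated and undecorated variant of the same matmul isolates the logging overhead.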
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/gemm/gemm_base.py (1)
568-640: Fix SM110 support check and consider basic shape validation in `tgv_gemm_sm100`

Two minor issues here:

- The docstring says "Requires SM100, SM103, or SM110 architecture" (Lines 593–595), but the runtime check only allows `"100"` and `"103"`:

```python
if not _match_sm_version(a.device, ["100", "103"]):
    raise ValueError("TGV GEMM requires SM100, SM103 architecture")
```

If SM110 is actually supported by the TGV kernel (as the name `gen_tgv_gemm_sm10x_module` and note about SM100f suggest), this should include `"110"`; otherwise the note in the docstring should be adjusted to avoid misleading users.

- There's no explicit validation of tensor ranks/shapes or bias shape. Given the comment "Computes: A @ B + bias" and the expectation `(M, K) @ (K, N) + (N,)`, adding a fast check for `a.ndim == 2`, `b.ndim == 2`, `bias.ndim == 1`, `a.shape[1] == b.shape[0]`, and `bias.shape[0] == b.shape[1]` would fail early with clear error messages instead of deferring to the underlying kernel (see the sketch after this list).

Both changes would make this wrapper more robust without affecting the autotuning flow.
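The suggested fast-fail checks, written out (a sketch; the real function signature and helper name are assumptions):

```python
import torch

def _validate_tgv_gemm_inputs(a: torch.Tensor, b: torch.Tensor, bias: torch.Tensor) -> None:
    # Computes A @ B + bias, expecting (M, K) @ (K, N) + (N,).
    if a.ndim != 2 or b.ndim != 2 or bias.ndim != 1:
        raise ValueError("expected a: (M, K), b: (K, N), bias: (N,)")
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"inner dimensions mismatch: {a.shape[1]} vs {b.shape[0]}")
    if bias.shape[0] != b.shape[1]:
        raise ValueError(f"bias length {bias.shape[0]} != N = {b.shape[1]}")
```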
♻️ Duplicate comments (2)
flashinfer/api_logging.py (1)
28-45: Harden env parsing and handler setup to avoid import-time failures

Right now, two failure modes in this module can prevent `flashinfer` from importing at all:

- Line 42: `int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))` raises `ValueError`/`TypeError` if the env var is non-numeric (e.g. `"foo"`), aborting import.
- Lines 65–71: `logging.FileHandler(_API_LOG_DEST, mode="a")` will raise `OSError` on invalid/unwritable paths, also at import time.

For a debug-only feature, this is too fragile; misconfigured env vars should disable logging or fall back safely, not make the library unusable.

A minimal hardening would be:

```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+except (TypeError, ValueError):
+    # Fall back to disabled logging on invalid level
+    _API_LOG_LEVEL = 0
```

and in `_setup_logger`:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This keeps existing semantics in the happy path but makes misconfiguration non-fatal.
Also applies to: 51-82
benchmarks/bench_logging_overhead.py (1)
30-36: Align benchmark's `LOG_DEST` with the decorator's actual destination

Here `LOG_DEST` defaults to `/tmp/flashinfer_benchmark_log.txt` when `FLASHINFER_LOGDEST_DBG` is unset, but `flashinfer.api_logging` still defaults its destination to `"stdout"`. That means:

- The decorator logs to stdout.
- The benchmark later looks for a file at `LOG_DEST` (Lines 248–251, 292–301) and usually finds nothing, so the "LOG FILE INFO" section never prints.

To keep the benchmark's view of the log destination consistent with the decorator, set the env var before importing `flashinfer_log`:

```diff
 LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
 LOG_DEST = os.environ.get(
     "FLASHINFER_LOGDEST_DBG", "/tmp/flashinfer_benchmark_log.txt"
 )
+# Ensure the decorator sees the same destination default
+os.environ.setdefault("FLASHINFER_LOGDEST_DBG", LOG_DEST)

 # Import the decorator
 from flashinfer.api_logging import flashinfer_log
```

This preserves the current default path but guarantees both components use the same destination.
Also applies to: 248-251, 292-301
🧹 Nitpick comments (2)
flashinfer/cudnn/prefill.py (1)
6-6: Decorator integration and new backend flags look safe; consider documenting `backend`.

- Importing and applying `@flashinfer_log` to `cudnn_batch_prefill_with_kv_cache` is semantically safe because the decorator is a no-op at log level 0 and preserves the function signature.
- Adding `is_cuda_graph_compatible` and `backend` is backward-compatible; they are defaulted and only change behavior when explicitly used (e.g., `backend == "cubin"` forces the fmha-gen path).
- Minor: the docstring mentions `is_cuda_graph_compatible` but not `backend`. It may be worth adding a short description for `backend` for consistency and to clarify the `"cubin"` override behavior.

Also applies to: 387-411
tests/utils/test_logging.py (1)
43-585: Logging tests are comprehensive; Ruff ARG001 hints are low-priority in this context

This suite does a good job validating `flashinfer_log` behavior (levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, CUDA graphs, nested structures, etc.) and the fixture correctly resets env state and reloads the module between tests. The Ruff `ARG001` warnings about unused parameters in test functions (e.g., `mode`, `int_val`, etc.) are expected for this style of tests; if you care about a clean lint run you can prefix those parameters with `_` or reference them trivially, but functionally the current code is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- flashinfer/cudnn/decode.py (2 hunks)
- flashinfer/cudnn/prefill.py (2 hunks)
- flashinfer/decode.py (10 hunks)
- flashinfer/fused_moe/core.py (7 hunks)
- flashinfer/gemm/gemm_base.py (12 hunks)
- flashinfer/mla.py (4 hunks)
- flashinfer/prefill.py (11 hunks)
- tests/utils/test_logging.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flashinfer/mla.py
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_log (466-564), decorator (519-560), wrapper (521-558)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/fused_moe/core.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
flashinfer/api_logging.py
108-108: Do not catch blind exception: Exception
(BLE001)
128-128: Do not catch blind exception: Exception
(BLE001)
146-146: Do not catch blind exception: Exception
(BLE001)
154-154: Do not catch blind exception: Exception
(BLE001)
258-258: Do not catch blind exception: Exception
(BLE001)
322-322: Use explicit conversion flag
Replace with conversion flag
(RUF010)
333-333: Use explicit conversion flag
Replace with conversion flag
(RUF010)
337-337: Use explicit conversion flag
Replace with conversion flag
(RUF010)
338-338: Do not catch blind exception: Exception
(BLE001)
381-381: Consider moving this statement to an else block
(TRY300)
382-382: Do not catch blind exception: Exception
(BLE001)
442-442: Unused function argument: func_name
(ARG001)
466-466: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
531-532: try-except-pass detected, consider logging the exception
(S110)
531-531: Do not catch blind exception: Exception
(BLE001)
544-544: Do not catch blind exception: Exception
(BLE001)
545-545: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
555-555: Do not catch blind exception: Exception
(BLE001)
556-556: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (5)
flashinfer/decode.py (1)
24-24: API logging decoration across decode entry points looks correct and non-intrusive.
- Importing `flashinfer_log` and applying it to all user-facing decode APIs (single, batch, wrappers, and TRT-LLM/XQA helpers) preserves signatures and overload contracts while adding centralized logging.
- The decorator's no-op behavior at log level 0 avoids extra overhead in production, and `functools.wraps` keeps introspection and `partialmethod`/aliasing (`begin_forward`, `run_return_lse`) working as before.
- No behavioral changes to argument defaults, CUDA-graph handling, or backend routing are introduced by these annotations.
Also applies to: 316-345, 393-410, 652-664, 816-840, 1170-1186, 2068-2089, 2342-2359, 2527-2544, 2689-2705
flashinfer/prefill.py (1)
25-25: Prefill API logging wiring is consistent and preserves existing behavior.
- Importing and applying `@flashinfer_log` to single-prefill helpers, both prefill wrappers (paged and ragged), and the TRT-LLM prefill/context functions cleanly instruments all public prefill entry points without altering their logic.
- Overloads remain untouched; only concrete implementations are wrapped, and `begin_forward`/`run_return_lse` aliases continue to work through the decorated methods.
- No changes to backend selection, CUDA-graph handling, or tensor shape/device contracts are introduced in these hunks.
Also applies to: 877-888, 962-985, 1331-1346, 1527-1563, 1984-2000, 2359-2372, 2503-2531, 2848-2859, 3205-3227, 3340-3362
flashinfer/cudnn/decode.py (1)
6-6: cudnn decode logging and CUDA‑graph flag propagation look correct.
- Importing `flashinfer_log` and decorating `cudnn_batch_decode_with_kv_cache` integrates this entry point into the new logging system without touching its core logic.
- The `is_cuda_graph_compatible` argument is defaulted and only matters in the fmha-gen fallback path; CUDNN proper continues to behave as before.
- Signature and docstring remain aligned, so existing callers are unaffected while gaining optional logging and an explicit CUDA-graph compatibility hint.
Also applies to: 256-273
flashinfer/fused_moe/core.py (1)
23-23: API-level logging instrumentation for MoE wrappers looks correct

Importing `flashinfer_log` and decorating the public MoE entrypoints here cleanly extends logging without touching the underlying custom ops. Because `flashinfer_log` returns the original function when `FLASHINFER_LOGLEVEL_DBG=0`, these changes are behavior-preserving in the default/off configuration, and the wrappers' signatures remain unchanged. No issues from the decorator stacking are apparent.

Also applies to: 688-719, 1862-1881, 1943-1965, 2017-2040, 2095-2127, 2225-2257
flashinfer/gemm/gemm_base.py (1)
914-1083: Decorator-based logging on GEMM and segment-GEMM entrypoints looks consistent

Importing `flashinfer_log` here and applying it to `SegmentGEMMWrapper.run`, `tgv_gemm_sm100`, and the various FP8/FP4 GEMM entrypoints (`mm_fp8`, `mm_fp4`, `bmm_fp8`, `gemm_fp8_nt_groupwise`, blockscaled/group variants, DeepGEMM group/batch wrappers) is consistent with the rest of the PR:

- Signatures and return types are unchanged.
- At `FLASHINFER_LOGLEVEL_DBG=0` the decorator returns the original function, so hot paths (GEMM kernels) pay no additional cost when logging is disabled.
- Where combined with `@backend_requirement`, the backend checks still wrap the logged function cleanly.

From a call-site perspective, this is a non-breaking way to surface API-level logs for GEMM operations.
Also applies to: 1573-1682, 1851-2002, 2101-2190, 2193-2356, 2502-2527, 2531-2690, 2694-2825, 2862-2989, 2993-3125
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- LOGGING.md (1 hunks)
- README.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- README.md
🧰 Additional context used
🪛 GitHub Actions: pre-commit
LOGGING.md
[error] 1-1: pre-commit end-of-file-fixer hook failed. The hook modified LOGGING.md (added a newline at EOF).
🪛 LanguageTool
LOGGING.md
[grammar] ~24-~24: Ensure spelling is correct
Context: ...| Disabled (Default) | No logging (zero overhad) | Production | | 1 | Function Name...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (1)
LOGGING.md (1)
5-82: Well-structured documentation with clear guidance.

The logging documentation is clearly organized with practical examples, environment variable reference, and real-world use cases (CUDA graph compatibility, multi-GPU process ID substitution). The Quick Start and logging levels table make it easy for users to get started.
LOGGING.md
Outdated
```
@@ -0,0 +1,83 @@
# FlashInfer Logging
```
do we wanna organize this file in docs/ etc?
cc @yzh119
Or it might be even better to convert this to an .rst so that it appears in the documentation.
@yzh119 do you have suggestions on how I might convert and where I can place the file?
I think placing it under docs/logging.rst and indexing it in docs/index.rst should be great.
> how I might convert
I have tried pandoc before but I believe any existing llm (gemini/claude/gpt) could do a better job.
@yzh119, makes sense. I converted in the latest commit and built the sphinx documentation locally
Here is how the index page looks; see the red marks for where I placed it:
[screenshot of the rendered docs index page]
…ing level above 3 now automatically include cudnn and cublas API logging
Actionable comments posted: 1
♻️ Duplicate comments (3)
benchmarks/bench_logging_overhead.py (1)
30-36: Ensure benchmark and decorator share the same log destination

When `FLASHINFER_APIDEST` is unset, `LOG_DEST` defaults to `/tmp/flashinfer_benchmark_log.txt`, but `flashinfer.api_logging` still defaults its destination to `"stdout"`. In that case the decorator logs to stdout while the benchmark expects a file at `LOG_DEST`, so the "LOG FILE INFO" block is skipped because the file is never created.

You can keep the current default path and guarantee consistency by seeding the env var before importing the decorator:

```diff
-# Get logging level BEFORE importing flashinfer
-LOGGING_LEVEL = int(os.environ.get("FLASHINFER_APILEVEL", "0"))
-LOG_DEST = os.environ.get("FLASHINFER_APIDEST", "/tmp/flashinfer_benchmark_log.txt")
-
-# Import the decorator
-from flashinfer.api_logging import flashinfer_api
+# Get logging level BEFORE importing flashinfer
+LOGGING_LEVEL = int(os.environ.get("FLASHINFER_APILEVEL", "0"))
+LOG_DEST = os.environ.get("FLASHINFER_APIDEST", "/tmp/flashinfer_benchmark_log.txt")
+
+# Ensure the decorator sees the same destination
+os.environ.setdefault("FLASHINFER_APIDEST", LOG_DEST)
+
+# Import the decorator
+from flashinfer.api_logging import flashinfer_api
```

This keeps benchmark reporting aligned with where logs actually go.
Also applies to: 248-251, 293-301
flashinfer/api_logging.py (2)
41-44: Guard `FLASHINFER_APILEVEL` parsing to avoid import-time crashes

`_API_LOG_LEVEL` is computed with a bare `int(os.environ.get("FLASHINFER_APILEVEL", "0"))`. If a user sets `FLASHINFER_APILEVEL` to a non-integer (e.g. `"debug"`), importing `flashinfer.api_logging` will raise `ValueError` and prevent FlashInfer from loading.

For a debugging facility this should degrade gracefully instead of breaking imports. You can harden this with a small try/except:

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILEVEL", "0"))
+_raw_level = os.environ.get("FLASHINFER_APILEVEL", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_level)
+except (TypeError, ValueError):
+    # Fall back to no logging on invalid config
+    _API_LOG_LEVEL = 0
+    # Optional: print a one-shot warning to stderr or log once logger is ready.
```

This keeps the default semantics while making misconfiguration non-fatal.
63-76: Make FileHandler creation resilient to bad `FLASHINFER_APIDEST` paths

`_setup_logger()` creates a `logging.FileHandler` for non-stdout/stderr destinations:

```python
if _API_LOG_DEST == "stdout":
    handler = logging.StreamHandler(sys.stdout)
elif _API_LOG_DEST == "stderr":
    handler = logging.StreamHandler(sys.stderr)
else:
    handler = logging.FileHandler(_API_LOG_DEST, mode="a")
```

If `FLASHINFER_APIDEST` points to an invalid or unwritable path, `logging.FileHandler` will raise `OSError` at import time (since `_setup_logger()` runs at module load), breaking all imports of `flashinfer.api_logging`.

To keep the library usable under bad logging config, wrap handler creation and fall back to stderr:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APIDEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves intended behavior while making logging configuration failure non-fatal.
🧹 Nitpick comments (3)
flashinfer/cudnn/prefill.py (1)
6-6: Logging decoration and API extension look correct; backend arg is undocumented

Applying `@flashinfer_api` to `cudnn_batch_prefill_with_kv_cache` and importing it here is consistent with the rest of the PR and should not change behavior when logging is disabled (`APILEVEL=0`). The new `is_cuda_graph_compatible` and `backend` parameters are threaded correctly into the cudnn-fallback path (`get_cudnn_fmha_gen_module().prefill`) and are no-ops for the main cuDNN-graph path.

One small gap: the docstring documents `is_cuda_graph_compatible` but not `backend`, even though `backend="cubin"` selects the non-graph cudnn-fmha path. Consider adding a short description of valid backend values and their effect for clarity.

Also applies to: 387-411
flashinfer/gemm/gemm_base.py (2)
543-615: tgv_gemm_sm100 implementation is sound; consider tighter validation and doc/arch alignment

The `tgv_gemm_sm100` wrapper correctly:

- Restricts execution to SM100/103 via `_match_sm_version`.
- Enforces `a` to be bf16/fp16 and dtype-matches `b`.
- Uses `AutoTuner` with a dedicated `TGVGemmRunner` that swaps `(A, B)` into the layout expected by the underlying kernel and returns an `(M, N)` output.

Two optional improvements:

- The docstring says "Requires SM100, SM103, or SM110 architecture" but the check only allows `"100"` and `"103"`. Either extend `_match_sm_version` here to include `"110"` (if supported) or update the note to avoid mentioning SM110.
- Add explicit checks on shapes and bias/device for clearer Python-side errors, e.g. `a.ndim == 2`, `b.ndim == 2`, `a.shape[1] == b.shape[0]`, `bias.shape == (b.shape[1],)`, and `bias.device == a.device`.

These would fail fast with actionable messages instead of relying on lower-level kernel assertions.
1558-1665: mm_fp8 wrapper is minimal and correct; behavior constraints are clear

The `mm_fp8` helper is a thin, well-scoped wrapper over `trtllm_low_latency_gemm`:

- It restricts `backend` to `"trtllm_low_latency"` and enforces this up front.
- It only allows `torch.bfloat16` outputs (either via `out_dtype` when allocating, or via `out.dtype` when an output tensor is provided), matching `supported_out_dtypes`.
- It validates the output shape and device when `out` is supplied and ensures consistency with `out_dtype`.

If you plan to support additional backends or output dtypes in the future, the current guards will need relaxing, but as written the function is coherent and safe. You may optionally add light checks on `a`/`b` dtypes and K-dimension compatibility to give earlier, Python-level error messages; otherwise, leaving these to the underlying kernel is acceptable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- LOGGING.md (1 hunks)
- README.md (1 hunks)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- flashinfer/cudnn/decode.py (2 hunks)
- flashinfer/cudnn/prefill.py (2 hunks)
- flashinfer/decode.py (10 hunks)
- flashinfer/fused_moe/core.py (7 hunks)
- flashinfer/gemm/gemm_base.py (12 hunks)
- flashinfer/mla.py (4 hunks)
- flashinfer/prefill.py (11 hunks)
- tests/utils/test_logging.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- LOGGING.md
- flashinfer/mla.py
- flashinfer/fused_moe/core.py
🧰 Additional context used
🧬 Code graph analysis (8)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_api (464-562), decorator (517-558), wrapper (519-556)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
🪛 Ruff (0.14.5)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
106-106: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
144-144: Do not catch blind exception: Exception
(BLE001)
152-152: Do not catch blind exception: Exception
(BLE001)
256-256: Do not catch blind exception: Exception
(BLE001)
320-320: Use explicit conversion flag
Replace with conversion flag
(RUF010)
331-331: Use explicit conversion flag
Replace with conversion flag
(RUF010)
335-335: Use explicit conversion flag
Replace with conversion flag
(RUF010)
336-336: Do not catch blind exception: Exception
(BLE001)
379-379: Consider moving this statement to an else block
(TRY300)
380-380: Do not catch blind exception: Exception
(BLE001)
440-440: Unused function argument: func_name
(ARG001)
464-464: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
529-530: try-except-pass detected, consider logging the exception
(S110)
529-529: Do not catch blind exception: Exception
(BLE001)
542-542: Do not catch blind exception: Exception
(BLE001)
543-543: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
553-553: Do not catch blind exception: Exception
(BLE001)
554-554: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
🔇 Additional comments (18)
flashinfer/cudnn/decode.py (1)
6-7: Decorator application on `cudnn_batch_decode_with_kv_cache` looks good

Importing `flashinfer_api` and decorating only the public entry point keeps the cuDNN decode path consistent with the rest of the API while relying on the zero-overhead behavior at level 0. No signature or control-flow changes here, so this instrumentation is safe.

Also applies to: 256-350
tests/utils/test_logging.py (1)
27-585: Comprehensive, well-isolated test coverage for the logging decorator

The test suite does a good job covering all intended behaviors (levels 0/1/3/5, enums, defaults vs explicit args, kwargs, nested structures, crash safety, and CUDA/CUDA-graph semantics), with per-test env reset and module reload to avoid cross-test contamination. This provides strong confidence in the decorator's behavior.
flashinfer/decode.py (3)
24-24: Single-decode API logging wiring is consistent

Importing `flashinfer_api` and decorating `single_decode_with_kv_cache_with_jit_module` and `single_decode_with_kv_cache` introduces pre/post call logging at the user-visible decode entrypoints without touching their logic or overload contracts. The decorator placement and signatures look correct.

Also applies to: 316-345, 393-584
652-783: BatchDecodeWithPagedKVCacheWrapper instrumentation is well-placed

Decorating `__init__`, `plan`, and `run` on `BatchDecodeWithPagedKVCacheWrapper` cleanly exposes wrapper lifecycle and execution to the logging framework. Because `flashinfer_api` is class-aware, log records will carry `BatchDecodeWithPagedKVCacheWrapper.<method>` names, and aliases like `begin_forward = plan` and `run_return_lse = functools.partialmethod(run, ...)` correctly reuse the already-decorated methods. No behavioral regressions are evident.

Also applies to: 816-1112, 1170-1390
2068-2337: Decode backend entrypoints uniformly covered by API logging

Adding `@flashinfer_api` to `trtllm_batch_decode_with_kv_cache`, `xqa_batch_decode_with_kv_cache`, and the MLA variants (`trtllm_batch_decode_with_kv_cache_mla`, `xqa_batch_decode_with_kv_cache_mla`) gives consistent logging across all decode backends and specializations. The decorators sit directly on the public wrappers that already enforce shape/dtype/device contracts, so logging will see the normalized arguments and final outputs without altering existing backend selection or error handling.

Also applies to: 2343-2482, 2528-2687, 2690-2799
flashinfer/gemm/gemm_base.py (2)
25-25: Importing flashinfer_api here is appropriate

Bringing `flashinfer_api` into `gemm_base` aligns this module with the new API-logging infrastructure and enables decoration of GEMM entrypoints without touching their core logic.
2032-2192: Decorator ordering and logging on FP4/FP8 GEMM paths look correct

For the FP4/FP8 GEMM helpers:

- `mm_fp4` and `bmm_fp8` use `@backend_requirement(...)` outside `@flashinfer_api`. This ordering ensures:
  - `backend_requirement` continues to attach attributes like `mm_fp4.suitable_auto_backends` and `bmm_fp8.suitable_auto_backends` to the exported function objects.
  - The logging wrapper sees the fully normalized call arguments after backend requirements are satisfied, while still logging inputs before the heavy kernels run.
- The various group/DeepGEMM helpers (`gemm_fp8_nt_groupwise`, `gemm_fp8_nt_blockscaled`, `group_gemm_fp8_nt_groupwise`, `group_gemm_mxfp8_mxfp4_nt_groupwise`, `group_deepgemm_fp8_nt_groupwise`, `batch_deepgemm_fp8_nt_groupwise`) are decorated directly, which:
  - Preserves their existing shape/dtype/device validations and backend selection logic.
  - Provides uniform pre/post-call logging across SM100/120/121 CUTLASS and DeepGEMM paths.

Given `_API_LOG_LEVEL == 0` short-circuits the decorator to return the original functions, these changes keep the hot GEMM paths zero-overhead in the default configuration while enabling rich diagnostics when explicitly enabled.

Also applies to: 2289-2379, 2382-2546, 2634-2660, 2826-2957, 2994-3122, 3125-3257
flashinfer/prefill.py (11)
25-25: LGTM! Clean import of the API logging decorator.

The import is correctly placed and necessary for instrumenting the public APIs throughout this file.

877-877: LGTM! Appropriate instrumentation of JIT module entry point.

Decorating this function enables tracking of JIT-based prefill operations, which is valuable for debugging and performance analysis.

962-962: LGTM! Key public API correctly instrumented.

The `single_prefill_with_kv_cache` function is a primary user-facing API, making it an ideal candidate for API logging. The decorator will help users debug issues and understand performance characteristics.

1331-1331: LGTM! Wrapper initialization tracking is valuable.

Instrumenting `__init__` enables tracking of configuration parameters (workspace buffer size, layout, CUDA graph settings, backend choice), which is useful for diagnosing setup issues.

1527-1527: LGTM! Planning phase instrumentation is appropriate.

The `plan` method sets up critical auxiliary data structures. Logging here captures the problem specification (batch size, head dimensions, mask configuration, etc.), which is essential for understanding execution behavior.

1984-1984: LGTM! Execution instrumentation completes the lifecycle coverage.

Decorating `run` captures the actual attention computation with tensors and runtime parameters, providing the final piece for end-to-end debugging of paged KV cache workflows.

2359-2359: LGTM! Consistent instrumentation pattern.

Applying the decorator to `BatchPrefillWithRaggedKVCacheWrapper.__init__` maintains consistency with the paged wrapper and provides parallel observability for ragged KV cache workflows.

2503-2503: LGTM! Planning phase tracking for ragged workflows.

The decorator here enables debugging of ragged KV cache planning, capturing the indptr arrays and problem dimensions that drive kernel selection.

2848-2848: LGTM! Execution tracking completes ragged workflow coverage.

Instrumenting the `run` method provides visibility into ragged attention computation, completing end-to-end observability for this code path.

3205-3205: LGTM! TRT-LLM backend instrumentation is valuable.

Decorating `trtllm_ragged_attention_deepseek` ensures observability for DeepSeek-specific workloads using the TRT-LLM backend, enabling debugging of this specialized code path.

3340-3340: LGTM! Comprehensive backend coverage.

Instrumenting `trtllm_batch_context_with_kv_cache` completes coverage of TRT-LLM backend entry points, ensuring that all backend options benefit from the logging infrastructure.
Actionable comments posted: 0
♻️ Duplicate comments (5)
flashinfer/api_logging.py (4)
358-382: Narrow `_get_default_params` exception handling to the errors `inspect.signature` actually raises

`_get_default_params` currently does:

```python
try:
    sig = inspect.signature(func)
    ...
    return default_params
except Exception:
    # If we can't inspect the signature, return empty dict
    return {}
```

Catching all `Exception` here is broader than needed and can hide unexpected bugs. `inspect.signature` is documented to raise `TypeError` or `ValueError` when it can't produce a signature.

You can make this safer without changing behavior:

```diff
     try:
         sig = inspect.signature(func)
         default_params = {}
@@
         return default_params
-    except Exception:
+    except (TypeError, ValueError):
         # If we can't inspect the signature, return empty dict
         return {}
```

This keeps the "best-effort" semantics while avoiding masking unrelated exceptions.
523-529: Simplify redundant `BatchMLAPagedAttentionWrapper` check in class-name handling

The class-name logic currently special-cases `BatchMLAPagedAttentionWrapper` even though its name already contains `"Wrapper"`:

```python
class_name = args[0].__class__.__name__
if "Wrapper" in class_name or class_name in [
    "BatchMLAPagedAttentionWrapper"
]:
    func_name = f"{class_name}.{func_name}"
```

The explicit `"BatchMLAPagedAttentionWrapper"` entry is redundant. You can simplify to:

```diff
-        class_name = args[0].__class__.__name__
-        if "Wrapper" in class_name or class_name in [
-            "BatchMLAPagedAttentionWrapper"
-        ]:
-            func_name = f"{class_name}.{func_name}"
+        class_name = args[0].__class__.__name__
+        if "Wrapper" in class_name:
+            func_name = f"{class_name}.{func_name}"
```

This keeps behavior identical while making the intent clearer.
41-44: Make `FLASHINFER_LOGLEVEL` parsing robust to invalid values

`_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))` will raise `ValueError` if a user sets `FLASHINFER_LOGLEVEL` to a non-integer (e.g., `"debug"`), preventing `flashinfer.api_logging` (and potentially FlashInfer) from importing.

A small hardening tweak keeps imports safe and defaults to level 0 on bad input:

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
-_API_LOG_DEST = _substitute_process_id(os.environ.get("FLASHINFER_LOGDEST", "stdout"))
+# Read environment variables once at module load time
+_raw_level = os.environ.get("FLASHINFER_LOGLEVEL", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_level)
+except (TypeError, ValueError):
+    _API_LOG_LEVEL = 0
+
+_API_LOG_DEST = _substitute_process_id(os.environ.get("FLASHINFER_LOGDEST", "stdout"))
```

This matches the "debugging-only" nature of the feature while avoiding import-time failures on misconfiguration.
63-76: Prevent invalid `FLASHINFER_LOGDEST` from breaking imports by handling `FileHandler` errors

Because `_setup_logger()` runs at import time, an unwritable or malformed `FLASHINFER_LOGDEST` will cause `logging.FileHandler(_API_LOG_DEST, ...)` to raise `OSError` and break FlashInfer imports. For a debugging facility, that's too fragile.

Wrap handler creation and fall back to stderr on failure:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break imports
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST %r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves behavior when the path is valid but avoids taking down the whole module when it isn't.
benchmarks/bench_logging_overhead.py (1)
30-36: Align `LOG_DEST` with `FLASHINFER_LOGDEST` so the benchmark sees the actual log file

Right now, if `FLASHINFER_LOGDEST` is unset, the decorator defaults to `"stdout"` while the benchmark assumes logs go to `LOG_DEST` (`/tmp/flashinfer_benchmark_log.txt`). In that case no file is created and the "LOG FILE INFO" block is silently skipped.

Set the environment variable before importing the decorator so both use the same destination:

```diff
-# Get logging level BEFORE importing flashinfer
-LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
-LOG_DEST = os.environ.get("FLASHINFER_LOGDEST", "/tmp/flashinfer_benchmark_log.txt")
-
-# Import the decorator
-from flashinfer.api_logging import flashinfer_api
+# Get logging level BEFORE importing flashinfer
+LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
+LOG_DEST = os.environ.get("FLASHINFER_LOGDEST", "/tmp/flashinfer_benchmark_log.txt")
+
+# Ensure the decorator sees the same destination (don't override if user set it)
+os.environ.setdefault("FLASHINFER_LOGDEST", LOG_DEST)
+
+# Import the decorator
+from flashinfer.api_logging import flashinfer_api
```

This keeps the current default path while guaranteeing the decorator writes to the file that `main()` inspects.
🧹 Nitpick comments (5)
benchmarks/bench_logging_overhead.py (2)
32-32: Consider avoiding a hard-coded/tmplog path in shared environmentsUsing a fixed path like
"/tmp/flashinfer_benchmark_log.txt"is fine for local, single-user benchmarking, but in shared or security-sensitive environments it can trip linters (S108) and create race/symlink issues.If you expect this benchmark to be run more broadly, consider:
- deriving the default from
tempfile.gettempdir()and the PID, or- requiring callers to set
FLASHINFER_LOGDESTexplicitly and only using/tmp/...in docs/examples.No change is strictly required if this is dev-only tooling.
Also applies to: 248-251, 293-301
324-333: Catching bareExceptioninmain()is acceptable here but could be narrowedFor a benchmark CLI, catching
KeyboardInterruptexplicitly and then using a bareExceptionto print a traceback is workable. If you want stricter hygiene (and to appease BLE001), consider:
- keeping the explicit
KeyboardInterrupthandler, and- either letting other exceptions propagate (so pytest / callers see failures cleanly) or narrowing the catch to expected runtime errors.
Not critical for a standalone benchmark, so this can be deferred.
tests/utils/test_logging.py (1)
43-81: Test harness and coverage forflashinfer_apilook strong; only minor lint nits remainThe combination of:
- the autouse
setup_and_teardownfixture (resetting env + module cache), and- the
setup_logginghelper (per-test level/dest + reload)gives you clean, isolated tests that accurately reflect how the decorator reads environment variables at import time. The behavioral checks across levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash-safety, tensor metadata, nested structures, and CUDA graph capture all align well with
flashinfer/api_logging.py.The only minor follow-ups you may want, mainly to keep Ruff quiet:
- For tests that intentionally don’t use parameters (e.g.,
modeintest_default_parameters, the many*_valargs intest_different_data_types), consider prefixing them with_or adding# noqa: ARG001on the function line.- For the
RuntimeError("Simulated crash")intest_crash_safety_inputs_logged_before_execution, you can keep it as-is in tests; TRY003 is purely stylistic here.Functionally, this suite looks good to go.
Also applies to: 83-187, 218-285, 346-383
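For the underscore option, the lint-quiet form looks like this (hypothetical test shown; Ruff's ARG rules skip underscore-prefixed parameters by default):

```python
def test_default_parameters(_mode="default"):
    # _mode is intentionally unused; the leading underscore silences ARG001.
    ...
```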
flashinfer/api_logging.py (2)
165-181: Update `_format_value` docstring to reflect actual levels (1, 3, 5)

The `_format_value` docstring says:

```
level : int
    The logging level (1, 2, or 3)
```

But the implementation and public API use levels 1, 3, and 5, with 5 enabling statistics. To avoid confusion for maintainers, consider tightening the wording, e.g.:

```diff
-    level : int
-        The logging level (1, 2, or 3)
+    level : int
+        The API logging level (1, 3, or 5); higher levels include more detail.
```

Purely a documentation fix; runtime behavior is already consistent.
440-461: Either use `func_name` in `_log_function_outputs` or remove it to avoid confusion

`_log_function_outputs` accepts `func_name` but doesn't currently include it in the log:

```python
def _log_function_outputs(func_name: str, result: Any, level: int) -> None:
    ...
    lines.append("Output value:")
    lines.append(_format_value(result, level, indent=1))
```

To make logs easier to scan and satisfy linters about the unused argument, you could:

- include the function name in the header, e.g.:

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    # Log outputs
+    lines.append(f"Output value for {func_name}:")
```

or

- drop `func_name` from the signature and call sites if you prefer the current output format.

Either direction is fine; the goal is just to keep the API surface and log format consistent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- tests/utils/test_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_api (464-562), decorator (517-558), wrapper (519-556)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-562)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
106-106: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
144-144: Do not catch blind exception: Exception
(BLE001)
152-152: Do not catch blind exception: Exception
(BLE001)
256-256: Do not catch blind exception: Exception
(BLE001)
320-320: Use explicit conversion flag
Replace with conversion flag
(RUF010)
331-331: Use explicit conversion flag
Replace with conversion flag
(RUF010)
335-335: Use explicit conversion flag
Replace with conversion flag
(RUF010)
336-336: Do not catch blind exception: Exception
(BLE001)
379-379: Consider moving this statement to an else block
(TRY300)
380-380: Do not catch blind exception: Exception
(BLE001)
440-440: Unused function argument: func_name
(ARG001)
464-464: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
529-530: try-except-pass detected, consider logging the exception
(S110)
529-529: Do not catch blind exception: Exception
(BLE001)
542-542: Do not catch blind exception: Exception
(BLE001)
543-543: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
553-553: Do not catch blind exception: Exception
(BLE001)
554-554: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (2)
benchmarks/bench_logging_overhead.py (2)
152-201: Benchmark timing logic looks solid for CUDA workloads

The warmup and `benchmark_function` implementations correctly:

- run a warmup phase,
- synchronize the CUDA device before and after each timed call,
- and use `time.perf_counter()` with per-iteration timing and summary stats.

This is an appropriate pattern for measuring decorator overhead on GPU kernels; a condensed sketch of the pattern follows.
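A condensed version of that measurement loop, for reference (the function and parameter names are illustrative, not the benchmark's actual API):

```python
import time
import torch

def benchmark(fn, warmup: int = 10, iters: int = 100) -> float:
    """Return mean latency in milliseconds for a CUDA-bound callable."""
    for _ in range(warmup):  # warmup: JIT compilation, allocator, clock ramp-up
        fn()
    times = []
    for _ in range(iters):
        torch.cuda.synchronize()  # drain pending work before starting the clock
        start = time.perf_counter()
        fn()
        torch.cuda.synchronize()  # wait for the kernel before stopping the clock
        times.append(time.perf_counter() - start)
    return sum(times) / len(times) * 1e3
```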
229-232: The review comment is based on incorrect assumptions about the PyTorch API.

The official PyTorch documentation explicitly shows that `torch.cuda.get_device_name()` accepts `torch.device or int or str` as parameters, and you can pass a device string like `"cuda:0"`. The original code at lines 229-232, which passes `device = "cuda:0"` to `torch.cuda.get_device_name(device)`, is valid and follows the documented API.

The claim that this would raise a `TypeError` is incorrect. No changes are needed.

Likely an incorrect or invalid review comment.
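For reference, both documented call forms are accepted:

```python
import torch

if torch.cuda.is_available():
    # torch.cuda.get_device_name takes an int index, a torch.device, or a string.
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_name("cuda:0"))
```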
Hi @xslingcn @Ubospica @YiyanZhai @zanderjiang, it would be useful to hear any suggestions on logging behavior that could benefit flashinfer-bench. I am leaving a logging level for dumping tensors as future work, but please feel free to provide any input.
```python
handler = logging.FileHandler(_API_LOG_DEST, mode="a")
...
# Use a simple formatter (we'll add timestamps manually to key lines)
formatter = logging.Formatter("%(message)s")
```
should we expose the formatting as an env var to allow pid time stamps etc overridable by the user?
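If such a knob were added, it might look like the following sketch; `FLASHINFER_LOGFMT` is a hypothetical variable name, not something this PR defines:

```python
import logging
import os

# Hypothetical: let users override the record format, e.g.
#   FLASHINFER_LOGFMT="%(asctime)s [pid %(process)d] %(message)s"
_fmt = os.environ.get("FLASHINFER_LOGFMT", "%(message)s")
formatter = logging.Formatter(_fmt)
```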
```python
This is useful for multi-process/multi-GPU environments where each process
needs its own log file.
"""
if "%i" in path:
```
i think the standard library has
%(process)d
https://docs.python.org/3/library/logging.html#logrecord-attributes
we don't have to reinvent new things contrary to established standards
oh this is about file name, i was probably mistaken...
Yes, I think `%(process)d` only works in the `logging` module at the moment.
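For context, the file-name substitution under discussion can be as small as this sketch, mirroring the `if "%i" in path:` check quoted above:

```python
import os

def _substitute_process_id(path: str) -> str:
    # Replace a literal "%i" in the log destination with this process's PID so
    # each process in a multi-GPU run writes its own file. %(process)d only
    # applies to log-record formatting, not to file names, hence the custom token.
    if "%i" in path:
        return path.replace("%i", str(os.getpid()))
    return path
```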
Actionable comments posted: 0
♻️ Duplicate comments (4)
flashinfer/api_logging.py (4)
340-382: Narrow exception handling in `_get_default_params` to documented failures

Catching bare `Exception` here hides unexpected bugs; `inspect.signature` is documented to raise `ValueError` or `TypeError` on failure. Restricting the `except` makes this safer while still returning `{}` when signature inspection isn't supported.

```diff
-    try:
-        sig = inspect.signature(func)
-        default_params = {}
+    try:
+        sig = inspect.signature(func)
+        default_params: dict[str, Any] = {}
@@
-        return default_params
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+        return default_params
+    except (TypeError, ValueError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```
525-531: Simplify the class-name condition for wrappers

`"BatchMLAPagedAttentionWrapper"` already contains `"Wrapper"`, so the explicit membership check is redundant. Dropping the special case slightly simplifies the condition without changing behavior.

```diff
-            class_name = args[0].__class__.__name__
-            if "Wrapper" in class_name or class_name in [
-                "BatchMLAPagedAttentionWrapper"
-            ]:
+            class_name = args[0].__class__.__name__
+            if "Wrapper" in class_name:
                 func_name = f"{class_name}.{func_name}"
```
41-44: Handle invalid `FLASHINFER_LOGLEVEL` values to avoid import-time crashes

`int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))` will raise `ValueError` if the env var is a non-integer string, breaking module import for a pure debugging feature. Parse defensively and fall back to level 0 instead.

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))
+_API_LOG_LEVEL_RAW = os.environ.get("FLASHINFER_LOGLEVEL", "0")
+try:
+    _API_LOG_LEVEL = int(_API_LOG_LEVEL_RAW)
+except (TypeError, ValueError):
+    # Fall back to disabled logging on invalid value
+    _API_LOG_LEVEL = 0
+
 _API_LOG_DEST = _substitute_process_id(os.environ.get("FLASHINFER_LOGDEST", "stdout"))
```
63-76: Prevent invalid `FLASHINFER_LOGDEST` from breaking imports

`logging.FileHandler(_API_LOG_DEST, mode="a")` will raise `OSError` on bad/unwritable paths at import time, preventing FlashInfer from loading when logging is enabled. This is too fragile for a debugging-only facility; gracefully degrade to stderr instead.

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```
🧹 Nitpick comments (3)
flashinfer/api_logging.py (3)
280-323: Consider bounding list/tuple/dict logging to avoid pathological log volume

For large or deeply nested containers, recursively logging every element can explode log size and latency at higher levels (3/5). Consider truncating after some N elements with a summary to keep logs usable while preserving signal.

```diff
     # Handle lists
     if isinstance(value, list):
         if len(value) == 0:
             return f"{indent_str}[]"
         if level == 1:
             return f"{indent_str}[list with {len(value)} items]"
-        lines = [f"{indent_str}["]
-        for i, item in enumerate(value):
-            lines.append(
-                f"{indent_str}  [{i}]: {_format_value(item, level, indent + 1)}"
-            )
+        max_items = 16
+        lines = [f"{indent_str}["]
+        for i, item in enumerate(value[:max_items]):
+            lines.append(
+                f"{indent_str}  [{i}]: {_format_value(item, level, indent + 1)}"
+            )
+        if len(value) > max_items:
+            lines.append(
+                f"{indent_str}  ... ({len(value) - max_items} more items truncated)"
+            )
@@
     # Handle tuples
     if isinstance(value, tuple):
@@
-        lines = [f"{indent_str}("]
-        for i, item in enumerate(value):
+        max_items = 16
+        lines = [f"{indent_str}("]
+        for i, item in enumerate(value[:max_items]):
@@
-        lines.append(f"{indent_str})")
+        if len(value) > max_items:
+            lines.append(
+                f"{indent_str}  ... ({len(value) - max_items} more items truncated)"
+            )
+        lines.append(f"{indent_str})")
@@
     # Handle dictionaries
     if isinstance(value, dict):
@@
-        lines = [f"{indent_str}{{"]
-        for key, val in value.items():
+        max_items = 16
+        lines = [f"{indent_str}{{"]
+        for i, (key, val) in enumerate(list(value.items())[:max_items]):
@@
-        lines.append(f"{indent_str}}}")
+        if len(value) > max_items:
+            lines.append(
+                f"{indent_str}  ... ({len(value) - max_items} more keys truncated)"
+            )
+        lines.append(f"{indent_str}}}")
```
440-462: Include function name in output logs and use `func_name`

`func_name` is currently unused, and the output block doesn't identify which API produced the value, which can be ambiguous with interleaved logs. Reusing the timestamped header format here improves attribution and removes the unused-arg warning.

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
-    lines.append(_format_value(result, level, indent=1))
-
-    lines.append("=" * 80)
-    lines.append("")  # Empty line for readability
+    lines = []
+    lines.append(f"{_get_timestamp()} FlashInfer API Output: {func_name}")
+    lines.append("-" * 80)
+    lines.append("Output value:")
+    lines.append(_format_value(result, level, indent=1))
+    lines.append("=" * 80)
+    lines.append("")  # Empty line for readability
```
520-557: Tighten decorator exception paths and avoid silent `except ...: pass`

Two small robustness/observability nits:

- The `try/except Exception: pass` around `class_name` silently hides unexpected errors.
- Pre/post logging paths use `logging.error` instead of `logging.exception`, losing the stack trace when logging itself fails.

Switching to `logging.exception` and logging a debug on class-name failures improves diagnosability without impacting crash-safety.

```diff
-        if args and hasattr(args[0], "__class__"):
-            try:
-                class_name = args[0].__class__.__name__
-                if "Wrapper" in class_name or class_name in [
-                    "BatchMLAPagedAttentionWrapper"
-                ]:
-                    func_name = f"{class_name}.{func_name}"
-            except Exception:
-                pass
+        if args and hasattr(args[0], "__class__"):
+            try:
+                class_name = args[0].__class__.__name__
+                if "Wrapper" in class_name or class_name in [
+                    "BatchMLAPagedAttentionWrapper"
+                ]:
+                    func_name = f"{class_name}.{func_name}"
+            except Exception:
+                _logger.debug(
+                    "Failed to derive class name for %r in flashinfer_api", f
+                )
@@
-            try:
-                if _API_LOG_LEVEL == 1:
@@
-                _log_function_inputs(f, func_name, args, kwargs, _API_LOG_LEVEL)
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            try:
+                if _API_LOG_LEVEL == 1:
@@
+                _log_function_inputs(f, func_name, args, kwargs, _API_LOG_LEVEL)
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            try:
-                if _API_LOG_LEVEL >= 3:
@@
-                _log_function_outputs(func_name, result, _API_LOG_LEVEL)
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            try:
+                if _API_LOG_LEVEL >= 3:
@@
+                _log_function_outputs(func_name, result, _API_LOG_LEVEL)
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- flashinfer/api_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/api_logging.py (2)
include/flashinfer/logging.h (1)
logging (31-41); csrc/nv_internal/include/tensorrt_llm/common/logger.h (1): setLevel (75-78)
setLevel(75-78)
🪛 Ruff (0.14.5)
flashinfer/api_logging.py
106-106: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
144-144: Do not catch blind exception: Exception
(BLE001)
152-152: Do not catch blind exception: Exception
(BLE001)
256-256: Do not catch blind exception: Exception
(BLE001)
320-320: Use explicit conversion flag
Replace with conversion flag
(RUF010)
331-331: Use explicit conversion flag
Replace with conversion flag
(RUF010)
335-335: Use explicit conversion flag
Replace with conversion flag
(RUF010)
336-336: Do not catch blind exception: Exception
(BLE001)
379-379: Consider moving this statement to an else block
(TRY300)
380-380: Do not catch blind exception: Exception
(BLE001)
440-440: Unused function argument: func_name
(ARG001)
464-464: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
532-533: try-except-pass detected, consider logging the exception
(S110)
532-532: Do not catch blind exception: Exception
(BLE001)
545-545: Do not catch blind exception: Exception
(BLE001)
546-546: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
556-556: Do not catch blind exception: Exception
(BLE001)
557-557: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
(non-blocking) we can keep
aleozlx
left a comment
giving my approval as a vote up
yzh119
left a comment
LGTM overall, thanks @bkryu for the great work!
/bot run

[FAILED] Pipeline #38971561: 15/18 passed
📌 Description
tl;dr: The current PR adds a logging system for input/output tracking to aid debugging FlashInfer APIs via a `@flashinfer_api` decorator.

This PR does not apply `@flashinfer_api` to every FlashInfer API -- many operations are missing labels. Further labeling is left for subsequent work.

This PR introduces a production-ready API logging infrastructure that tracks function calls, arguments, and return values via a simple one-line decorator. Any function can be decorated to track its input/output values in the API logger.
Key Features:
- Logging level configurable via `FLASHINFER_LOGLEVEL`
- Log destination configurable via `FLASHINFER_LOGDEST`; defaults to `stdout`
- Overhead benchmark in `benchmarks/bench_logging_overhead.py`

Example usage produces a log (the captured outputs are elided in this description; a sketch follows).
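A minimal sketch of the decorator in use; the wrapped function is illustrative, and the env vars must be set before `flashinfer` is imported:

```python
import os

os.environ["FLASHINFER_LOGLEVEL"] = "3"      # 0 (default) disables logging entirely
os.environ["FLASHINFER_LOGDEST"] = "stdout"  # or a file path such as "/tmp/fi_%i.log"

from flashinfer.api_logging import flashinfer_api

@flashinfer_api
def my_op(x, scale=1.0):
    return x * scale

my_op(2.0)  # inputs are logged before execution; outputs after the call returns
```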
Running with `export FLASHINFER_LOGLEVEL=3` and `export FLASHINFER_LOGLEVEL=5` produces progressively more detailed logs (example outputs elided).

🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- Tests have been added or updated as needed. All tests are passing (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit
Release Notes
New Features
Tests
Documentation