83 changes: 83 additions & 0 deletions LOGGING.md
@@ -0,0 +1,83 @@
# FlashInfer Logging
**Collaborator:**
do we wanna organize this file in docs/ etc?

cc @yzh119

**Collaborator Author:**

Or it might be even better to convert this to an .rst so that it appears in the documentation.

@yzh119 do you have suggestions on how I might convert and where I can place the file?

**Collaborator:**

I think placing it under docs/logging.rst and indexing it in docs/index.rst should be great.

> how I might convert

I have tried pandoc before, but I believe any existing LLM (Gemini/Claude/GPT) could do a better job.

**Collaborator Author:**

@yzh119, makes sense. I converted it in the latest commit and built the Sphinx documentation locally.

Here is how the index page looks (see red marks for where I placed it):

[Screenshot: docs index page with the logging entry highlighted]

and here is how the actual logging page looks:

[Screenshot: rendered logging documentation page]


FlashInfer provides a logging feature to help debug issues and reproduce crashes. This document describes all available logging levels and their features.

## Quick Start

Enable logging using two environment variables:

```bash
# Set logging level (0, 1, 3, or 5)
export FLASHINFER_LOGLEVEL=3

# Set log destination (default is stdout)
export FLASHINFER_LOGDEST=stdout # or stderr, or a file path like "flashinfer.log"

# Run your code
python train.py
```
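
If it is more convenient to configure logging from Python (e.g., in a launcher script), the same variables can be set in-process. A minimal sketch, assuming FlashInfer reads the variables no earlier than import time:

```python
import os

# Configure logging before importing flashinfer, since the library
# (not this script) reads these variables.
os.environ["FLASHINFER_LOGLEVEL"] = "3"
os.environ["FLASHINFER_LOGDEST"] = "flashinfer.log"

import flashinfer  # imported after configuration on purpose
```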

## Logging Levels

| Level | Name | Features | Use Case |
|-------|------|----------|----------|
| **0** | Disabled (Default) | No logging (zero overhead) | Production |
| **1** | Function Names | Function names only | Basic tracing |
| **3** | Inputs/Outputs | Function names + arguments + outputs with metadata | Standard debugging |
| **5** | Statistics | Level 3 + tensor statistics (min, max, mean, NaN/Inf counts) | Numerical analysis |


## Environment Variables

### Main Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `FLASHINFER_LOGLEVEL` | int | 0 | Logging level (0, 1, 3, 5) |
| `FLASHINFER_LOGDEST` | str | `stdout` | Log destination: `stdout`, `stderr`, or file path |

### Process ID Substitution

Use `%i` in file paths for automatic process ID substitution (useful for multi-GPU training):

```bash
export FLASHINFER_LOGDEST="flashinfer_log_%i.txt"  # -> flashinfer_log_12345.txt
```

Currently, only `FLASHINFER_LOGDEST` supports this substitution.
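
Internally, the substitution can be as simple as a string replacement on the destination path; the sketch below is illustrative only (`resolve_log_destination` is a hypothetical name, not FlashInfer's actual code):

```python
import os

def resolve_log_destination(dest: str) -> str:
    # Replace every "%i" with the current process ID, e.g.
    # "flashinfer_log_%i.txt" -> "flashinfer_log_12345.txt".
    return dest.replace("%i", str(os.getpid()))
```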

## Miscellaneous Notes and Examples

### CUDA Graph Compatibility

Level 5 statistics are **automatically skipped during CUDA graph capture** to avoid synchronization issues.

```python
# This works correctly - no synchronization errors.
# Assumes cuda_graph = torch.cuda.CUDAGraph() and pre-allocated
# inputs a, b, scales, per the usual graph-capture workflow.
with torch.cuda.graph(cuda_graph):
    result = mm_fp4(a, b, scales)  # Level 5 logging active
    # Statistics automatically skipped during capture
```

Output shows: `[statistics skipped: CUDA graph capture in progress]`
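
A minimal sketch of how such a guard can be written in PyTorch; `log_tensor_stats` is a hypothetical helper name, not FlashInfer's API:

```python
import torch

def log_tensor_stats(t: torch.Tensor) -> str:
    # Computing min/max/mean forces a host-device synchronization,
    # which is illegal while a CUDA graph is being captured.
    if torch.cuda.is_current_stream_capturing():
        return "[statistics skipped: CUDA graph capture in progress]"
    return (
        f"min={t.min().item():.4g} max={t.max().item():.4g} "
        f"mean={t.float().mean().item():.4g} "
        f"nan={torch.isnan(t).sum().item()} inf={torch.isinf(t).sum().item()}"
    )
```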

### Process IDs for Multi-GPU Environments

```bash
# Use %i for process ID substitution
export FLASHINFER_LOGLEVEL=3
export FLASHINFER_LOGDEST="logs/flashinfer_api_%i.log"

torchrun --nproc_per_node=8 awesome_script_that_uses_FlashInfer.py

# Creates separate logs:
# logs/flashinfer_api_12345.log (rank 0)
# logs/flashinfer_api_12346.log (rank 1)
# ...
```

## Frequently Asked Questions

### Q: Does Level 0 really have zero overhead?

**A: Yes.** At Level 0, the decorator returns the original function unchanged. No wrapper, no checks, no overhead.
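
A minimal sketch of the pattern described above; `log_api` and the details are illustrative, not FlashInfer's actual implementation:

```python
import functools
import os

_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL", "0"))

def log_api(fn):
    # Level 0: hand back the original function untouched -- no wrapper
    # frame, no per-call branching, hence zero overhead.
    if _LOG_LEVEL == 0:
        return fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(fn.__qualname__)  # level >= 1: log the function name
        return fn(*args, **kwargs)

    return wrapper
```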
14 changes: 14 additions & 0 deletions README.md
@@ -169,6 +169,20 @@
o = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=False) # prefill att

Check out [documentation](https://docs.flashinfer.ai/) for usage of batch decode/append/prefill kernels and shared-prefix cascading kernels.

## API Logging

FlashInfer provides comprehensive API logging for debugging. Enable it using environment variables:

```bash
# Enable logging (levels: 0=off (default), 1=basic, 3=detailed, 5=statistics)
export FLASHINFER_LOGLEVEL=3

# Set log destination (stdout (default), stderr, or file path)
export FLASHINFER_LOGDEST=stdout
```
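
For a one-off run, the variables can also be passed inline without exporting them (`your_script.py` is a placeholder):

```bash
FLASHINFER_LOGLEVEL=3 FLASHINFER_LOGDEST=flashinfer.log python your_script.py
```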

For detailed information about logging levels, configuration, and advanced features, see [LOGGING.md](LOGGING.md).

## Custom Attention Variants

Starting from FlashInfer v0.2, users can customize their own attention variants with additional parameters. For more details, refer to our [JIT examples](https://github.com/flashinfer-ai/flashinfer/blob/main/tests/utils/test_jit_example.py).