Commit 0ef5557

Add QOL feature for changing the custom nodes folder location through cli args.

This commit also includes the following changes:

- bugfix: fix typo in apply_directory for custom_nodes_directory
- allow for PATH style ';' delimited custom_node directories; change delimiter type for separate folders per platform
- feat(API-nodes): move Rodin3D nodes to new client; removed old api client.py (comfyanonymous#10645)
- Fix qwen controlnet regression. (comfyanonymous#10657)
- Enable pinned memory by default on Nvidia. (comfyanonymous#10656)
- Removed the --fast pinned_memory flag. You can use --disable-pinned-memory to disable it. Please report if it causes any issues. Pinned mem also seems to work on AMD. (comfyanonymous#10658)
- Remove environment variable: removed environment variable fallback for custom nodes directory.
- Update documentation for custom nodes directory: clarified documentation on the custom nodes directory argument, removed documentation on the environment variable.
- Clarify release cycle. (comfyanonymous#10667)
- Tell users they need to upload their logs in bug reports. (comfyanonymous#10671)
- mm: guard against double pin and unpin explicitly (comfyanonymous#10672). As commented, if you let cuda be the one to detect double pin/unpinning it actually creates an async GPU error.
- Only unpin tensor if it was pinned by ComfyUI (comfyanonymous#10677)
- Make ScaleROPE node work on Flux. (comfyanonymous#10686)
- Add logging for model unloading. (comfyanonymous#10692)
- Unload weights if vram usage goes up between runs. (comfyanonymous#10690)
- ops: Put weight cast on the offload stream (comfyanonymous#10697). This needs to be on the offload stream; it reproduced a black screen with low resolution images on a slow bus when using FP8.
- Update CI workflow to remove dead macOS runner. (comfyanonymous#10704)
- Don't pin tensor if not a torch.nn.parameter.Parameter (comfyanonymous#10718)
- Update README.md for Intel Arc GPU installation, remove IPEX (comfyanonymous#10729). IPEX is no longer needed for Intel Arc GPUs; removing the instructions to set up IPEX.
- mm/mp: always unload re-used but modified models (comfyanonymous#10724). The partial unloader path in the model re-use flow skips straight to the actual unload without any check of the patching UUID. This means that an upscale flow with a model patch on an existing model would not apply your patches. Fix by delaying the partial_unload until after the UUID checks, by making partial_unload a mode of partial_load where extra_mem is negative.
- qwen: reduce VRAM usage (comfyanonymous#10725). Clean up a bunch of stacked and no-longer-needed tensors at the QWEN VRAM peak (currently the FFN). With this I go from OOMing at B=37x1328x1328 to being able to successfully run B=47 (RTX 5090).
- Update Python 3.14 compatibility notes in README (comfyanonymous#10730)
- Quantized Ops fixes (comfyanonymous#10715): offload support, bug fixes, remove mixins; add readme
- add PR template for API-Nodes (comfyanonymous#10736)
- feat: add create_time dict to prompt field in /history and /queue (comfyanonymous#10741)
- flux: reduce VRAM usage (comfyanonymous#10737). Clean up a bunch of stacked tensors on Flux; this takes me from B=19 to B=22 for 1600x1600 on an RTX 5090.
- Better instructions for the portable. (comfyanonymous#10743)
- Use same code for chroma and flux blocks so that optimizations are shared. (comfyanonymous#10746)
- Fix custom nodes import error. (comfyanonymous#10747). This should fix the import errors but will break if the custom nodes actually try to use the class.
- revert import reordering; revert imports pt 2
- Add left padding support to tokenizers. (comfyanonymous#10753)
- chore(api-nodes): mark OpenAIDalle2 and OpenAIDalle3 nodes as deprecated (comfyanonymous#10757)
- Revert "chore(api-nodes): mark OpenAIDalle2 and OpenAIDalle3 nodes as deprecated (comfyanonymous#10757)" (comfyanonymous#10759). This reverts commit 9a02382.
- Change ROCm nightly install command to 7.1 (comfyanonymous#10764)

1 parent c4a6b38 commit 0ef5557

30 files changed: +572 -1417 lines

.github/ISSUE_TEMPLATE/bug-report.yml

Lines changed: 5 additions & 3 deletions
```diff
@@ -8,13 +8,15 @@ body:
       Before submitting a **Bug Report**, please ensure the following:

       - **1:** You are running the latest version of ComfyUI.
-      - **2:** You have looked at the existing bug reports and made sure this isn't already reported.
+      - **2:** You have your ComfyUI logs and relevant workflow on hand and will post them in this bug report.
       - **3:** You confirmed that the bug is not caused by a custom node. You can disable all custom nodes by passing
-      `--disable-all-custom-nodes` command line argument.
+      `--disable-all-custom-nodes` command line argument. If you have custom nodes, try updating them to the latest version.
       - **4:** This is an actual bug in ComfyUI, not just a support question. A bug is when you can specify exact
       steps to replicate what went wrong and others will be able to repeat your steps and see the same issue happen.

-      If unsure, ask on the [ComfyUI Matrix Space](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) or the [Comfy Org Discord](https://discord.gg/comfyorg) first.
+      ## Very Important
+
+      Please make sure that you post ALL your ComfyUI logs in the bug report. A bug report without logs will likely be ignored.
   - type: checkboxes
     id: custom-nodes-test
     attributes:
```

.github/PULL_REQUEST_TEMPLATE/api-node.md

Lines changed: 21 additions & 0 deletions

```markdown
<!-- API_NODE_PR_CHECKLIST: do not remove -->

## API Node PR Checklist

### Scope
- [ ] **Is API Node Change**

### Pricing & Billing
- [ ] **Need pricing update**
- [ ] **No pricing update**

If **Need pricing update**:
- [ ] Metronome rate cards updated
- [ ] Auto‑billing tests updated and passing

### QA
- [ ] **QA done**
- [ ] **QA not required**

### Comms
- [ ] Informed **@Kosinkadink**
```

Lines changed: 58 additions & 0 deletions
```yaml
name: Append API Node PR template

on:
  pull_request_target:
    types: [opened, reopened, synchronize, edited, ready_for_review]
    paths:
      - 'comfy_api_nodes/**'  # only run if these files changed

permissions:
  contents: read
  pull-requests: write

jobs:
  inject:
    runs-on: ubuntu-latest
    steps:
      - name: Ensure template exists and append to PR body
        uses: actions/github-script@v7
        with:
          script: |
            const { owner, repo } = context.repo;
            const number = context.payload.pull_request.number;
            const templatePath = '.github/PULL_REQUEST_TEMPLATE/api-node.md';
            const marker = '<!-- API_NODE_PR_CHECKLIST: do not remove -->';

            const { data: pr } = await github.rest.pulls.get({ owner, repo, pull_number: number });

            let templateText;
            try {
              const res = await github.rest.repos.getContent({
                owner,
                repo,
                path: templatePath,
                ref: pr.base.ref
              });
              const buf = Buffer.from(res.data.content, res.data.encoding || 'base64');
              templateText = buf.toString('utf8');
            } catch (e) {
              core.setFailed(`Required PR template not found at "${templatePath}" on ${pr.base.ref}. Please add it to the repo.`);
              return;
            }

            // Enforce the presence of the marker inside the template (for idempotence)
            if (!templateText.includes(marker)) {
              core.setFailed(`Template at "${templatePath}" does not contain the required marker:\n${marker}\nAdd it so we can detect duplicates safely.`);
              return;
            }

            // If the PR already contains the marker, do not append again.
            const body = pr.body || '';
            if (body.includes(marker)) {
              core.info('Template already present in PR body; nothing to inject.');
              return;
            }

            const newBody = (body ? body + '\n\n' : '') + templateText + '\n';
            await github.rest.pulls.update({ owner, repo, pull_number: number, body: newBody });
            core.notice('API Node template appended to PR description.');
```

.github/workflows/test-ci.yml

Lines changed: 11 additions & 9 deletions
```diff
@@ -21,14 +21,15 @@ jobs:
       fail-fast: false
       matrix:
         # os: [macos, linux, windows]
-        os: [macos, linux]
-        python_version: ["3.9", "3.10", "3.11", "3.12"]
+        # os: [macos, linux]
+        os: [linux]
+        python_version: ["3.10", "3.11", "3.12"]
         cuda_version: ["12.1"]
         torch_version: ["stable"]
         include:
-          - os: macos
-            runner_label: [self-hosted, macOS]
-            flags: "--use-pytorch-cross-attention"
+          # - os: macos
+          #   runner_label: [self-hosted, macOS]
+          #   flags: "--use-pytorch-cross-attention"
           - os: linux
             runner_label: [self-hosted, Linux]
             flags: ""
@@ -73,14 +74,15 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [macos, linux]
+        # os: [macos, linux]
+        os: [linux]
         python_version: ["3.11"]
         cuda_version: ["12.1"]
         torch_version: ["nightly"]
         include:
-          - os: macos
-            runner_label: [self-hosted, macOS]
-            flags: "--use-pytorch-cross-attention"
+          # - os: macos
+          #   runner_label: [self-hosted, macOS]
+          #   flags: "--use-pytorch-cross-attention"
           - os: linux
             runner_label: [self-hosted, Linux]
             flags: ""
```

QUANTIZATION.md

Lines changed: 168 additions & 0 deletions
# The Comfy guide to Quantization


## How does quantization work?

Quantization aims to map a high-precision value x_f to a lower-precision format with minimal loss in accuracy. These smaller formats reduce the model's memory footprint and increase throughput by using specialized hardware.

When simply converting a value from FP16 to FP8 with round-to-nearest, we can hit two issues:
- The dynamic range of FP16 (-65,504, 65,504) far exceeds that of FP8 formats like E4M3 (-448, 448) or E5M2 (-57,344, 57,344), potentially resulting in clipped values.
- The original values may be concentrated in a small range (e.g. -1, 1), leaving many FP8 bits "unused".

By using a scaling factor, we aim to map these values into the quantized dtype's range, making use of the full spectrum. One of the easiest and most common approaches is per-tensor absolute-maximum (absmax) scaling.

```
absmax = max(abs(tensor))
scale = absmax / max_dynamic_range_low_precision

# Quantization
tensor_q = (tensor / scale).to(low_precision_dtype)

# De-Quantization
tensor_dq = tensor_q.to(fp16) * scale

tensor_dq ~ tensor
```
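
For concreteness, the same recipe can be written as a small, self-contained PyTorch sketch (illustrative only: it assumes per-tensor absmax scaling to `torch.float8_e4m3fn`; ComfyUI's actual implementation lives in `comfy/quant_ops.py`):

```python
import torch

def absmax_quantize_fp8(tensor: torch.Tensor, qdtype=torch.float8_e4m3fn):
    max_range = torch.finfo(qdtype).max                  # 448 for e4m3fn
    scale = tensor.abs().max() / max_range               # per-tensor scaling factor
    qdata = (tensor / scale).clamp(-max_range, max_range).to(qdtype)
    return qdata, scale

def absmax_dequantize(qdata: torch.Tensor, scale: torch.Tensor, orig_dtype=torch.float16):
    return qdata.to(orig_dtype) * scale

w = torch.randn(256, 256, dtype=torch.float16)
w_q, w_scale = absmax_quantize_fp8(w)
w_dq = absmax_dequantize(w_q, w_scale)
print((w - w_dq).abs().max())  # small reconstruction error
```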

Given that additional information (the scaling factor) is needed to "interpret" the quantized values, we describe those as derived datatypes.


## Quantization in Comfy

```
QuantizedTensor (torch.Tensor subclass)
        ↓ __torch_dispatch__
Two-Level Registry (generic + layout handlers)
        ↓
MixedPrecisionOps + Metadata Detection
```

### Representation

To represent these derived datatypes, ComfyUI uses a `torch.Tensor` subclass: the `QuantizedTensor` class found in `comfy/quant_ops.py`.

A `Layout` class defines how a specific quantization format behaves:
- Required parameters
- Quantize method
- De-Quantize method

```python
from comfy.quant_ops import QuantizedLayout

class MyLayout(QuantizedLayout):
    @classmethod
    def quantize(cls, tensor, **kwargs):
        # Convert to quantized format
        qdata = ...
        params = {'scale': ..., 'orig_dtype': tensor.dtype}
        return qdata, params

    @staticmethod
    def dequantize(qdata, scale, orig_dtype, **kwargs):
        return qdata.to(orig_dtype) * scale
```

To run operations on these QuantizedTensors, we use two registry systems to define supported operations.
The first is a **generic registry** that handles operations common to all quantized formats (e.g., `.to()`, `.clone()`, `.reshape()`).

The second registry is layout-specific and allows layouts to implement fast paths such as `nn.Linear`.
```python
import torch
from comfy.quant_ops import register_layout_op

@register_layout_op(torch.ops.aten.linear.default, MyLayout)
def my_linear(func, args, kwargs):
    # Extract tensors, call optimized kernel
    ...
```
When `torch.nn.functional.linear()` is called with QuantizedTensor arguments, `__torch_dispatch__` automatically routes to the registered implementation.
For any unsupported operation, QuantizedTensor falls back to calling `dequantize` and dispatching to the high-precision implementation.
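
The fallback path can be illustrated with a toy wrapper subclass (a minimal sketch of the generic `__torch_dispatch__` + dequantize-fallback pattern, not ComfyUI's actual `QuantizedTensor`):

```python
import torch

class ToyQuantTensor(torch.Tensor):
    """Toy wrapper subclass: stores quantized data + scale, dequantizes on demand."""

    @staticmethod
    def __new__(cls, qdata, scale, orig_dtype):
        # The "outer" tensor advertises the original dtype and shape.
        return torch.Tensor._make_wrapper_subclass(cls, qdata.shape, dtype=orig_dtype, device=qdata.device)

    def __init__(self, qdata, scale, orig_dtype):
        self.qdata, self.scale, self.orig_dtype = qdata, scale, orig_dtype

    def dequantize(self):
        return self.qdata.to(self.orig_dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Fallback path: no registered fast path here, so dequantize every
        # quantized argument and run the op in high precision.
        def unwrap(x):
            return x.dequantize() if isinstance(x, ToyQuantTensor) else x
        args = [unwrap(a) for a in args]
        kwargs = {k: unwrap(v) for k, v in (kwargs or {}).items()}
        return func(*args, **kwargs)

# Any op without a fast path transparently runs on the dequantized values.
w = ToyQuantTensor(torch.randint(-127, 127, (16, 16), dtype=torch.int8), torch.tensor(0.01), torch.float16)
x = torch.randn(4, 16, dtype=torch.float16)
y = torch.nn.functional.linear(x, w)  # dispatched, falls back to dequantize + fp16 linear
```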


### Mixed Precision

The `MixedPrecisionOps` class (lines 542-648 in `comfy/ops.py`) enables per-layer quantization decisions, allowing different layers in a model to use different precisions. It is activated when a model config contains a `layer_quant_config` dictionary that specifies which layers should be quantized and how.

**Architecture:**

```python
class MixedPrecisionOps(disable_weight_init):
    _layer_quant_config = {}            # Maps layer names to quantization configs
    _compute_dtype = torch.bfloat16     # Default compute / dequantize precision
```

**Key mechanism:**

The custom `Linear._load_from_state_dict()` method inspects each layer during model loading:
- If the layer name is **not** in `_layer_quant_config`: load the weight as a regular tensor in `_compute_dtype`.
- If the layer name **is** in `_layer_quant_config`:
  - load the weight as a `QuantizedTensor` with the specified layout (e.g., `TensorCoreFP8Layout`),
  - load the associated quantization parameters (scales, block_size, etc.).

**Why it's needed:**

Not all layers tolerate quantization equally. Sensitive operations like final projections can be kept in higher precision, while compute-heavy matmuls are quantized. This provides most of the performance benefits while maintaining quality.

The system is selected in `pick_operations()` when `model_config.layer_quant_config` is present, making it the highest-priority operation mode.
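
As a rough illustration, such a config could map layer names to their quantization format along these lines (hypothetical structure and values; the real schema is whatever the model config and `QUANT_ALGOS` define):

```python
# Hypothetical sketch of a per-layer quantization config; the actual structure
# is defined by the model config that ComfyUI loads, not by this example.
layer_quant_config = {
    "model.layers.0.mlp.up_proj":   {"format": "float8_e4m3fn"},
    "model.layers.0.mlp.down_proj": {"format": "float8_e4m3fn"},
    # Sensitive layers (e.g. the final projection) are simply omitted,
    # so they are loaded as regular tensors in _compute_dtype.
}
```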


## Checkpoint Format

Quantized checkpoints are stored as standard safetensors files with quantized weight tensors and associated scaling parameters, plus a `_quantization_metadata` JSON entry describing the quantization scheme.

The quantized checkpoint contains the same layers as the original checkpoint, but:
- The weights are stored as quantized values, sometimes using a different storage datatype (e.g. a uint8 container for FP8).
- For each quantized weight, a number of additional scaling parameters are stored alongside it, depending on the recipe.
- The `_quantization_metadata`, describing which layers are quantized and which layout has been used, is stored in the metadata of the final safetensors file.

### Scaling Parameter Details

We define 4 possible scaling parameters that should cover most recipes in the near future:
- **weight_scale**: quantization scalers for the weights
- **weight_scale_2**: global scalers in the context of double scaling
- **pre_quant_scale**: scalers used for smoothing salient weights
- **input_scale**: quantization scalers for the activations

| Format | Storage dtype | weight_scale | weight_scale_2 | pre_quant_scale | input_scale |
|--------|---------------|--------------|----------------|-----------------|-------------|
| float8_e4m3fn | float32 | float32 (scalar) | - | - | float32 (scalar) |

You can find the defined formats in `comfy/quant_ops.py` (`QUANT_ALGOS`).

### Quantization Metadata

The metadata stored alongside the checkpoint contains:
- **format_version**: a string identifying the version of this standard
- **layers**: a dictionary mapping layer names to their quantization format. The format string maps to the definitions found in `QUANT_ALGOS`.

Example:
```json
{
  "_quantization_metadata": {
    "format_version": "1.0",
    "layers": {
      "model.layers.0.mlp.up_proj": "float8_e4m3fn",
      "model.layers.0.mlp.down_proj": "float8_e4m3fn",
      "model.layers.1.mlp.up_proj": "float8_e4m3fn"
    }
  }
}
```
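
A compatible reader can recover this entry straight from the safetensors header. A minimal sketch, assuming the JSON is stored under the `_quantization_metadata` metadata key as described above (the file name is a placeholder):

```python
import json
from safetensors import safe_open

with safe_open("model_fp8.safetensors", framework="pt") as f:
    header_metadata = f.metadata() or {}           # dict of str -> str
    quant_meta = json.loads(header_metadata["_quantization_metadata"])

print(quant_meta["format_version"])
for layer_name, fmt in quant_meta["layers"].items():
    print(f"{layer_name}: {fmt}")                  # e.g. ...up_proj: float8_e4m3fn
```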


## Creating Quantized Checkpoints

To create compatible checkpoints, you can use any quantization tool, provided the output follows the checkpoint format described above and uses a layout defined in `QUANT_ALGOS`.

### Weight Quantization

Weight quantization is straightforward: compute the scaling factor directly from the weight tensor using the absolute-maximum method described earlier. Each layer's weights are quantized independently and stored with their corresponding `weight_scale` parameter.
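
A minimal end-to-end sketch of that process (not an official tool; it assumes the simple per-tensor `float8_e4m3fn` recipe from the table above and a `<layer>.weight_scale` key naming convention):

```python
import json
import torch
from safetensors.torch import load_file, save_file

def quantize_checkpoint(src_path, dst_path, layers_to_quantize):
    state_dict = load_file(src_path)
    out, layers_meta = {}, {}
    max_range = torch.finfo(torch.float8_e4m3fn).max
    for name, tensor in state_dict.items():
        layer = name.rsplit(".weight", 1)[0]
        if name.endswith(".weight") and layer in layers_to_quantize:
            scale = tensor.abs().max().float() / max_range       # per-tensor absmax scale
            out[name] = (tensor / scale).clamp(-max_range, max_range).to(torch.float8_e4m3fn)
            out[layer + ".weight_scale"] = scale                 # stored alongside the weight
            layers_meta[layer] = "float8_e4m3fn"
        else:
            out[name] = tensor
    metadata = {"_quantization_metadata": json.dumps({"format_version": "1.0", "layers": layers_meta})}
    save_file(out, dst_path, metadata=metadata)
```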

### Calibration (for Activation Quantization)

Activation quantization (e.g., for FP8 Tensor Core operations) requires `input_scale` parameters that cannot be determined from the static weights alone. Since activation values depend on actual inputs, we use **post-training quantization (PTQ)** calibration:

1. **Collect statistics**: run inference on N representative samples
2. **Track activations**: record the absolute maximum (`amax`) of the inputs to each quantized layer
3. **Compute scales**: derive `input_scale` from the collected statistics
4. **Store in checkpoint**: save the `input_scale` parameters alongside the weights

The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.
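
The amax-tracking step can be sketched with forward pre-hooks (a minimal illustration of the idea; the helper name, the 448 limit for e4m3 and the scale formula are assumptions of this sketch, and real calibration tooling is usually more elaborate):

```python
import torch

def calibrate_input_scales(model, sample_batches, layers_to_quantize, fp8_max=448.0):
    """Run representative samples through the model and derive per-layer input_scale."""
    amax, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs):
            # Track the running absolute maximum of this layer's input activations.
            current = inputs[0].detach().abs().max()
            amax[name] = torch.maximum(amax.get(name, current), current)
        return hook

    for name, module in model.named_modules():
        if name in layers_to_quantize and isinstance(module, torch.nn.Linear):
            hooks.append(module.register_forward_pre_hook(make_hook(name)))

    with torch.no_grad():
        for batch in sample_batches:   # 1. collect statistics on representative inputs
            model(batch)

    for h in hooks:
        h.remove()

    # 3. derive input_scale from the tracked activation amax values
    return {name: value / fp8_max for name, value in amax.items()}
```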

README.md

Lines changed: 7 additions & 10 deletions
```diff
@@ -112,10 +112,11 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git

 ## Release Process

-ComfyUI follows a weekly release cycle targeting Friday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:
+ComfyUI follows a weekly release cycle targeting Monday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:

 1. **[ComfyUI Core](https://github.com/comfyanonymous/ComfyUI)**
-   - Releases a new stable version (e.g., v0.7.0)
+   - Releases a new stable version (e.g., v0.7.0) roughly every week.
+   - Commits outside of the stable release tags may be very unstable and break many custom nodes.
    - Serves as the foundation for the desktop release

 2. **[ComfyUI Desktop](https://github.com/Comfy-Org/desktop)**
@@ -172,7 +173,7 @@ There is a portable standalone build for Windows that should work for running on

 ### [Direct link to download](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z)

-Simply download, extract with [7-Zip](https://7-zip.org) and run. Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints
+Simply download, extract with [7-Zip](https://7-zip.org) or with Windows Explorer on recent Windows versions, and run. For smaller models you normally only need to put the checkpoints (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints, but many of the larger models have multiple files. Make sure to follow the instructions to know which subfolder of ComfyUI\models\ to put them in.

 If you have trouble extracting it, right click the file -> properties -> unblock

@@ -199,7 +200,7 @@ comfy install

 ## Manual Install (Windows, Linux)

-Python 3.14 will work if you comment out the `kornia` dependency in the requirements.txt file (breaks the canny node) but it is not recommended.
+Python 3.14 works but you may encounter issues with the torch compile node. The free-threaded variant is still missing some dependencies.

 Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12

@@ -220,7 +221,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins

 This is the command to install the nightly with ROCm 7.0 which might have some performance improvements:

-```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0```
+```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1```


 ### AMD GPUs (Experimental: Windows and Linux), RDNA 3, 3.5 and 4 only.
@@ -241,7 +242,7 @@ RDNA 4 (RX 9000 series):

 ### Intel GPUs (Windows and Linux)

-(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
+Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)

 1. To install PyTorch xpu, use the following command:

@@ -251,10 +252,6 @@ This is the command to install the Pytorch xpu nightly which might have some per

 ```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu```

-(Option 2) Alternatively, Intel GPUs supported by Intel Extension for PyTorch (IPEX) can leverage IPEX for improved performance.
-
-1. visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.
-
 ### NVIDIA

 Nvidia users should install stable pytorch using this command:
```
comfy/cli_args.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,7 @@ def __call__(self, parser, namespace, values, option_string=None):
4747
parser.add_argument("--output-directory", type=str, default=None, help="Set the ComfyUI output directory. Overrides --base-directory.")
4848
parser.add_argument("--temp-directory", type=str, default=None, help="Set the ComfyUI temp directory (default is in the ComfyUI directory). Overrides --base-directory.")
4949
parser.add_argument("--input-directory", type=str, default=None, help="Set the ComfyUI input directory. Overrides --base-directory.")
50+
parser.add_argument("--custom-nodes-directory", type=str, default=None, help="Set the ComfyUI custom_nodes directory. Overrides --base-directory and environment variable COMFYUI_CUSTOM_NODES_DIR.")
5051
parser.add_argument("--auto-launch", action="store_true", help="Automatically launch ComfyUI in the default browser.")
5152
parser.add_argument("--disable-auto-launch", action="store_true", help="Disable auto launching the browser.")
5253
parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use. All other devices will not be visible.")
@@ -145,10 +146,11 @@ class PerformanceFeature(enum.Enum):
145146
Fp8MatrixMultiplication = "fp8_matrix_mult"
146147
CublasOps = "cublas_ops"
147148
AutoTune = "autotune"
148-
PinnedMem = "pinned_memory"
149149

150150
parser.add_argument("--fast", nargs="*", type=PerformanceFeature, help="Enable some untested and potentially quality deteriorating optimizations. This is used to test new features so using it might crash your comfyui. --fast with no arguments enables everything. You can pass a list specific optimizations if you only want to enable specific ones. Current valid optimizations: {}".format(" ".join(map(lambda c: c.value, PerformanceFeature))))
151151

152+
parser.add_argument("--disable-pinned-memory", action="store_true", help="Disable pinned memory use.")
153+
152154
parser.add_argument("--mmap-torch-files", action="store_true", help="Use mmap when loading ckpt/pt files.")
153155
parser.add_argument("--disable-mmap", action="store_true", help="Don't use mmap when loading safetensors.")
154156
