FLux2 #4

Xiuzhenpeng · 2025-11-27T02:26:36Z

No description provided.

* mm: factor out the current stream getter

  Make this a reusable function.

* ops: sync the offload stream with the consumption of w&b

  This sync is necessary because PyTorch queues asynchronous CUDA frees on the same stream the tensor was created on. In the case of async offload, this is the offload stream. Weights and biases can go out of scope in Python, which then triggers the PyTorch garbage collector to queue the free operation on the offload stream, possibly before the compute stream has used the weight. This causes a use-after-free on the weight data, leading to total corruption of some workflows.

  So sync the offload stream with the compute stream after the weight has been used, so that the free has to wait for the weight to be used.

  cast_bias_weight is extended in a backwards-compatible way, with the new behaviour opt-in on a defaulted parameter. This handles custom node packs calling cast_bias_weight and disables async offload for them (as they do not handle the race).

  The pattern is now:

      cast_bias_weight(... , offloadable=True)
      #This might be offloaded
      thing(weight, bias, ...)
      uncast_bias_weight(...)

* controlnet: adopt new cast_bias_weight synchronization scheme

  This is necessary for safe async weight offloading.

* mm: sync the last stream in the queue, not the next

  Currently this peeks ahead to sync the next stream in the queue of streams with the compute stream. This doesn't allow much parallelization, as the end result is that you can only get one weight load ahead regardless of how many streams you have.

  Rotate the loop logic here to synchronize the end of the queue before returning the next stream. This allows weights to be loaded ahead of the compute stream's position.
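The rotation described in the last commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeStream` and `StreamPool` are hypothetical stand-ins that record `wait_stream` calls instead of touching a GPU, and the actual code in the repository may differ in detail. It shows the stream at the current queue position (the one handed out most recently, which now moves to the back of the rotation) being synchronized with the compute stream before the pool advances and returns the next stream.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeStream:
    """Hypothetical stand-in for a device stream; it only records waits."""
    name: str
    waits: List[str] = field(default_factory=list)

    def wait_stream(self, other: "FakeStream") -> None:
        # A real stream would block its future work on `other`; here we just log it.
        self.waits.append(other.name)


class StreamPool:
    """Toy round-robin pool of offload streams (names are assumptions)."""

    def __init__(self, num_streams: int) -> None:
        self.streams = [FakeStream(f"offload{i}") for i in range(num_streams)]
        self.counter = 0

    def next_stream(self, compute: FakeStream) -> FakeStream:
        # Sync the stream at the current position with the compute stream ...
        self.streams[self.counter].wait_stream(compute)
        # ... then rotate and hand out the following stream, which carries no
        # fresh wait and can therefore start loading ahead of the compute stream.
        self.counter = (self.counter + 1) % len(self.streams)
        return self.streams[self.counter]
```

In this model a stream only picks up a new wait when it moves to the back of the rotation, so with N streams the loader can, in principle, run up to N - 1 weights ahead rather than one.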

* execution: Roll the UI cache into the outputs

  Currently the UI cache is parallel to the output cache, with the expectation that it is a content superset of the output cache. At the same time, the UI and output caches are maintained completely separately, making it awkward to free the output cache content without changing the behaviour of the UI cache.

  There are two actual users (getters) of the UI cache. The first is the case of a direct content hit on the output cache when executing a node. This case is handled very naturally by merging the UI and output caches. The second is the history JSON generation at the end of the prompt. This currently works by asking the cache for all_node_ids and then pulling the cache contents for those nodes. all_node_ids covers the nodes of the dynamic prompt.

  So fold the UI cache into the output cache. The current UI cache setter now writes to a prompt-scope dict. When the output cache is set, just get this value from the dict and tuple it up with the outputs. When generating the history, simply iterate over the prompt-scope dict.

  This prepares support for more complex caching strategies (like RAM pressure caching) where less than one workflow will be cached and it will be desirable to keep the UI cache and output cache in sync.

* sd: Implement RAM getter for VAE
* model_patcher: Implement RAM getter for ModelPatcher
* sd: Implement RAM getter for CLIP
* Implement RAM Pressure cache

  Implement a cache sensitive to RAM pressure. When RAM headroom drops below a certain threshold, evict RAM-expensive nodes from the cache.

  Models and tensors are measured directly for RAM usage. An OOM score is then computed based on the RAM usage of the node.

  Note that due to indirection through shared objects (like a model patcher), multiple nodes can account for the same RAM as their individual usage. The intent is that this will free chains of nodes, particularly model loaders and associated LoRAs, as they all score similarly and are sorted close to each other.

  Has a bias towards unloading model nodes mid-flow while being able to keep results like text encodings and VAE.

* execution: Convert the cache entry to NamedTuple

  As commented in review. Convert this to a named tuple and abstract away the tuple type completely from graph.py.
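As a rough illustration of the ideas in these commits, the sketch below pairs a node's UI payload with its outputs in a named tuple and evicts the most RAM-expensive entries when headroom falls below a threshold. All names here (`CacheEntry`, `set_output`, `ram_usage`, `evict_under_pressure`) and the byte-counting score are assumptions made for the example, not the code from the pull request.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class CacheEntry(NamedTuple):
    """One cached node result: UI payload and outputs kept together."""
    ui: Optional[dict]
    outputs: List[Any]


def set_output(cache: Dict[str, CacheEntry], ui_outputs: Dict[str, dict],
               node_id: str, outputs: List[Any]) -> None:
    # Look up the UI payload in the prompt-scope dict and tuple it up with the outputs.
    cache[node_id] = CacheEntry(ui=ui_outputs.get(node_id), outputs=outputs)


def ram_usage(entry: CacheEntry) -> int:
    # Stand-in measurement (assumption): count only bytes-like outputs.
    return sum(len(o) for o in entry.outputs if isinstance(o, (bytes, bytearray)))


def evict_under_pressure(cache: Dict[str, CacheEntry], headroom: int, threshold: int,
                         score: Callable[[CacheEntry], int] = ram_usage) -> List[str]:
    """Evict the highest-scoring entries until headroom reaches the threshold."""
    evicted = []
    # Rank once, most RAM-expensive first; here the score is just the RAM usage.
    ranked = sorted(cache, key=lambda n: score(cache[n]), reverse=True)
    for node_id in ranked:
        if headroom >= threshold:
            break
        # Simplification: assume each eviction frees its full score. Shared
        # objects counted by several nodes would make this an overestimate.
        headroom += score(cache.pop(node_id))
        evicted.append(node_id)
    return evicted
```

With a large "loader" entry, a large "lora" entry, and a small "text" entry in the cache, a call with low headroom would drop the two large entries first and keep the cheap text result, which matches the bias towards unloading model nodes described above.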

* ops: don't take an offload stream if you don't need one
* ops: prioritize mem transfer

  The async offload stream's reason for existence is to transfer from RAM to GPU. The post-processing compute steps are a bonus on the side stream, but if the compute stream is running a long kernel, it can stall the side stream, as it waits to type-cast the bias before transferring the weight.

  So do a pure transfer of the weight first, then do everything for the bias, then go back to fix the weight type and apply the weight patches.

…esolution supports. (#10708)

* Create nodes_dataset.py
* Add encoded dataset caching mechanism
* make training node to work with our dataset system
* allow trainer node to get different resolution dataset
* move all dataset related implementation to nodes_dataset
* Rewrite dataset system with new io schema
* Rewrite training system with new io schema
* add ui pbar
* Add outputs' id/name
* Fix bad id/naming
* use single process instead of input list when no need
* fix wrong output_list flag
* use torch.load/save and fix bad behaviors

bigcat88 and others added 30 commits October 29, 2025 11:14

use new API client in Luma and Minimax nodes (#10528)

6c14f3a

Reduce memory usage for fp8 scaled op. (#10531)

1a58087

Fix case of weights not being unpinned. (#10533)

ec4fc2a

Try to fix slow load issue on low ram hardware with pinned mem. (#10536)

25de7b1

Fix small performance regression with fp8 fast and scaled fp8. (#10537)

906c089

Add units/info for the numbers displayed on 'load completely' and 'load partially' log messages (#10538)

998bf60

use new API client in Pixverse and Ideogram nodes (#10543)

163b629

fix img2img operation in Dall2 node (#10552)

dfac946

Add a ScaleROPE node. Currently only works on WAN models. (#10559)

614cf98

Fix rope scaling. (#10560)

27d1bd8

ScaleROPE now works on Lumina models. (#10578)

7f374e4

Fix torch compile regression on fp8 ops. (#10580)

c58c13b

added 12s-20s as available output durations for the LTXV API nodes (#10570)

5f109fe

convert StabilityAI to use new API client (#10582)

20182a3

Fix issue with pinned memory. (#10597)

44869ff

Clarify help text for --fast argument (#10609)

97ff9fa

Updated help text for the --fast argument to clarify potential risks.

fix(api-nodes-cloud): stop using sub-folder and absolute path for output of Rodin3D nodes (#10556)

6d6a18b

fix(caching): treat bytes as hashable (#10567)

88df172

convert nodes_hypernetwork.py to V3 schema (#10583)

1f3f7a2

convert nodes_openai.py to V3 schema (#10604)

e617cdd

feat(Pika-API-nodes): use new API client (#10608)

4e2110c

chore: update embedded docs to v0.3.1 (#10614)

e974e55

People should update their pytorch versions. (#10618)

958a171

Speed up torch.compile (#10620)

0652cb8

Fixes (#10621)

e199c8c

Bring back fp8 torch compile performance to what it should be. (#10622)

6b88478

This seems to slow things down slightly on Linux. (#10624)

0f4ef3a

comfyanonymous and others added 29 commits November 25, 2025 02:48

Allow pinning quantized tensors. (#10873)

b680542

Don't try fp8 matrix mult in quantized ops if not supported by hardware. (#10874)

acfaa5c

I found a case where this is needed (#10875)

015a059

Flux 2 (#10879)

6b573ae

[API Nodes] add Flux.2 Pro node (#10880)

5c7b08c

Add Flux 2 support to README. (#10882)

af81cb9

ComfyUI version v0.3.72

828b1b9

Fix crash. (#10885)

dff996c

Update workflow templates to v0.7.20 (#10883)

18b79ac

Lower vram usage for flux 2 text encoder. (#10887)

d196a90

ComfyUI v0.3.73

0c18842

Z Image model. (#10892)

e9aae31

Adjustments to Z Image. (#10893)

0e24dbb

Fix loras not working on mixed fp8. (#10899)

bdb10a5

ComfyUI v0.3.74

90b3995

Fix Flux2 reference image mem estimation. (#10905)

58b8574

ComfyUI version v0.3.75

8402c87

Add cheap latent preview for flux 2. (#10907)

f16219e

Thank you to the person who calculated them. You saved me a percent of my time.

add Veo3 First-Last-Frame node (#10878)

8938aa3

improve UX for batch uploads in upload_images_to_comfyapi (#10913)

1105e0d

fix(gemini): use first 10 images as fileData (URLs) and remaining images as inline base64 (#10918)

8908ee2

Bump frontend to 1.32.9 (#10867)

234c3dc

Merge 3d animation node (#10025)

58c6ed5

Fix the CSP offline feature. (#10923)

55f654d

Add Z Image to readme. (#10924)

dd41b74

chore(api-nodes): remove chat widgets from OpenAI/Gemini nodes (#10861)

d8433c6

convert nodes_customer_sampler.py to V3 schema (#10206)

a2d60aa

Make lora training work on Z Image and remove some redundant nodes. (#10927)

eaf68c9

Xiuzhenpeng merged commit ec7e7bf into Xiuzhenpeng:flux2 Nov 27, 2025
3 checks passed

18 participants
