offload device and main device have a huge impact on lora #245
I trained a character LoRA using diffusion-pipe, and found that in the default HunyuanVideo wrapper workflow the offload device gives a better LoRA effect, while the main device deviates significantly from the LoRA training set. I'm very confused. Does anyone know why?
It's possible the LoRA simply fails to load unless the model is initialized on the offload device.
@kijai Thanks for the quick reply! Should I always use the offload device instead of the main device?
LoRAs don't really seem to work properly here. With the main device they don't do anything at all, but even with the offload device they're not doing what they're supposed to, I think. They work without issue in native ComfyUI, but the same base model, LoRAs and prompt don't do much when used with the wrapper nodes, even though it claims to load successfully. There's definitely an effect, as it does change the image and you can create artifacts with >= 2.0 LoRA strength, but I don't think it's recognizing any movement from the LoRA. Maybe it's related to how the wrapper is handling the prompt/clip? I initially thought the block swap was maybe killing the LoRA, but the result seems similar without it. Sadly, native ComfyUI's Hunyuan is somewhat useless without the block swap for my card. Is that feature one that could be added as a custom node maybe, or is it too integral to the process to be easily adaptable?
I've had no issues with LoRAs, or seen any particular LoRA-related difference between the wrapper and native; both use comfy LoRA loading too.
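For context, here is a minimal sketch of the shared ComfyUI LoRA loading path mentioned above, wrapped in a hypothetical `apply_lora` helper with an illustrative file path; the wrapper's actual call site may differ.

```python
# Sketch only: a hypothetical helper around ComfyUI's shared LoRA loading path.
import comfy.sd
import comfy.utils

def apply_lora(model, clip, lora_path, strength=1.0):
    # Load the LoRA state dict from disk.
    lora_sd = comfy.utils.load_torch_file(lora_path, safe_load=True)
    # Register the LoRA as weight patches on copies of the model/clip patchers.
    # The patches are only baked into the actual weights when the model is later
    # loaded onto the compute device, which is where low-VRAM behavior matters.
    return comfy.sd.load_lora_for_models(model, clip, lora_sd, strength, strength)
```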
Hmm, maybe it's a local issue somehow. I'm trying now with a fully updated ComfyUI/wrapper and a clean and simple workflow for both. Take for instance the beautiful 'Titty drop' LoRA (tittydrop_v1) from Civitai, since that's a very recognizable movement. In native it's working instantly and accurately every time, no matter the seed and other settings. She's always performing that particular movement, even though the scene and details may change. In the wrapper, with the same resolution, frame count, flow, guidance and prompt, it has yet to succeed once, and there's usually not even any real hint of that movement going on, even though the actual scene is similar in both cases. Disconnecting the select LoRA node in the wrapper instantly produces a somewhat different scene, however, one where the woman is now further away, or not facing the camera, or some other variation. So it's doing something.
Actually, I just managed to reproduce the movement itself in the wrapper by increasing the LoRA strength to 1.8. It's greatly distorting the image, artifacts everywhere, and she's often lifting empty air instead of a shirt, but the movement is now consistently there every time. In native I'm just using a strength of 1.0, though. Even 0.8 is working fine there. Is it possible that native ComfyUI is somehow using different block weights for the Hunyuan LoRA or some such?
No, there's no such difference. The native version is different in many aspects, as comfy implemented the model in his own way; that's why we can't compare the same seed 1:1 either. I'm not sure about the block swap effect on LoRAs as I rarely use that myself; have you tried the new auto CPU offloading instead? There may of course be things happening with low VRAM that I've not been able to experience myself; native comfy does use LoRAs differently in low VRAM mode too.
I see. Sadly I can't seem to make it work properly no matter what I try. CPU offloading instead of block swapping doesn't seem to change anything. Using block swapping but changing the amount of blocks swapped does not appear to affect the final result either. If block swapping or CPU offload is what's killing the LoRA, then it seems to be a binary effect rather than one based on which blocks are being swapped. I'm also trying to run it without either, but that's quite a challenge since it looks like it's circumventing even ComfyUI's normal memory management and so fills up VRAM with the model. I'm forced to run it at very low resolution and frame count to finish in acceptable time, and with those settings I can't really tell if it's working or not... the resolution is making the entire image unreliable.
I only really have experience with the early LoRAs, been weeks without a PC now due to hardware failure, but the Arcane LoRAs I mostly tested with worked perfectly for me.
I tried that one just now. It worked instantly in native but failed to apply any style in the wrapper, just giving a photorealistic video instead. However, since that one applies a complete style and not just a particular movement, it was a bit easier to tell if it was working even at low resolution, so I tried it with no offloading. From what I can tell, it's working. Poorly, since it's very noisy and it's morphing a lot, and the style is not entirely consistent with native, but it might just be the low resolution (128x256) and frame count (13) doing it. The image is definitely stylized though, across two different seeds. So it looks like the conclusion has to be that both block swapping and CPU offloading at least partially kill the effect of LoRAs.
Think I might have found a solution. I noticed this as part of your recent commits, where it loads LoRAs in the ModelLoader node:
Thought force patch weights sounded promising, and indeed it was. Forcing that one to true again seems to have re-enabled LoRA effects for both block swap and CPU offload. And forcing full load to true again seems to have improved it further (possibly because it would otherwise just patch the weights currently loaded, which in a low-memory environment would be less than all of them?). I guess forcing the full load flag is going to make VRAM usage spike for a few seconds while patching the weights until the swap/offload takes place later, but that's a fair price to pay for LoRAs working.
Interesting, I reverted it as it made zero difference for me and someone was indeed complaining about the memory use. I think this issue is about the comfy LoRA loading skipping applying the LoRA if it determines there's not enough memory, and either of the force args bypasses that. Can you test whether you need both, or if either of them is enough?
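A minimal sketch of what the two force flags under discussion look like at the memory-management layer, assuming the loader routes through `comfy.model_management.load_models_gpu`; the wrapper's actual call and flag plumbing may differ.

```python
# Sketch only: forcing a full, fully patched load so LoRA patches are not skipped.
import comfy.model_management as mm

def load_fully_patched(patcher):
    # Without the force flags, load_models_gpu can fall back to a partial /
    # low-VRAM load when it estimates memory is tight, in which case the LoRA
    # patches may not be applied to the weights. Forcing both makes the patching
    # unconditional, at the cost of a temporary VRAM spike.
    mm.load_models_gpu([patcher], force_patch_weights=True, force_full_load=True)
    return patcher
```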
PatchWeights alone was making it do something different, but it didn't look entirely proper. Only when I also activated FullLoad did it look right. I just now checked FullLoad alone, and that seemingly produces the same result as having both active, as far as LoRA behavior is concerned. However, there's definitely something wrong here. The LoRA effect seems to disappear from one generation to the next. It's not just an unlucky seed or anything like that, since identical settings will work fine after a restart of ComfyUI or switching between block swap and CPU offload. I'm thinking maybe the forced model offloading is somehow messing up the LoRA-patched weights, making them revert to unpatched/non-LoRA.
Well, with ForceFullLoad active, LoRAs are working with either block swap or CPU offload, but not more than once. On the second generation the LoRA effect is lost. At that point, the way to re-activate the LoRA is to re-run the HyVideoModelLoader node, e.g. by changing the LoRA weight by a tiny fraction. It's as if on subsequent runs it's using a cached version of the model weights from before the LoRA was patched in... main_device vs offload_device doesn't seem to matter, nor does CPU offload vs block swap. I was thinking maybe setting the sampler node's force_offload flag to false would help, but it doesn't look like it. Adding in force_patch_weights=True doesn't seem to help with this one either, though I need to retest it.
That's pretty weird, as there shouldn't be anything unloading it; after the model loader the model isn't modified at all, just moved between devices... something must still be doing that if the effect disappears like that.
Ah, looks like that part was my fault. I was passing through a node that had mm.UnloadAllModels in it, after the sampler node, which broke the full-load LoRA application for the next run. Granted, I would still have expected it to be able to (re-)load the full LoRA-patched weights even if a node or the user forces a full model unload, so there might be a bug here nonetheless.
That explains it, yes, and it is this way because it allows using torch.compile with LoRAs, which currently doesn't work with the native version; you just experienced the downside.
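To illustrate the failure mode described above, here is a sketch of the kind of downstream "cleanup" step that triggers it (hypothetical node code, not the wrapper's):

```python
# Sketch only: an unload-everything step placed after sampling.
import comfy.model_management as mm

def cleanup_after_sampling(frames):
    # Unloading all models also reverts the LoRA patches that were baked into
    # the weights at load time. Because the graph executor caches the loader
    # node's output, the next run reuses the now-unpatched model, so the LoRA
    # effect silently disappears until the loader node is re-executed.
    mm.unload_all_models()
    mm.soft_empty_cache()
    return frames
```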
Alright! Well, removing the model unload node and keeping the forcefullload=true parameter, it now seems to work consistently with LoRAs using both the block swap method and the CPU offload. The full load does saturate my VRAM for 15-20 seconds, but it's not really a problem here. As soon as it starts sampling it unloads everything again. I guess it could maybe be a node parameter, with the warning that not doing a full load will prevent LoRAs from working correctly when doing block swap/offload.
I also had the same issue. But after seeing kijai's suggestion about the new CPU offload, I tested it three times and the LoRA seems to work just fine, with no need to reload the model. Maybe block swap just happens to break with LoRAs... but why does it work on the first render and then shut itself off after that?
Are you sure you're not using any unloading nodes or such? They would unload the LoRA, and unless the model loader is run again, it would stay unloaded. Such nodes are completely unnecessary with these wrapper nodes, as they already include force offload options that won't interfere with the loaded LoRAs.
Same here: the first time the LoRA works, the second time it doesn't; adjusting the LoRA's strength a little makes it work again.
Anything that does "unload_all_models" will cause this, and is also completely unnecessary.
I'm having the same issue: LoRAs seem not to load unless the forcefullload=true argument is added in the code. It would be nice if this were a node parameter.
I'm making my own UI frontend for ComfyUI and I think I hit this issue. At first I thought it was something with the way I submit the JSON to the API, because LoRAs seem to work once and then turn off. Then I reproduced it in ComfyUI itself; it's very easy, and the workflow is the most basic one (from the examples).
It looks like a caching issue of sorts. If I change the LoRA weight, the graph re-executes from the beginning and the output is correct. However, if the model loading node is not executed due to caching, the LoRA evaporates for some reason. I have to note that this issue does not exist in another workflow that uses the stock ComfyUI nodes (a big mess of SD3/Flux conditioners and simple model loaders). However, those nodes don't support Enhance-A-Video and other useful tricks.
I'm facing the same problem and would like to add more information from my tests yesterday. It does seem that the problem is in ComfyUI... everything is very strange to me; simply after a few generations with certain configurations, the LoRA no longer works. I tested the same thing in the wrapper after I tested it in ComfyUI, and the same thing happened; I don't understand. I'm using an RTX 3090, 64 GB of RAM, and everything perfectly updated.
The only way I've seen to generate with a LoRA consistently, without losing the "lora", is to clear the caches and unload everything. The problem is that you need to do this every time you generate again. I'm going to open an issue in the ComfyUI repository; this is important for making the problem known.
I'm not sure it's a ComfyUI issue, as it only happens with Kijai's nodes. As I said, if you use only the stock nodes and the stock LoraLoaderModelOnly, it works fine every time.
And you don't have ANY extra nodes in the workflow? I can't reproduce this with my examples at all; any unload_all_models call would remove the LoRA, and there are other nodes that use those for whatever reason.
Here's the workflow I use; it only has your nodes. The first run: HunyuanVideo_1024_00001.mp4. The second run (added two more periods in the prompt): HunyuanVideo_1024_00002.mp4. The change is very funny tbh, almost intentional.
As I mentioned, this behavior is unusual because I can reproduce it even with the ComfyUI nodes. Additionally, if the LoRA is not entirely removed during the second generation (for instance, due to modifications in the prompt), the LoRA's effect gradually diminishes. Eventually, it reaches a point where it has no impact at all.
You mean even with the native nodes? That sounds bizarre... issues like that would possibly transfer to the wrapper too, as I'm using ComfyUI's LoRA loading as well.
I am also experiencing this, but for me the LoRA has no effect at all, even from the first gen. The LoRA works very well with the comfy native nodes, but I'd like to use the Enhance-A-Video node. Will try the code change tonight.
I was able to both fix and recreate the issue.