TCD Scheduler + LoRA IPAdapter SDXL #76
Conversation
```python
self.graph = None

def infer(self, feed_dict, stream, use_cuda_graph=False):
    # Filter inputs to only those the engine actually exposes to avoid binding errors
```
Not 100% sure about this
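The filtering described in the new comment can be sketched as a standalone helper (a hypothetical illustration, not the actual implementation — the function name and arguments are assumptions):

```python
def filter_feed_dict(feed_dict, engine_input_names):
    """Keep only tensors the engine actually exposes as input bindings,
    so extra or stale entries in feed_dict cannot trigger binding errors."""
    return {name: tensor for name, tensor in feed_dict.items()
            if name in engine_input_names}
```

Dropping unknown keys silently is a design choice; logging the dropped names once would make mismatches easier to debug.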
```diff
 diffusers==0.35.0
-transformers==4.56.0
+transformers==4.55.4
 peft==0.18.0
```
These dep versions changed for Windows support.
```diff
 # Create prefix (from wrapper.py lines 1005-1013)
-prefix = f"{base_name}--lcm_lora-{use_lcm_lora}--tiny_vae-{use_tiny_vae}--min_batch-{min_batch_size}--max_batch-{max_batch_size}"
+prefix = f"{base_name}--tiny_vae-{use_tiny_vae}--min_batch-{min_batch_size}--max_batch-{max_batch_size}"
```
This will cause engines to rebuild, so it's easiest to rename any engines you've already built by removing the lcm_lora-{use_lcm_lora}-- segment from their filenames.
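For anyone with prebuilt engines, that rename could be scripted along these lines (a hedged sketch: the naming pattern is taken from the prefix string above, and the flat directory layout is an assumption):

```python
import re
from pathlib import Path

def strip_lcm_segment(engine_dir):
    """Rename already-built engine files to match the new prefix scheme
    by dropping the 'lcm_lora-{True|False}--' segment from their names."""
    for path in Path(engine_dir).iterdir():
        new_name = re.sub(r"lcm_lora-(?:True|False)--", "", path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
```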
ty for the heads up on this
BuffMcBigHuge left a comment:
I have reviewed the changes and identified a critical regression in src/streamdiffusion/wrapper.py that likely causes the reported quality degradation.
The logic for backwards compatibility of use_lcm_lora checks if lora_dict is not None. However, lora_dict defaults to None. As a result, the LCM LoRA is never loaded for default configurations, leading to the lack of detail and sharpness (as the model runs in LCM mode without the required LoRA).
I also noted a change in src/streamdiffusion/pipeline.py regarding init_noise updates that might affect temporal behavior.
```python
# DEPRECATED: THIS WILL LOAD LCM_LORA IF USE_LCM_LORA IS TRUE
# Validate backwards compatibility LCM LoRA selection using proper model detection
if hasattr(self, 'use_lcm_lora') and self.use_lcm_lora is not None:
    if self.use_lcm_lora and not self.sd_turbo and lora_dict is not None:
```
There is a logic bug here. lora_dict is None by default in __init__. If the user does not provide a lora_dict, this condition lora_dict is not None will be False, and the LCM LoRA will NOT be added to the dictionary (and thus not loaded).
This means users running in default mode (without explicit lora_dict) will run without the LCM LoRA, causing the "lack of detail and sharpness" degradation observed.
It should probably be:

```python
if self.use_lcm_lora and not self.sd_turbo:
    if lora_dict is None:
        lora_dict = {}
    # ...
```
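A self-contained sketch of that corrected path (the default LoRA repo id and the weight of 1.0 are illustrative assumptions, not values from the PR):

```python
def resolve_lora_dict(use_lcm_lora, sd_turbo, lora_dict,
                      lcm_lora_id="latent-consistency/lcm-lora-sdxl"):
    """Ensure the LCM LoRA ends up in lora_dict even when the caller
    passed no lora_dict at all (the default-config case the bug breaks)."""
    if use_lcm_lora and not sd_turbo:
        if lora_dict is None:
            lora_dict = {}
        # Don't override a weight the user already chose for this LoRA.
        lora_dict.setdefault(lcm_lora_id, 1.0)
    return lora_dict
```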
```python
# Build latent batch for CFG
if self.guidance_scale > 1.0 and cfg_mode == "full":
    latent_with_uc = torch.cat([latent_model_input, latent_model_input], dim=0)
elif self.guidance_scale > 1.0 and cfg_mode == "initialize":
```
In the previous implementation, self.init_noise = x_t_latent was set here. Its removal changes the behavior of init_noise for subsequent frames (it remains static/random instead of carrying over the noisy latent). Was this removal intentional? It might affect temporal coherence or noise patterns.
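A toy illustration (plain Python, not StreamDiffusion code) of the behavioral difference being asked about — whether the stored init_noise tracks each frame's noisy latent or stays fixed at its initial value:

```python
import random

def run_frames(carry_over, frames=3, seed=0):
    """Toy model: with carry_over=True, init_noise is updated to each
    frame's noisy latent (the previous behavior); with carry_over=False,
    it keeps its initial random value (the behavior after the removal)."""
    random.seed(seed)
    init_noise = random.random()
    history = []
    for _ in range(frames):
        x_t_latent = 0.5 * init_noise + 0.5 * random.random()  # stand-in update
        if carry_over:
            init_noise = x_t_latent  # previously: self.init_noise = x_t_latent
        history.append(init_noise)
    return history
```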
Overview
This pull request introduces comprehensive support for customizable schedulers and samplers in StreamDiffusion, enabling more flexible and efficient diffusion processes. It builds on the existing pipeline to allow users to specify schedulers (e.g., LCM) and samplers (e.g., normal) via configuration, improving generation quality, speed, and compatibility with advanced features like ControlNet and IPAdapter. Additional enhancements include better LoRA handling to resolve conflicts, TensorRT engine optimizations for robustness, a quiet mode for cleaner logging, and minor UI/dependency updates.
The implementation refactors the core wrapper and pipeline to integrate scheduler/sampler logic dynamically, supports Trajectory Consistency Distillation (TCD) for ControlNet, and ensures backward compatibility with existing setups. This enables experimentation with different diffusion strategies without recompiling engines from scratch.
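As a rough illustration, the scheduler/sampler selection described above might look like this in a config (the keys `scheduler` and `sampler` come from the PR description; the surrounding structure is an assumption):

```python
# Hypothetical config fragment; only the two keys are taken from the PR text.
stream_config = {
    "scheduler": "tcd",   # or "lcm"
    "sampler": "normal",
}
```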
New Features

Scheduler and Sampler Integration:
- Schedulers and samplers can be selected via configuration (e.g., `scheduler: lcm`, `sampler: normal`), affecting timestep scaling and noise prediction.

Enhanced LoRA Handling:
- Engine names encode each LoRA as `--lora-{num}-{hash}`, allowing multiple LoRAs without path conflicts or invalid filenames.
- Deprecates `use_lcm_lora` in favor of `lora_dict` while remaining backwards compatible.

ControlNet Trajectory Consistency Distillation (TCD):

Quiet Mode for Uvicorn:
- Adds a `--quiet` flag to suppress INFO-level uvicorn logs (e.g., access logs), reducing noise during debugging and production runs.
- Configurable via environment variable (`QUIET=True`) or CLI, with logger adjustments in the realtime-img2img demo.

TensorRT Inference Improvements:

Dependencies