
joliGEN v4.0.0

@beniz released this on 19 Dec 09:28

This major release brings many improvements, most notably video generation with diffusion models and super-resolution with supervised metrics, including for consistency models.

Features

  • separate control of vertical and horizontal flips as augmentations (a7a6109)
  • aligned crops for super-resolution (8418470)
  • allow tf32 on cudnn (367cd91)
  • better Canny edges for conditioning images with background (c3c7de6)
  • consistency models with supervised losses (ed701ad)
  • data: random bbox for inpainting (764646d)
  • input and output multiple and different channels (6bcd64c)
  • load models without strictness (073d57c)
  • max number of visualized images from train/test set (24f0e81)
  • ml: add option for vid inference (c3f83b7)
  • ml: add supervised loss with GANs with aligned datasets (d7f5119)
  • ml: added LPIPS supervised loss with GANs (70e8ee4)
  • ml: adding example of CM+discriminator (b6b8b64)
  • ml: batched prompts for turbo (023dd54)
  • ml: Canny can use a range of dropout probabilities (7b4c860)
  • ml: canny dropout for vid (06ce7d7)
  • ml: CM with added discriminator (10516e0)
  • ml: consistency models for pix2pix (cd92712)
  • ml: CUT turbo (cdd508f)
  • ml: debug args (a11172b)
  • ml: debug crop (51c9fd6)
  • ml: debug for canny inference (930f3ce)
  • ml: debug for canny threshold (dca0bfa)
  • ml: debug for vid metrics (7c57471)
  • ml: debug inference_vid for canny (17b9a29)
  • ml: debug vid for frame limit (ff97c03)
  • ml: debug vid metric (ba43725)
  • ml: DISTS supervised loss for aligned data (56273ef)
  • ml: FID, KID, MSID for multiple test sets and non-8-bit images (74b0e65)
  • ml: fix canny range option (c102ee0)
  • ml: fix inference regeneration and crop canny (f75196f)
  • ml: HDiT for GANs (58bedff)
  • ml: HDiT generator (9a95f1f)
  • ml: jenkins test inference print (b68ab53)
  • ml: L1 or MSE for diffusion multiscale loss (06e3d6a)
  • ml: metric fvd for video (6d458a3)
  • ml: min-SNR loss weight for diffusion, arXiv:2303.09556 (c802119)
  • ml: modification for horse2zebra prompt (b66a954)
  • ml: multiple test sets (6db745c)
  • ml: option for max_sequence_lenght of video generation (12cfc1b)
  • ml: prompt for horse2zebra inference (b8e9929)
  • ml: random canny inside batch (70919cd)
  • ml: rename dataloader for video generation (98b1315)
  • ml: UNetVid implementation for video generation with temporal consistency, and inference (43b7018)
  • ml: keep fill_img_with_canny unchanged with random Canny dropout (a2ed3fc)
  • ml: UNetVid for video generation with batch size > 1 (00f11bc)
  • ml: vid: try autoregressive inference (5b92031)
  • multi-prompt works locally (b98746a)
  • multiprompt (2bffc8b)
  • multistep lr scheduler (01c3558)
  • train_finetune for fine-tuning GANs and other models, with removal or addition of losses and networks (2f26503)
  • unet_vid motion module fine-grained configuration (813e435)
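The min-SNR entry refers to the loss weighting from arXiv:2303.09556, which clips the per-timestep signal-to-noise ratio so that low-noise timesteps do not dominate training. Below is a minimal, framework-free sketch of that weighting. It assumes a DDPM-style schedule where `alpha_bar` is the cumulative product of (1 - beta_t). The function names and the default `gamma = 5` (the value suggested in the paper) are illustrative assumptions, not joliGEN's actual options or code.

```python
def snr_from_alpha_bar(alpha_bar: float) -> float:
    """Signal-to-noise ratio of a DDPM-style timestep: alpha_bar / (1 - alpha_bar)."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(alpha_bar: float, gamma: float = 5.0, prediction: str = "epsilon") -> float:
    """Min-SNR-gamma loss weight for one timestep (hypothetical helper, not joliGEN's API).

    The clipped SNR, min(SNR, gamma), is rescaled depending on what the
    network predicts, so every target gets the same effective weighting.
    """
    snr = snr_from_alpha_bar(alpha_bar)
    clipped = min(snr, gamma)
    if prediction == "epsilon":  # noise prediction
        return clipped / snr
    if prediction == "x0":  # clean-sample prediction
        return clipped
    if prediction == "v":  # velocity prediction
        return clipped / (snr + 1.0)
    raise ValueError(f"unknown prediction type: {prediction}")


if __name__ == "__main__":
    # Low-noise timesteps (large alpha_bar, high SNR) are down-weighted.
    for alpha_bar in (0.99, 0.9, 0.5, 0.1):
        print(alpha_bar, round(min_snr_weight(alpha_bar), 4))
```

In a training loop, this per-timestep weight would multiply the per-sample diffusion loss before averaging over the batch.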

Bug Fixes

  • aligned dataset: resize domain A only if necessary (4127571)
  • allow CUT without NCE (9d8ff9b)
  • clamp bbox to image size during inference (fc3874d)
  • cm at test time (706356b)
  • cm with conditioning (0fd2d14)
  • consistency model schedule upon resume (88d03f9)
  • consistency models with input/output different channels (db61821)
  • crash in inference script, errors in documentation (f99dd34)
  • cut options at test time (dcd2438)
  • D input is G output size with gans (194f42b)
  • diff across input/output channels in gans (6845816)
  • diff real/fake not needed + cleanup (5cbd1f0)
  • diffusion inference for images > 8bits (aefdc38)
  • diffusion with input and output of different channel size (cd264de)
  • disable hdit flop count (8c449f8)
  • fix pytest rootdir (1fe0e80)
  • further lowering the input test size of cut-turbo (6914731)
  • gan inference script with prompts (cef7681)
  • gan metrics reference (d5570b6)
  • GAN semantic visual output (d3a5565)
  • GAN semantic visual output (e7ee6bd)
  • gen_single_image.py for images with channels > 3 (9ad4aaa)
  • hdit out_channel (84473fc)
  • identity with cut turbo (2538c00)
  • inference with images > 8bit and GANs (34e6c96)
  • input size of cut-turbo test (2c024c2)
  • interpolation size selection for projected discriminators (ef045d0)
  • load_image replacement (5af5803)
  • loading of ema models (995c5eb)
  • lora config saving with multiple gpus (c98617d)
  • lower img2img turbo test memory footprint (54a6ab4)
  • missing SSIM metric option (8530851)
  • ml: multiscale diffusion loss for any input resolution (5c9f997)
  • multi-gpu ddp collective mismatch upon resume (471fbbc)
  • multi-gpu with frozen base network (1a07342)
  • multiple test sets with test.py + SSIM (06762fb)
  • default value of option cut_nce_idt (4c5ec6d)
  • palette options at test time (75f7b04)
  • parser uses model_type for model level options (76095b5)
  • paths are only required for video generation (eb39ec5)
  • paths loading prompts file (35d2ef3)
  • perceptual loss for cm when input and output channels differ (ca81789)
  • potential bug in gen_single_diffusion model path (0cf63fe)
  • projected discriminator allows grayscale input (44fb458)
  • prompt unaligned loading (e25d4b1)
  • rename sketch options in examples (6930d00)
  • RGB order for diffusion inpainting (eff8a57)
  • rgbn cut lpips supervision (17cfbb2)
  • sam for single channel inputs (397f837)
  • segformer generator for single channel inputs (1eb6695)
  • show full test set output with GANs (31efdcd)
  • single dataset (a6266d8)
  • supervised loss for aligned GANs, with unit tests (e21ddd3)
  • supervised perceptual metrics all with piq and configurable + lambda weight (d77c3c5)
  • test image output tensor visuals (19596b2)
  • tifffile import (a09b5ed)
  • total_iters wrong variable (066dc1b)
  • train batch visuals (24adb61)
  • typo in semantic threshold test variable (5082c36)
  • unet mha output for GANs (075b6c6)

Docker images: