ArulselvanMadhavan/diffusers-ocaml

Diffusers-OCaml: A Diffusers API in OCaml

This work is a port of the diffusers-rs library written in Rust.

Stable-Diffusion

How to run?

  • Download the weights and save them in the data directory. For instructions on how to do this, refer to the diffusers-rs docs.
  • Once you have the “.ot” files, you are ready to run Stable Diffusion.
  • I have found that placing unet and clip on the GPU and vae on the CPU works best for my GPU. The following command does that:
    dune exec stable_diffusion -- generate "lighthouse at dark" "vae" "data/pytorch_model.ot" "data/vae.ot" "data/unet.ot"
        
  • To run all models on the CPU:
    dune exec stable_diffusion -- generate "lighthouse at dark" "all" "data/pytorch_model.ot" "data/vae.ot" "data/unet.ot"
        
  • To generate more than 1 sample, use the num_samples parameter
    dune exec stable_diffusion -- generate "lighthouse at dark" "all" "data/pytorch_model.ot" "data/vae.ot" "data/unet.ot" --num_samples=2
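
Every command above expects the three weight files to be present, so it can help to check for them before launching a run. The following is a minimal sketch, not part of this repository: `missing_weights` is a hypothetical helper name, and the file names are simply the ones passed on the command lines above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with diffusers-ocaml): print every
# expected weight file that is missing from the given directory.
# The file names are the ones used in the commands above.
missing_weights() {
    dir="${1:-data}"
    for f in pytorch_model.ot vae.ot unet.ot; do
        # Print the path only when the file does not exist.
        [ -f "$dir/$f" ] || printf '%s\n' "$dir/$f"
    done
}

# Example use: launch generation only when nothing is missing.
# if [ -z "$(missing_weights data)" ]; then
#     dune exec stable_diffusion -- generate "lighthouse at dark" "vae" \
#         "data/pytorch_model.ot" "data/vae.ot" "data/unet.ot"
# fi
```

An empty result means all three files were found; otherwise each missing path is printed on its own line.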
        

Sample generated images

./media/lighthouse.png

./media/sd_final.2.png

Performance on 8GB Nvidia GeForce GTX 1070 Mobile GPU

It takes about 27 seconds to generate an image when vae is placed on the CPU and unet and clip are placed on the GPU. Measurements were taken on a 12-CPU Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz. Running all the steps on the CPU takes a little more than three minutes.

real    0m25.110s
user    0m39.271s
sys     0m11.518s
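
The figures above have the layout printed by the shell's `time` keyword, so a run could presumably be measured by prefixing the `dune exec` command with `time`; that is an assumption, since the measurement method is not stated here. To compare runs numerically, the sketch below converts such a duration to seconds. `to_seconds` is a hypothetical helper name, not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: convert a duration in the "XmY.YYYs" form shown
# above (for example "0m25.110s") to seconds with millisecond precision.
to_seconds() {
    # Split on "m" and "s": field 1 is minutes, field 2 is seconds.
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Example: to_seconds 0m25.110s
```

With the values above, user plus sys time (about 50.8 seconds) exceeds real time (about 25.1 seconds), which suggests that part of the work runs on several CPU threads in parallel.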

Image to Image generation

How to run?

dune exec img2img -- img2img media/in_img2img.jpg

Sample generated image

  • Input image

./media/in_img2img.jpg

  • Output image

./media/out_img2img.png

Performance on 8GB Nvidia GeForce GTX 1070 Mobile GPU

I placed vae on the CPU, and unet and clip on the GPU.

real    0m15.628s
user    0m34.571s
sys     0m5.833s

Inpaint

How to run?

dune exec inpaint -- generate media/sd_input.png media/sd_mask.png --cpu="vae" --prompt="Face of a panda, high resolution, sitting on a park bench"

Sample generated image

  • Prompt: Face of a panda, high resolution, sitting on a park bench

./media/sd_input.png

./media/sd_mask.png

./media/panda.png

Performance

I placed vae on the CPU, and unet and clip on the GPU.

real    0m28.055s
user    0m51.600s
sys     0m13.851s

Note

Only FP32 weights are supported.