
Stable diffusion mlx #474

Open · wants to merge 5 commits into main

Conversation

@pranav4501 commented Nov 20, 2024

Sharded Stable Diffusion inference for MLX (#159)

Changes:

  • Sharded Stable Diffusion 2.1 Base for MLX
  • Handled diffusion steps by looping over the whole model
  • Added back inference state
  • Modified gRPC and proto definitions to support inference
  • New endpoint for image generation
  • Streaming progress for image generation
  • Handled multiple submodels in a single model

Sharding process:

  1. Stable Diffusion contains three models: CLIP (text encoder), UNet (the denoising network) and VAE (image encoder and decoder).
  2. The Stable Diffusion Hugging Face repo contains a model_index.json and a folder for each submodel with its config. I combined all the submodel configs and loaded them into the model.
  3. A shard is then divided into three sub-shards, one per submodel (clip, unet, vae). The whole model is treated as 37 layers, of which 22, 8 and 7 belong to clip, unet and vae respectively. So a shard of (0, 27) is made up of shard(0, 22, 'clip'), shard(0, 5, 'unet'), shard(0, 0, 'vae') (see the first sketch after this list).
  4. Each submodel is manually sharded into individual layers.
  5. The inference pipeline is then clip.encode(text) -> unet.denoise_latent() for 50 steps -> vae.decode_image().
  6. This is implemented as clip.encode(text) if step == 1 -> unet.denoise_latent() for step k -> vae.decode_image() if step == 50. The pipeline runs for 50 steps while maintaining the intermediate results and step count in the inference_state (see the second sketch below).
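Points 3 and 6 can be illustrated with two short sketches. Both are hypothetical Python, not the PR's actual code. The first assumes the layer counts from the description (22 CLIP, 8 UNet, 7 VAE layers, indexed contiguously) with inclusive end indices, and simply omits submodels the shard doesn't touch; the real boundary convention (e.g. whether an empty VAE shard is still emitted) may differ.

```python
# Hypothetical sketch of the global-shard -> submodel-shard decomposition in point 3.
# Layer counts come from the description above; boundary conventions are assumptions.
SUBMODEL_LAYERS = [("clip", 22), ("unet", 8), ("vae", 7)]

def split_shard_by_submodel(start: int, end: int):
    """Map a global (start, end) layer range onto per-submodel layer ranges."""
    sub_shards = []
    offset = 0
    for name, n_layers in SUBMODEL_LAYERS:
        lo, hi = offset, offset + n_layers - 1        # global index range of this submodel
        if start <= hi and end >= lo:                 # does the shard overlap this submodel?
            sub_shards.append((name, max(start, lo) - offset, min(end, hi) - offset))
        offset += n_layers
    return sub_shards

print(split_shard_by_submodel(0, 27))
# [('clip', 0, 21), ('unet', 0, 5)]
```

The step-wise pipeline from point 6 could then look roughly like this; the method names (encode, denoise_latent, initial_latents, decode_image) mirror the description but are illustrative rather than the PR's real API:

```python
TOTAL_STEPS = 50

def run_step(clip, unet, vae, prompt, inference_state):
    """One pass through the pipeline; latents and the step count are carried
    between passes in inference_state."""
    step = inference_state.get("step", 1)
    if step == 1:                                     # first pass: encode the prompt
        inference_state["conditioning"] = clip.encode(prompt)
        inference_state["latents"] = unet.initial_latents()
    inference_state["latents"] = unet.denoise_latent( # every pass: one denoising step
        inference_state["latents"], inference_state["conditioning"], step
    )
    inference_state["step"] = step + 1
    if step == TOTAL_STEPS:                           # last pass: decode to an image
        return vae.decode_image(inference_state["latents"]), inference_state
    return None, inference_state
```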

@AlexCheema (Contributor)

Just so I understand what's going on, how is the model split up between devices? Let's say I have 3 devices with different capabilities, how does that work?

@pranav4501 (Author)

There are no changes to that part. It's how the partition algorithm splits the shards across the devices.

@AlexCheema (Contributor)

There are no changes to that part. It's how the partition algorithm splits the shards across the devices.

I see. The difference here is that the layers are non-uniform. That means they won't necessarily get split proportionally to the memory used, right?

@pranav4501 (Author)

Yeah, layers are non-uniform, so the split memory isn't exactly proportional to the number of layers. Can we split using the number of params?

@AlexCheema (Contributor)

Yeah, layers are non-uniform, so the split memory isn't exactly proportional to the number of layers. Can we split using the number of params?

This is probably fine as long as the layers aren't wildly different in size. Do you know roughly how different in size they are?

@pranav4501 (Author)

The UNet does have a couple of larger layers because of the upsampled dims, while the CLIP text encoder has comparatively smaller layers since it's made of transformer blocks and can be split easily, similar to LLMs. We can combine 2 CLIP layers and split the UNet further to make it more uniform.
CLIP (1.36 GB -> 22 layers: uniformly split), UNet (3.46 GB -> 10 layers: non-uniform), VAE (346 MB -> 10 layers: non-uniform)
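A quick back-of-envelope check using the figures quoted above (the arithmetic is just illustrative):

```python
# Rough average per-layer sizes from the figures in this comment.
sizes_gb = {"clip": (1.36, 22), "unet": (3.46, 10), "vae": (0.346, 10)}  # (total GB, layers)
for name, (gb, layers) in sizes_gb.items():
    print(f"{name}: ~{gb / layers * 1000:.0f} MB per layer")
# clip: ~62 MB per layer, unet: ~346 MB per layer, vae: ~35 MB per layer
```

So an average UNet layer is roughly 5-6x a CLIP layer and about 10x a VAE layer, which is why a purely layer-count-based split can end up memory-imbalanced.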

@blindcrone (Contributor)

I think at some point it would make sense to allow more granular sharding of models than just transformer blocks anyway, and this could involve updating to a memory-footprint heuristic based on dtypes and parameters rather than assuming uniform layer blocks.
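A minimal sketch of what such a heuristic could look like, assuming per-layer parameter counts and dtypes are known; the function names and the greedy contiguous assignment are illustrative, not a proposal for exo's actual partitioning code:

```python
# Weight each layer by parameter count x dtype size instead of assuming uniform layers,
# then assign contiguous layer ranges proportionally to each device's memory.
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

def layer_footprint(param_count: int, dtype: str) -> int:
    """Approximate memory footprint of one layer in bytes."""
    return param_count * DTYPE_BYTES[dtype]

def partition_by_memory(layer_bytes, device_memory):
    """Greedily assign contiguous layer ranges so each device's share of the
    total layer bytes roughly matches its share of total device memory."""
    total_bytes, total_mem = sum(layer_bytes), sum(device_memory)
    assignments, idx, taken, target = [], 0, 0, 0.0
    for i, mem in enumerate(device_memory):
        target += total_bytes * mem / total_mem       # cumulative byte budget up to this device
        start = idx
        while idx < len(layer_bytes) and (
            i == len(device_memory) - 1               # last device takes the remainder
            or taken + layer_bytes[idx] <= target
        ):
            taken += layer_bytes[idx]
            idx += 1
        assignments.append((start, idx - 1))          # (start, start - 1) means "no layers"
    return assignments

# Example: one large layer between two small ones, split across two equal devices.
layers = [layer_footprint(10_000_000, "float16"),     # ~20 MB
          layer_footprint(200_000_000, "float16"),    # ~400 MB
          layer_footprint(20_000_000, "float16")]     # ~40 MB
print(partition_by_memory(layers, [16, 16]))          # [(0, 0), (1, 2)]
```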
