
Implementation of Stable Diffusion with Aesthetic Gradients #2585

Merged
merged 21 commits into AUTOMATIC1111:master on Oct 21, 2022

Conversation

MalumaDev
Contributor

@TingTingin

Someone is working on this in #2498; should probably review and see what's different.

@ShadowPower

File "D:\stable-diffusion-webui-aesthetic\modules\sd_hijack.py", line 411, in forward
    z = z * (1 - self.aesthetic_weight) + zn * self.aesthetic_weight
RuntimeError: The size of tensor a (154) must match the size of tensor b (77) at non-singleton dimension 1

It seems that the token length is limited by the CLIP model.
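For context, the error is a shape mismatch: the webui splits prompts longer than 75 tokens into multiple 77-token CLIP chunks (hence 154 = 2 × 77), while the aesthetic-adjusted conditioning zn covers a single chunk. Below is a minimal sketch of the shape problem and one possible way to line the tensors up; it is illustrative only and not necessarily how MalumaDev's later fix works.

```python
import torch

# Hypothetical shapes reproducing the reported error:
# z  comes from a long prompt split into two 77-token CLIP chunks -> (batch, 154, 768)
# zn is the aesthetic-adjusted conditioning for a single chunk    -> (batch, 77, 768)
z = torch.randn(1, 154, 768)
zn = torch.randn(1, 77, 768)
aesthetic_weight = 0.9

# One way to make the shapes compatible: tile zn along the token axis until it
# covers as many tokens as z. This only illustrates the mismatch; the PR may
# instead pad, truncate, or re-encode.
if zn.shape[1] != z.shape[1]:
    repeats = (z.shape[1] + zn.shape[1] - 1) // zn.shape[1]   # ceil(154 / 77) = 2
    zn = zn.repeat(1, repeats, 1)[:, : z.shape[1], :]

blended = z * (1 - aesthetic_weight) + zn * aesthetic_weight
print(blended.shape)  # torch.Size([1, 154, 768])
```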

@EliEron

EliEron commented Oct 14, 2022

This seems to work well, but the default values are a bit odd.

The repo recommends an aesthetic learning rate of 0.0001, but you default to 0.005, which is 50 times higher. Is there a specific reason for this?

Similarly for aesthetic steps the repo recommends starting with relatively small step amounts, but the default in this PR is the highest value that the UI is set to allow.

@MalumaDev
Contributor Author

To be quick, I put in "random" default values 😅
I fixed the token length problem and added the UI for generating the embedding. I need some hours of sleep; I'll commit the code tomorrow.

Contributor

@vicgalle vicgalle left a comment


Thanks for adapting this, @MalumaDev! Looks good to me.
I only added some suggestions regarding the names of some parameters and the max value of one.

Review suggestions (outdated, resolved) on: README.md, modules/aesthetic_clip.py, modules/ui.py
MalumaDev and others added 6 commits October 15, 2022 18:39
Co-authored-by: Víctor Gallego <[email protected]>
Co-authored-by: Víctor Gallego <[email protected]>
Co-authored-by: Víctor Gallego <[email protected]>
Co-authored-by: Víctor Gallego <[email protected]>
Co-authored-by: Víctor Gallego <[email protected]>
Co-authored-by: Víctor Gallego <[email protected]>
@bmaltais

bmaltais commented Oct 15, 2022

This feature is actually way more interesting than I thought. Pretty amazing the variations you can obtain using the image embeddings. I am still trying to figure out how to use all the different sliders and what they do... I really hope this will get merged someday.

I noticed that creating a new image embedding does not automatically add it to the pull-down in txt2img. Just a nitpick.

@bmaltais

bmaltais commented Oct 15, 2022

Quick example for those wondering. I created an image embedding from a bunch of big eyes paintings and tried to apply it to the simple "a beautiful woman" seed 0 prompt. Here are the results:

Original prompt image:
[image]

Applying the image embedding style with aesthetic learning rate 0.001, weight 0.85, and steps 40:
[image]

Increasing the weight to 1 increases the style application, resulting in something closer to the original paintings:
[image]

Bringing it down to 0.5 will obviously reduce the effect:
[image]

And the beauty is that it requires almost no computing time. This is next level stuff... Magic!!!

@bmaltais

bmaltais commented Oct 15, 2022

Another example using the same prompt as above. I created an image embedding from a bunch of images at: https://lexica.art/?q=aadb4a24-2469-47d8-9497-cafc1f513071

After some fine-tuning of the weights and learning rate I was able to get:
[image]

And from those https://lexica.art/?q=1f5ef1e0-9f3a-48b8-9062-d9120ba09274 I got:

[image]

And all this with literally no training whatsoever. AMAZING!

@MalumaDev
Contributor Author

> This feature is actually way more interesting than I thought. Pretty amazing the variations you can obtain using the image embeddings. I am still trying to figure out how to use all the different sliders and what they do... I really hope this will get merged someday.
>
> I noticed that creating a new image embedding does not automatically add it to the pull-down in txt2img. Just a nitpick.

Little bug. I'll fix it.

@bmaltais

bmaltais commented Oct 15, 2022

I even tried feeding it 19 pictures of me in a non-1:1 aspect ratio (512x640) and gosh darn... it produced passable results!

Sample input image:

[image: 00000-0-a man with a beard and a white shirt is smiling at the camera with a waterfall in the background]

Prompt with no Aesthetic applied:

[image]

Aesthetic applied:

[image]

Not as good as if I trained Dreambooth or TI, but for one minute of fiddling it is amazing. It appears to apply the overall pose of some of the pictures I fed it. I wonder what would happen if I fed the thing 100+ photos of me in varying sizes... It is as if the size and ratio of the images you feed it do not matter.

And what is amazing is that it does all this with a 4KB file!

@MalumaDev MalumaDev changed the title Implementation of Stable Diffusion with Aesthetic Gradients + Batch size and gradient accumulation for training Implementation of Stable Diffusion with Aesthetic Gradients ~~+ Batch size and gradient accumulation for training~~ Oct 15, 2022
@MalumaDev MalumaDev changed the title Implementation of Stable Diffusion with Aesthetic Gradients ~~+ Batch size and gradient accumulation for training~~ Implementation of Stable Diffusion with Aesthetic Gradients Oct 15, 2022
@feffy380

feffy380 commented Oct 15, 2022

I'd suggest hiding the interface behind the Extra checkbox or at least moving it lower. It's quite large and pushes more commonly used options like CFG and Batch size/count off-screen.

@bmaltais

> I'd suggest hiding the interface behind the Extra checkbox or at least moving it lower. It's quite large and pushes more commonly used options like CFG and Batch size/count off-screen.

Indeed. I doubt Automatic will like it where it is now... the best would be some sort of tabs inside the parameter section, presenting the current options in a default tab with the aesthetic options in an aesthetic tab beside it.

@MalumaDev
Contributor Author

> An additional thing I'm going to ask of you is to isolate as much of your code into separate files as possible. The big chunk of code in sd_hijack should be in its own file. All the parameters of aesthetic gradients should be in members of your own class defined in your own file, not in sd_hijack.

WIP!!

@MalumaDev
Contributor Author

MalumaDev commented Oct 16, 2022

> On a separate note... do you think the same thing could be added to img2img to offer better conformity to the original image? I sometimes feel the aesthetic model is difficult to control. At some point it totally changes the original image instead of changing its overall style. If it were possible to control the weight of the aesthetic on top of the resulting prompt image without losing the whole look, it would be even better.

Added

@bmaltais

bmaltais commented Oct 16, 2022

I like the now-expandable aesthetic section. This is a step in the right direction and I hope Automatic will approve of it.

I tested the img2img implementation and it works very well. I was able to keep the general composition of the original and transform it toward the aesthetic without losing too much... NICE. Here is an example of applying the Big Eyes style to a photo of a man:

Original:

[image]

Styled with big eyes:

[image]

and the overall config:

[image]

Trying to apply the same aesthetic on the source txt2img with the same seed results in this... which is not what I want:

[image]

I think the better workflow is:

  • Use txt2img to get a good starting image (or just use an external image as a source)
  • Send it to img2img
  • Apply the aesthetic changes there and tweak to taste

@bmaltais

Something else I noticed: is there a reason the aesthetic optimization is always computed? If no parameters for it have changed from generation to generation, could it not just be reused from a memory cache instead of always being recomputed?

@MalumaDev
Contributor Author

> Something else I noticed: is there a reason the aesthetic optimization is always computed? If no parameters for it have changed from generation to generation, could it not just be reused from a memory cache instead of always being recomputed?

When the seed changes, so does the training result!!!

@feffy380

feffy380 commented Oct 17, 2022

@bmaltais Looking at the original aesthetic gradients repo, the personalization step involves performing gradient descent to make the prompt embedding more similar to the aesthetic embedding. In other words, it has to be recomputed for each prompt, but it shouldn't be affected by the seed as far as I can tell. Actually, isn't the process nondeterministic regardless of seed unless you enable determinism in PyTorch itself? Can someone test whether running the same settings twice produces the same image?
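For readers who want to see what that personalization step roughly looks like, here is a hedged sketch of the idea from the aesthetic-gradients paper and repo: a few gradient steps nudge a copy of the CLIP text encoder so its pooled output moves toward the mean aesthetic image embedding. Names and attributes (pooler_output, last_hidden_state from a Hugging Face-style CLIPTextModel) are assumptions for illustration, not the exact code in this PR; defaults are illustrative, with lr matching the 0.0001 the repo recommends.

```python
import torch
import torch.nn.functional as F

def personalize(text_encoder, tokens, aesthetic_emb, steps=20, lr=1e-4):
    """Rough sketch of the aesthetic-gradients personalization step.

    text_encoder : a copy of the CLIP text encoder (the original stays untouched)
    tokens       : tokenized prompt, shape (1, 77)
    aesthetic_emb: mean CLIP image embedding of the aesthetic image set, shape (1, 768)
    """
    optimizer = torch.optim.Adam(text_encoder.parameters(), lr=lr)
    aesthetic_emb = F.normalize(aesthetic_emb.float(), dim=-1)

    for _ in range(steps):
        out = text_encoder(tokens)                        # assumed to expose pooled + hidden states
        pooled = F.normalize(out.pooler_output.float(), dim=-1)
        loss = -(pooled * aesthetic_emb).sum()            # maximize cosine similarity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        return text_encoder(tokens).last_hidden_state     # personalized conditioning
```

On the determinism question: with fixed seeds and no nondeterministic CUDA kernels, repeating this optimization should give the same result, but the only way to be sure for this particular implementation is the test feffy380 suggests.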

@miaw24

miaw24 commented Oct 20, 2022

I think there should be an option to do the aesthetic optimization on the CPU before sending it back to the GPU for the image generation process. This might be useful for people with limited VRAM, so that they won't run out of VRAM when computing the aesthetic optimization.

@AUTOMATIC1111 AUTOMATIC1111 merged commit 7d6b388 into AUTOMATIC1111:master Oct 21, 2022
@bbecausereasonss

Is there a tutorial on how to set this up/train it?

@bmaltais

bmaltais commented Oct 21, 2022 via email

@rabidcopy

So is there any hope of doing this on 4GB of VRAM? My poor card has been able to handle everything (besides training) up to 576x576 so far with --medvram, VAEs, hypernetworks, upscalers, etc., but this puts me OOM after the first pass. 😅

@TinyBeeman
Contributor

TinyBeeman commented Oct 22, 2022

It seems like "Aesthetic text for imgs" and slerp angle are somehow off... Values between 0.001 and 0.02 seem to cause the aesthetic text to influence the embedding in a meaningful way. But 0.2 to 1.0 seem random and not to have that much effect relative to each other. If I use "colorful painting", for instance (0.0 = ignore text, 0.001 = it adds color and flowers, 0.2 to 1.0 = the image seems to lose style altogther, and is neither colorful nor painterly.

@MalumaDev
Contributor Author

MalumaDev commented Oct 22, 2022

The DALL·E 2 paper specifies that the max angle to use is in the range [0.25, 0.5] (TextDiff).
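For reference, the slerp angle slider interpolates between two CLIP embeddings (as far as the discussion indicates, the aesthetic image-set embedding and the encoding of the "Aesthetic text for imgs"). A generic spherical-interpolation implementation looks like the following; it is a standard formulation, not a quote of the PR's code.

```python
import torch

def slerp(low: torch.Tensor, high: torch.Tensor, val: float) -> torch.Tensor:
    """Generic spherical interpolation between two embedding vectors.

    val = 0 returns `low`, val = 1 returns `high`; the slerp-angle slider in
    this PR plays the role of `val`. Assumes the vectors are not (anti)parallel.
    """
    low_norm = low / low.norm(dim=-1, keepdim=True)
    high_norm = high / high.norm(dim=-1, keepdim=True)
    omega = torch.acos((low_norm * high_norm).sum(dim=-1, keepdim=True).clamp(-1, 1))
    so = torch.sin(omega)
    return (torch.sin((1.0 - val) * omega) / so) * low + (torch.sin(val * omega) / so) * high
```

With this formulation, a value near 0 barely moves the starting embedding while a value near 1 essentially replaces it with the other one, which is consistent with the behaviour described further down in this thread.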

@TinyBeeman
Contributor

TinyBeeman commented Oct 22, 2022

@MalumaDev that makes sense; maybe we should adjust the slider range to be more helpful. That said, as currently implemented, values as low as 0.001 have interesting variations, and values above 0.25 seem to be… uninteresting. At least in my test cases.

@miaw24

miaw24 commented Oct 22, 2022

@rabidcopy I am able to use it on 4GB of VRAM by editing aesthetic_clip.py: I changed every single device to 'cpu' (except the import part, of course), and then, to prevent it from complaining that the tensors are on two different devices, edited the code in the __call__ function of class AestheticCLIP, adding z = z.to('cpu') before the if self.slerp: part, and also adding z = z.to(device) before the return z part. So far this works (or at least it works for me), but I don't know whether using the CPU to compute the aesthetic gradient will change the result compared to computing it with CUDA.
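A rough sketch of where those two added lines would sit inside __call__, for anyone following along. Everything except the two marked lines is paraphrased with `...`, the signature is not the exact one, and `device` stands for whatever GPU device variable aesthetic_clip.py already has in scope.

```python
import torch

# Stand-in for the `device` variable that aesthetic_clip.py already imports;
# in the real file you would keep using that one.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class AestheticCLIP:
    ...  # existing attributes and methods, unchanged

    def __call__(self, z, *args, **kwargs):   # signature paraphrased
        ...                                   # existing aesthetic-optimization code, unchanged
        z = z.to("cpu")                       # added: do the blending math on the CPU to save VRAM

        if self.slerp:                        # existing branch, unchanged
            ...
        else:
            ...

        z = z.to(device)                      # added: move the result back to the GPU for generation
        return z
```

Both added lines sit at the same indentation level as the `if self.slerp:` statement, directly inside the __call__ method body.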

@cornpo

cornpo commented Oct 23, 2022

I couldn't run laion_7plus or sac_8plus since the original PR. Now, tonight, I can.

Gloom, Watercolor, et al. work fine. Then on laion_7 or sac_8 I get IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1), but only with text in the aesthetic embeddings text box.

@hulululu7654321

File "/home/hulululu/desktop/stable-diffusion-webui-master/extensions/aesthetic-gradients/aesthetic_clip.py", line 233, in call
sim = text_embs @ img_embs.T
RuntimeError: expected scalar type Float but found Half

how can i deal with this problem?
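A common workaround for this class of error (one tensor in float16 from a half-precision model, the other in float32) is to cast both operands to the same dtype before the matmul. This is a generic fix for the reported error, not a patch taken from this extension.

```python
# Generic dtype fix for "expected scalar type Float but found Half":
# make sure both embeddings use the same precision before multiplying.
sim = text_embs.float() @ img_embs.float().T
```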

@baphilia

baphilia commented Oct 28, 2022

> The DALL·E 2 paper specifies that the max angle to use is in the range [0.25, 0.5] (TextDiff).

> It seems like "Aesthetic text for imgs" and slerp angle are somehow off... Values between 0.001 and 0.02 seem to cause the aesthetic text to influence the embedding in a meaningful way, but 0.2 to 1.0 seem random and do not have much effect relative to each other. If I use "colorful painting", for instance: 0.0 = ignore text, 0.001 = it adds color and flowers, 0.2 to 1.0 = the image seems to lose style altogether, and is neither colorful nor painterly.

Does anyone have a link or a quick explanation of what "Aesthetic text for imgs", "slerp angle", and "slerp interpolation" are supposed to do? What should I be typing there? What is the desired effect? (I tried searching the paper and a few articles and READMEs for the relevant terms, but I failed to find anything.)

At low settings for the angle it seems super random, just changing the entire subject of the image to something that has nothing to do with either the regular prompt or the aesthetic text, and at high settings it just seems to use the aesthetic text as a new prompt (without incorporating the styling of the embedding at all).

@jonwong666

Aesthetic works best with txt2img; it's not for img2img.

I'm getting good results with these settings:

[image]

The trick is to not use too many styles or conflicting artists in the main prompt and let the aesthetic do the work with a high learning rate.

@gsgoldma

gsgoldma commented Nov 8, 2022

> @rabidcopy I am able to use it on 4GB of VRAM by editing aesthetic_clip.py: I changed every single device to 'cpu' (except the import part, of course), and then, to prevent it from complaining that the tensors are on two different devices, edited the code in the __call__ function of class AestheticCLIP, adding z = z.to('cpu') before the if self.slerp: part, and also adding z = z.to(device) before the return z part. So far this works (or at least it works for me), but I don't know whether using the CPU to compute the aesthetic gradient will change the result compared to computing it with CUDA.

I might be a fool, but which indentations did you use?


@shamblessed shamblessed left a comment


Hydd

DrakeRichards pushed a commit to DrakeRichards/stable-diffusion-webui that referenced this pull request Dec 20, 2023