Implementation of Stable Diffusion with Aesthetic Gradients #2585
Conversation
Someone is working on this in #2498; should probably review and see what's different.
It seems that the token length is limited by the CLIP model.
This seems to work well, but the default values are a bit odd. The repo recommends an aesthetic learning rate of 0.0001, but you default to 0.005, which is 50× higher. Is there a specific reason for this? Similarly, for aesthetic steps the repo recommends starting with relatively small step counts, but the default in this PR is the highest value the UI allows.
To be quick, I put in "random" default values 😅
Thanks for adapting this, @MalumaDev! Looks good to me.
I've only added some suggestions regarding the names of some parameters and the max value of one.
Co-authored-by: Víctor Gallego <[email protected]>
This feature is actually way more interesting than I thought. It's pretty amazing the variations you can obtain using the image embeddings. I am still trying to figure out how to use all the different sliders and what they do... I really hope this will get merged someday. I noticed that creating a new image embedding does not automatically add it to the pull-down in text2img. Just a nitpick.
Another example using the same prompt as above. I created an image embedding from a bunch of images at: https://lexica.art/?q=aadb4a24-2469-47d8-9497-cafc1f513071 After some fine-tuning of the weights and learning rate I was able to get: And from https://lexica.art/?q=1f5ef1e0-9f3a-48b8-9062-d9120ba09274 I got: And all this with literally no training whatsoever. AMAZING!
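For anyone wondering what an "image embedding" file actually contains: as far as I can tell from the original aesthetic-gradients repo, it is roughly the mean of the normalized CLIP image features of the pictures you feed it, which is why arbitrary image counts and sizes work and the file stays tiny. A rough sketch (not this PR's exact code; the folder path and output filename are made up):

```python
# Rough sketch: building an aesthetic embedding by averaging normalized
# CLIP image embeddings, as the original aesthetic-gradients repo does.
# "aesthetic_images" and "my_style.pt" are placeholder names.
import torch
import clip
from PIL import Image
from pathlib import Path

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

features = []
with torch.no_grad():
    for path in Path("aesthetic_images").glob("*.png"):  # any sizes/ratios work
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        feat = model.encode_image(image)
        features.append(feat / feat.norm(dim=-1, keepdim=True))

# The "image embedding" is just the mean feature vector, hence the ~4KB file.
aesthetic_embedding = torch.cat(features).mean(dim=0, keepdim=True)
torch.save(aesthetic_embedding, "my_style.pt")
```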
Little bug. I'll fix it.
I even tried feeding it 19 pictures of me in a non-1:1 aspect ratio (512x640) and gosh darn... it produced passable results! Sample input image: Prompt with no Aesthetic applied: Aesthetic applied: Not as good as if I trained Dreambooth or TI, but for one minute of fiddling it is amazing. It appears to apply the overall pose of some of the pictures I fed it. I wonder what would happen if I fed the thing 100+ photos of me in varying sizes... It is as if the size and ratio of the images you feed it do not matter. And what is amazing is that it does all this with a 4KB file!
I'd suggest hiding the interface behind the Extra checkbox or at least moving it lower. It's quite large and pushes more commonly used options like CFG and Batch size/count off-screen.
Indeed. I doubt Automatic will like it where it is now... the best option would be some sort of tabs inside the parameter section, presenting the current options in a default tab and the aesthetic options in an aesthetic tab beside it.
WIP!!
Added
I like the now-expandable aesthetic section. This is a step in the right direction and I hope Automatic will approve of it. I tested the img2img implementation and it works very well. I was able to keep the general composition of the original and transform it toward the aesthetic without losing too much... NICE. Here is an example of applying the Big Eyes style to a photo of a man: Original: Styled with big eyes: And the overall config: Trying to apply the same aesthetic on the source text2img with the same seed would result in this... which is not what I want. I think the better workflow is:
Something else I noticed: is there a reason the Aesthetic optimization is always computed? If none of its parameters have changed from generation to generation, could the result not just be reused from a memory cache instead of always being recomputed?
When the seed changes, so does the training result!
@bmaltais Looking at the original aesthetic gradients repo, the personalization step involves performing gradient descent to make the prompt embedding more similar to the aesthetic embedding. In other words, it has to be recomputed for each prompt.
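To illustrate what that recomputation involves, here is a rough sketch of the personalization loop as I understand it from the original repo (not this PR's exact code; the prompt, step count, and filename are placeholders):

```python
# Sketch of the aesthetic personalization step: a few gradient-descent steps
# that pull the prompt's CLIP text embedding toward the aesthetic embedding.
# Because the loss depends on the prompt, the result cannot simply be cached
# across different prompts.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-L/14", device=device)
model = model.float()  # full precision so the optimizer behaves

aesthetic = torch.load("my_style.pt", map_location=device).float()  # averaged image embedding
tokens = clip.tokenize(["a portrait photo of a man"]).to(device)    # placeholder prompt
optimizer = torch.optim.Adam(model.transformer.parameters(), lr=0.0001)  # repo-suggested LR

for _ in range(20):  # "aesthetic steps"
    text_features = model.encode_text(tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    loss = -torch.cosine_similarity(text_features, aesthetic).mean()  # maximize similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The temporarily fine-tuned text encoder then conditions the diffusion model.
```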
I think there should be an option to do the Aesthetic optimization on the CPU before sending the result back to the GPU for the image generation process. This might be useful for people with limited VRAM, so that they won't run out of VRAM when computing the Aesthetic optimization.
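For what it's worth, one possible shape for that option (purely a sketch of the idea, not something in this PR; `clip_model` and `run_aesthetic_steps` are placeholder names for the model and the optimization loop sketched a few comments above):

```python
# Hypothetical sketch of the CPU-offload idea: temporarily move the CLIP
# model to the CPU while the aesthetic gradient steps run, then put it
# back on the GPU for image generation and release cached VRAM.
import contextlib
import torch

@contextlib.contextmanager
def on_cpu(module: torch.nn.Module, gpu_device: str = "cuda"):
    module.to("cpu")
    try:
        yield module
    finally:
        module.to(gpu_device)
        torch.cuda.empty_cache()

# Usage (placeholder names):
# with on_cpu(clip_model):
#     run_aesthetic_steps(clip_model, prompt, aesthetic_embedding)
```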
Is there a tutorial on how to set this up/train it?
Have a look over here: Using Aesthetic Images Embeddings to improve Dreambooth or TI results · Discussion #3350 · AUTOMATIC1111/stable-diffusion-webui
So is there any hope of doing this on 4GB of VRAM? My poor card has been able to handle everything (besides training) up to 576x576 so far with --medvram, VAEs, hypernetworks, upscalers, etc., but this puts me OOM after the first pass. 😅
It seems like "Aesthetic text for imgs" and the slerp angle are somehow off... Values between 0.001 and 0.02 seem to cause the aesthetic text to influence the embedding in a meaningful way, but 0.2 to 1.0 seem random and don't have much effect relative to each other. If I use "colorful painting", for instance: 0.0 = ignore text, 0.001 = it adds color and flowers, 0.2 to 1.0 = the image seems to lose style altogether, and is neither colorful nor painterly.
The DALL·E 2 paper specifies that the max angle to use is in [0.25, 0.5] (text diffs).
@MalumaDev That makes sense; maybe we should adjust the slider range to be more helpful. That said, as currently implemented, values as low as 0.001 produce interesting variations, and values above 0.25 seem to be… uninteresting. At least in my test cases.
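For anyone puzzled by what the slerp angle actually does: my understanding (a sketch of the general technique, not this PR's exact code) is that it spherically interpolates between the prompt conditioning and the embedding of the "Aesthetic text for imgs", with the angle fraction controlling how far to move toward the aesthetic text:

```python
# Minimal slerp sketch: t = 0.0 keeps the original prompt embedding, small t
# (roughly 0.001-0.25 per the comments above) nudges it toward the aesthetic
# text, and larger values mostly replace the prompt. The tensors below are
# placeholders with CLIP ViT-L/14's 768-dim text features.
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    a_n = a / a.norm(dim=-1, keepdim=True)
    b_n = b / b.norm(dim=-1, keepdim=True)
    omega = torch.acos((a_n * b_n).sum(-1).clamp(-1.0, 1.0))  # angle between vectors
    so = torch.sin(omega)
    return ((torch.sin((1.0 - t) * omega) / so).unsqueeze(-1) * a
            + (torch.sin(t * omega) / so).unsqueeze(-1) * b)

prompt_embedding = torch.randn(1, 768)          # placeholder prompt features
aesthetic_text_embedding = torch.randn(1, 768)  # placeholder aesthetic-text features
blended = slerp(0.1, prompt_embedding, aesthetic_text_embedding)
```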
@rabidcopy I am able to use it on 4GB of VRAM by editing
Gloom, Watercolor, et al. work fine. Then on laion_7 or sac_8
File "/home/hulululu/desktop/stable-diffusion-webui-master/extensions/aesthetic-gradients/aesthetic_clip.py", line 233, in call how can i deal with this problem? |
Does anyone have a link or a quick explanation of what 'aesthetic text for imgs', 'slerp angle', and 'slerp interpolation' are supposed to do? What should I be typing there? What is the desired effect? (I tried searching the paper and a few articles and READMEs for the relevant terms, but I failed to find anything.) On low angle settings it seems to be super random, just changing the entire subject of the image to something that has nothing to do with either the regular prompt or the aesthetic text, and at high settings it just seems to use the aesthetic text as a new prompt (without incorporating the styling of the embedding at all).
I might be a fool, but which indentations did you use?
Here is the original repo: https://github.com/vicgalle/stable-diffusion-aesthetic-gradients