About Annotator Resolution #924
Comments
What if we add a toggle, say "pixel-perfect", next to "guess mode", so that when it is selected the annotator resolution is computed automatically?
The annotator resolution needs to match the height of the resulting image; otherwise the control map may be displaced. I've seen this when using multi-cnet with different annotator resolutions. Also, could we add a "copy img to all cnet" option? Working with multi-cnet can become quite tiresome, especially in text-to-image scenarios where you have to duplicate the reference image across all cnet tabs.
It is not the height. It is the short side of the control map after the first resizing pass, before rounding to the nearest multiple of 64. This is super difficult to understand, and people, including me, cannot set correct values. ControlNet also needs a pixel-perfect mode. https://github.com/Mikubill/sd-webui-controlnet/tree/pixel-perfect see also commit #926
Good news! Thank you
@lllyasviel
supported
Is this undocumented? You can leave the cnet input image empty and it will fall back to the img2img init image.
Ha! Never noticed it!
Well, I tested it with CNet0: Lineart anime; CNet1: ZOE; CNet2: Softedge pidinet, with only CNet0 having an input image, and it does not seem to work. If I disable CNet1 and CNet2, it starts working again.
Sorry if I was not clear: this only works with img2img, when you have an init image.
Okay... I think txt2img needs a similar function, e.g. use the CNet-0 input as the default when no other image is provided.
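The fallback discussed in these comments could be sketched roughly as follows. This is only an illustration of the idea, not the extension's actual code: `fill_missing_inputs`, `unit_images`, and `init_image` are hypothetical names, and images are represented by any non-`None` value.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def fill_missing_inputs(
    unit_images: List[Optional[T]],
    init_image: Optional[T] = None,
) -> List[Optional[T]]:
    """Fill empty ControlNet unit inputs with a fallback image.

    Assumed behavior (hypothetical, for illustration only):
    - img2img: an empty unit falls back to the init image.
    - txt2img: no init image exists, so an empty unit falls back
      to the image given to the first unit (CNet-0).
    """
    if init_image is not None:
        fallback = init_image
    elif unit_images:
        fallback = unit_images[0]
    else:
        fallback = None
    # Keep every image the user supplied; only fill the gaps.
    return [img if img is not None else fallback for img in unit_images]
```

With this sketch, `fill_missing_inputs(["lineart.png", None, None])` would hand the same reference image to all three units, so the image only has to be set once.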
Setting the Annotator Resolution is always difficult, and users are likely to get frustrating results if their Annotator Resolution is not quite correct.
For example, to diffuse 1024 × 1024 images, the Annotator Resolution should be 1024 rather than the default 512 (except when using depth as the annotator).
Moreover, because multiple resize modes are available, the Annotator Resolution also depends on “Crop and Resize” vs. “Resize and Fill”, and the correct Annotator Resolution becomes really difficult to work out:
For example, if the A1111 resolution is 640 × 512, the input control image is 512 × 768, and we use “Crop and Resize”, then the control image will first be resized by ControlNet to (512×640/512) × (768×640/512) = 640 × 960, and then it will be cropped to 640 × 512.
In this case, if we want the annotator (say canny) to be pixel-perfect, we need to use the short side of 640 × 960, i.e. 640 (not the short side of 512×640, which is 512!), and then work out in our heads the nearest multiple of 64, i.e. 64×round(640/64) = 64×10 = 640. Luckily, it is still 640.
So the final correct Annotator Resolution is 640. What the heck. Who is able to do such a computation in their head? I also get confused from time to time.
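The arithmetic above can be written down as a small helper. This is a minimal sketch under stated assumptions, not ControlNet's implementation: the function names and the `mode` strings are made up for illustration, sizes are (width, height), and the “Resize and Fill” branch assumes the image is scaled to fit inside the target (the smaller scale factor), which the example above does not spell out.

```python
import math


def round_half_up(x: float) -> int:
    # Python's built-in round() rounds halves to the nearest even number,
    # so use an explicit half-up rule for predictable results.
    return math.floor(x + 0.5)


def annotator_resolution(control_size, target_size, mode="crop"):
    """Estimate a pixel-perfect annotator resolution.

    control_size: (width, height) of the input control image.
    target_size:  (width, height) of the A1111 output.
    mode: "crop" for Crop and Resize, "fill" for Resize and Fill
          (hypothetical labels, chosen for this sketch).
    """
    cw, ch = control_size
    tw, th = target_size
    kw, kh = tw / cw, th / ch
    if mode == "crop":
        # The resized image must cover the target, so take the larger factor.
        k = max(kw, kh)
    elif mode == "fill":
        # Assumption: the resized image must fit inside the target.
        k = min(kw, kh)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Short side of the control image after the first resizing pass.
    short_side = k * min(cw, ch)
    # Snap to the nearest multiple of 64, never going below 64.
    return max(64, 64 * round_half_up(short_side / 64))


# The worked example: 512 x 768 control image, 640 x 512 output.
print(annotator_resolution((512, 768), (640, 512), "crop"))  # 640
```

A "pixel-perfect" toggle could call something like this instead of asking the user for a number, while leaving the manual slider available when the toggle is off.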
I think we should have a solution to this, but I think it is a bad idea to force a correct value, because we also want to allow users to control the resolution as they want.
Perhaps a better idea is to add some hints, but I am not sure where to add them, and users may be put off by an overcrowded UI. (But if it can be implemented in gradio, I can give it a try.)
Does anyone have ideas?