Image component requests
#466
Comments
This issue looks very old, what's its status?
Still important, just not as important as other things.
Thanks for your hard work on an awesome tool! I just wanted to chime in on why this is important to me, as a user. The image editor was one of my favorite Gradio 2.x features. It allowed me to "play" with my computer vision models in the same way that NLP folks have been able to play with theirs. I used it very fruitfully to probe and understand the failure modes of an OCR model. This makes it a killer feature to combine with flagging as part of an "exploratory model analysis" workflow, where Gradio can shine as a central component. Without the full editor, I have much less reason to prefer Gradio for this over other libraries for rapid model-centric app development, like Streamlit. I'd also like to register that it was very confusing to see the documentation for the Cheers, and thanks for making a really useful library!
Thanks @charlesfrye for the very useful feedback! We are definitely planning on bringing it back, but most likely using our own implementation so that we have more control over it. Would you be able to tell us which parts of the editor were most useful for you? Blurring / cropping / coloring / etc.?
I also found the image editing function very useful to modify input images to check the robustness of models, and I'm glad to hear there's a plan to bring it back. In my case, rotating, flipping, blurring, cropping, changing aspect ratios, and adding noise were useful for checking the performance of object detection models, image classification models, etc. Drawing tools were also useful for partially or completely occluding objects in images. Some of the features that were missing in the previous image editor and that I wanted are discussed in the following issues.
@abidlabs Happy to help! The most useful transformations were adding noise, blurring, adding text, and erasing/drawing. Adding noise and blurring are nice generic robustness tests, but they are relatively easy to do in a library like torchvision. Erasing, drawing, and adding text, on the other hand, are much harder to automate and so aren't as readily available in existing modeling libraries. For "gradio as a tool for exploring models", I think it is generally the case that those more interactive editing tools would be highest value-add. Rotations and flips were less useful, but that may be specific to the use case I spent the most time with -- the OCR model expected text to be mostly oriented correctly.
I have updated the parent issue to try to capture the various requests we have had and start a conversation about how we design this. Please take a look and provide any feedback; it would be very much appreciated!
This looks great @pngwn and definitely captures the vast majority of user feedback that I've heard. A few thoughts:
This is something we've heard a lot. In the Python API, if users pass in a list for the
LGTM. One additional request we've heard is the ability to type text onto an image. This is useful for OCR-type models. See @charlesfrye's comments in the thread above, for example.
It seems that some users strongly prefer dealing with a single image, while others require separate layers for the image, mask, and sketch. I think we should actually provide this as an option that can be controlled via the Python API. The I didn't follow what you meant about the "example of three masks"
@abidlabs regarding the three masks, it is an example pulled from this comment:
I'll add text to the feature list.
I came here to add a request for a mask eraser. It's sometimes a pain to have to reverse and delete the whole mask when you only want to be able to erase a small part of it.
Hi, I would love to see the possibility to have an example for a masked image input in a space.
I recently started using gradio, and it's been really helpful. My major challenge is reducing the brush size. Fortunately the issue was mentioned earlier.
@pngwn I'm strongly in favor of improving this component as, whether intended or not, it's become the de-facto interface for Stable Diffusion and folks are currently zooming their browser windows to see what they're masking. BTW, I'd also recommend looking at InvokeAI's implementation of its unified canvas feature. Even just adding shortcuts and a functioning zoom to the existing Image component would go a long way toward filling the gap short term. Since I come from the Desktop UI world, can you or someone explain if the Image component currently supports "focus" and "keyboard" listeners? And if not, which package one might use to support those features that would be acceptable on the Gradio-side? I'd be willing to tinker with this on my own. Many thanks.
I don't know if here is the right place to ask then any tool adds a cherry on the pi 😎
@anapnoe there are reasons but they aren't particularly good ones. This will be addressed in the rewrite. The performance of the current sketch tool is quite poor, especially with large images.
How about a
On Windows using Chrome I can drag & drop an image from another tab/window into Gradio. But on Mac this doesn't work.
Is your feature request related to a problem? Please describe.
We are getting a lot of feature requests for the different interactive Image variants. Historically we had numerous different tools to handle the different kinds of Image editing functionality. This has improved somewhat, but the Image component is less feature-rich than it used to be. Sadly, over time the new (kinda) Image component has become difficult to maintain and extend, and it needs substantial refactoring to realise its full potential.
We are also finding the current signatures of pre- and post-process limiting. The challenge here is that the sensible thing would be to issue a breaking change to make a clean break with the past, but we don't want to introduce too much churn for users.
This issue will collate all feedback we have received so far (assuming I can find it all) and act as a single place to discuss features and design of a new unified image component.
overview
Broadly speaking, the Image component (as an interactive input) has two key parts: the source of the image and the editing capabilities. The rewrite that will stem from this issue will preserve the different inputs (and maybe there are more that people would like to see) but unify the editing tools into a single, simple (hopefully) GUI. The high-level thinking is that the Gradio developer will be able to toggle and constrain the various features one by one if necessary (with defaults and templates making this even simpler for users who do not need such granular control).
References to "Gradio API" refer to controlling the feature via the Gradio Python API when creating the app.
References to "GUI" refer to end users controlling the feature in the browser when interacting with the tool.
inputs/source
One thing that has crossed my mind is allowing multiple inputs. This would allow Gradio app authors to be very flexible with what kinds of Image sources are set, which will work well for some more general models. This could be controllable via the GUI (defaulting to a blank background but with buttons/toggles to enable different source modes).
Note: source=canvas would be deprecated, as everything can be a canvas in the new world.
Are there other possible inputs we should consider here?
editor tools
Currently the Gradio Image component is a simple raster/bitmap graphics editor but there is no reason we cannot support certain vector features. I would be wary of attempting any kind of comprehensive vector tools (specifically things like modifying paths + curves, creating new shapes from the intersection/union of multiple shapes, etc.) but we could support some simple shape tools with transforms (translate/rotate/resize). It would probably make sense to start with rasters only because combining vectors and raster images introduces some complexities that we have no choice but to push onto the user (such as needing to rasterise vectors and flatten layers in order for image filters to work as expected).
We have had lots of requests so here goes:
general
Some more general things need to be handled better. The main thing I can think of here is the size of the canvas. It is a little better today than it was yesterday but still not ideal.
fullscreen mode
We have never really had this; the previous 'full screen' wasn't really full screen. We should add it.
canvas size
I'm not 100% sure what the best way to approach the canvas size is. I definitely think we need to respect any options passed into the Gradio API, so app authors can set the most appropriate canvas size (and ratio) for their model, but I'm not sure about other cases.
Currently we size the canvas based on either the 'source' or, if there isn't one, the screen size. So if a user uploads a 500x500 image, then that will be the size of the canvas (scaled in the browser to account for device pixel ratio), but this might not be ideal as very large images could slow down the predictions. We could accept a max width/height and never go above that, to ensure we aren't unwittingly sending huge images back to the server to be processed.
Would love to get people's thoughts on this one.
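To make the max width/height idea concrete, here is a minimal sketch in plain Python. The function name and the max_width / max_height parameters are hypothetical, not part of the Gradio API; it only illustrates one possible rule (shrink to fit, preserve aspect ratio, never upscale):

```python
from typing import Optional, Tuple


def clamp_canvas_size(
    width: int,
    height: int,
    max_width: Optional[int] = None,
    max_height: Optional[int] = None,
) -> Tuple[int, int]:
    """Shrink (width, height) to fit the optional limits, keeping the aspect ratio.

    Hypothetical helper: the names and behaviour are assumptions for discussion.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    scale = 1.0  # starting at 1.0 means we never upscale
    if max_width is not None:
        scale = min(scale, max_width / width)
    if max_height is not None:
        scale = min(scale, max_height / height)

    # Keep at least one pixel in each dimension after rounding.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With limits of 1024 in both directions, a 500x500 upload would be left alone while a 4000x2000 upload would be reduced to 1024x512. Whether the limit should apply to the canvas, to the data sent to the server, or to both is still an open question.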
performance
Performance of the current component is OK, but there are some performance issues, which are the result of a number of things. They can be addressed in a rewrite, as we will almost certainly need to switch to WebGL to implement some of these features in a performant manner (while maintaining good UX). Calling them out here for posterity; there is not a great deal to discuss.
pre and post-process
These signatures need to change; they aren't working right now and things aren't going to get any better. This is a pretty significant breaking change because the Image component is our most used component. We will have to discuss how we manage this.
I think the image component should switch to always returning a dictionary with a series of keys. We have had numerous requests about returning certain layers separately and others together, so we can discuss the specifics in this thread, but something like:
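The key names used here (image, mask, sketch) are assumptions for illustration rather than a settled design. This is only a minimal sketch of how a prediction function might consume such a dictionary, with nested lists standing in for real image arrays:

```python
from typing import List, Optional, TypedDict

Pixels = List[List[int]]  # rows of grayscale values; a stand-in for an image array


class ImagePayload(TypedDict):
    """Hypothetical shape of the value passed to the prediction function."""

    image: Pixels             # the uploaded or captured background image
    mask: Optional[Pixels]    # None when the user drew no mask
    sketch: Optional[Pixels]  # None when the user drew no sketch


def count_masked_pixels(payload: ImagePayload) -> int:
    """Example consumer: count the pixels the user marked in the mask layer."""
    mask = payload["mask"]
    if mask is None:
        return 0
    return sum(1 for row in mask for value in row if value > 0)


example: ImagePayload = {
    "image": [[10, 20], [30, 40]],
    "mask": [[0, 255], [0, 0]],
    "sketch": None,
}
masked = count_masked_pixels(example)
```

Always returning the same keys, with None for unused layers, would let app authors write one function regardless of which tools are enabled.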
There are questions around the exact shape of this. What if we have multiple masks? Should that be a list/array on the mask key, or should every layer have its own key? Should the return be a list of dicts instead, containing meta information about that layer? How does an app author figure out what each layer is for (take the example of three masks again)? Should we also return a composite image in addition to the separate layers?
Would be good to get people's thoughts on this one as well.
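One of the options above, a list of dicts carrying meta information per layer, could look roughly like the following. The field names (name, kind, pixels) and both helper functions are hypothetical; None is used here to mean a transparent pixel:

```python
def layers_of_kind(layers, kind):
    """Return the names of all layers of a given kind, in drawing order."""
    return [layer["name"] for layer in layers if layer["kind"] == kind]


def composite(base, layers):
    """Flatten layers onto a copy of the base image, later layers on top."""
    result = [row[:] for row in base]
    for layer in layers:
        for y, row in enumerate(layer["pixels"]):
            for x, value in enumerate(row):
                if value is not None:  # None marks a transparent pixel
                    result[y][x] = value
    return result


base = [[0, 0], [0, 0]]
layers = [
    {"name": "mask-a", "kind": "mask", "pixels": [[1, None], [None, None]]},
    {"name": "mask-b", "kind": "mask", "pixels": [[None, 2], [None, None]]},
    {"name": "mask-c", "kind": "mask", "pixels": [[None, None], [3, None]]},
    {"name": "notes", "kind": "sketch", "pixels": [[None, None], [None, 9]]},
]

mask_names = layers_of_kind(layers, "mask")
flattened = composite(base, layers)
```

With names attached to each layer, the three-mask case becomes a lookup by name instead of a guess based on position, and a composite can be derived on demand rather than always being sent alongside the layers.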
Issues
Features
gr.Image when the width and height and are set #1888
sketch component #2314
gr.Gallery or gr.Image #2236
Feature requests but should be custom components?
Bugs
gr.Image with tool="color-sketch" and a pre-loaded image #3248