Interactive plots with bounding boxes #4917

Open
BradyJ27 opened this issue Oct 26, 2023 · 22 comments
Labels: A: plots (Area: plots webview, side panel and everything related), priority-p1 (Regular product backlog), story (Product feature aka epic. Discussion, progress, checkboxes for implementation, etc)

Comments

@BradyJ27
Contributor

BradyJ27 commented Oct 26, 2023

In computer vision, specifically object detection, it is common for a pipeline to output images with bounding boxes displaying the area of interest for specific objects. When the image(s) are relatively small or packed with multiple objects, it can be hard to view these images.

It would be nice to have some sort of interactive plots where the user can toggle on/off different objects based on labels.

This may require some dvc or dvc-render changes first, but I am opening it here because it would be beneficial to have this implemented within the VS Code extension.

Related issues:

iterative/dvc#10198

@mattseddon
Member

mattseddon commented Oct 26, 2023

To clarify:

Are there multiple images created by the pipeline or do you have an original image that you want to compare to the output? Can you give an example of what is produced by the model?

@BradyJ27
Contributor Author

BradyJ27 commented Oct 26, 2023

Usually it is multiple images. For example, we would print out the labels for the detections that we have made on the validation set.

The following example is from YOLOv8's default output. The DVCLive YOLO demo notebook is a good place to reproduce this.

77bbb4d8-97c4-4166-b55b-64ecaad1a4a1.jpeg

Yolo actually does both the validation labels (the ground truths) and the predicted values. This could be useful for comparing and contrasting.

@mattseddon
Member

Let me clarify the question. In the example above, are there multiple images available for 000000000042.jpg? Do you have an image available for each combination of labels, i.e. one for each of:

  • baseline
  • dog
  • motorcycle
  • dog + motorcycle

I am not an expert in image manipulation but AFAIK removing these labelling boxes from an image is not a trivial task.

Have you seen this done elsewhere?

@BradyJ27
Contributor Author

BradyJ27 commented Oct 27, 2023

So the image is just a copy of the training or validation image with bounding boxes added using some library (usually matplotlib). The bounding boxes are stored in a formatted file (xml, csv, json, or some custom format) normally like "label,x1,y1,x2,y2" (one for each label, i.e. "person,..." \n "dog,...")

So the approach would not be to manipulate the image with the boxes already on it, but rather set the image as the original image from the validation set, then place the interactive bounding boxes over the original image.

In other words, we have 2 files:

  • image.png
  • image_labels.xml

The images above are generated by combining the two files: the labels are read and drawn on top of a copy of the original image, creating a third file, which is the image with bounding boxes displayed. My suggestion is that we take this step and turn it into some interactive format within dvc.
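
For illustration, here is a minimal Python sketch of reading a label file in the "label,x1,y1,x2,y2" layout described above. The `Box` type, the `parse_label_csv` name, and the sample values are made up for this example; real label files may use XML, JSON, or a custom layout instead.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One labelled bounding box given by its corner coordinates."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_label_csv(text: str) -> list[Box]:
    """Parse lines of the form "label,x1,y1,x2,y2" into Box records."""
    boxes = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        label, *coords = (cell.strip() for cell in row)
        x1, y1, x2, y2 = map(float, coords)
        boxes.append(Box(label, x1, y1, x2, y2))
    return boxes


# Made-up sample values, not taken from a real dataset.
example = "person,10,20,110,220\ndog,50,60,90,120\n"
boxes = parse_label_csv(example)
```

With the labels kept as data like this, the original image never has to be modified; the boxes can be drawn (or hidden) on top of it at display time.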

@dberenbaum
Contributor

See https://docs.wandb.ai/guides/track/log/media#image-overlays for ideas on how others do this

@mattseddon added the "story" label (Product feature aka epic. Discussion, progress, checkboxes for implementation, etc) on Oct 30, 2023
@BradyJ27
Contributor Author

One option for implementation would be a custom plot template, right?

Or is this something that's a little more in depth and actually a bigger feature?

@mattseddon
Member

> One option for implementation would be a custom plot template, right?

No, I do not believe that you could shoe-horn the required data/image into the current DVC plots engine.

> Or is this something that's a little more in depth and actually a bigger feature?

My opinion is that this is a larger feature given the current state of plots.

@BradyJ27
Contributor Author

> My opinion is that this is a larger feature given the current state of plots.

Ok, that makes sense. I'm sure some more discussion needs to be had regarding implementing something like this, but I would be happy to help contribute!

@mattseddon
Member

@BradyJ27 can you provide a concrete example of one of the XML files that you mentioned here? Is this the only format available?

@mattseddon
Member

Looks like we might be able to get away without using a plotting library for this. One potential way would be to use https://github.com/lovell/sharp in the clients and generate SVG bounding boxes based on the definitions (XML or other files). Loading the original image with sharp gives us the option to call image.overlayWith(svgElementBuffer, {top:0, left:0}).toBuffer(), where svgElementBuffer is an SVG full of <rect> elements (source). Note that newer sharp releases replace overlayWith with image.composite([{input: svgElementBuffer, top: 0, left: 0}]).
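
As a rough sketch of the "SVG full of <rect> elements" idea, the snippet below builds such an overlay from pixel-space boxes. It is written in Python with only the standard library to keep it self-contained; the `boxes_to_svg` name, the `data-label` attribute, and the stroke styling are assumptions for illustration, and the clients would do the equivalent in their own stack.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def boxes_to_svg(width, height, boxes):
    """Return an SVG string with one <rect> per (label, x1, y1, x2, y2) box."""
    svg = ET.Element(
        "svg",
        {
            "xmlns": SVG_NS,
            "width": str(width),
            "height": str(height),
            "viewBox": f"0 0 {width} {height}",
        },
    )
    for label, x1, y1, x2, y2 in boxes:
        ET.SubElement(
            svg,
            "rect",
            {
                "data-label": label,  # hypothetical hook for toggling later
                "x": str(x1),
                "y": str(y1),
                "width": str(x2 - x1),
                "height": str(y2 - y1),
                "fill": "none",
                "stroke": "red",
                "stroke-width": "2",
            },
        )
    return ET.tostring(svg, encoding="unicode")


# Made-up boxes on a made-up 640x480 image.
overlay = boxes_to_svg(
    640, 480, [("dog", 10, 20, 110, 220), ("motorcycle", 200, 100, 400, 300)]
)
```

The resulting string can be laid over the untouched original image, so no pixels need to be edited to add or remove a box.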

@BradyJ27
Contributor Author

BradyJ27 commented Nov 3, 2023

> @BradyJ27 can you provide a concrete example of one of the XML files that you mentioned here? Is this the only format available?

I can share an example of the default yolo labels. This is just a text file, but the idea is the same in txt, csv, xml, json, etc. It can technically be any type of file, depending on what architecture you are using, but the above are the most common.

000000000009.txt

@mattseddon
Member

How do you determine which class the provided data relates to?

This is the contents of the file (for anyone else reading the issue):

45 0.479492 0.688771 0.955609 0.5955
45 0.736516 0.247188 0.498875 0.476417
50 0.637063 0.732938 0.494125 0.510583
45 0.339438 0.418896 0.678875 0.7815
49 0.646836 0.132552 0.118047 0.0969375
49 0.773148 0.129802 0.0907344 0.0972292
49 0.668297 0.226906 0.131281 0.146896
49 0.642859 0.0792187 0.148063 0.148062
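
For readers unfamiliar with this layout, the sketch below assumes the usual YOLO text format, where each line is `class x_center y_center width height` with coordinates normalized to the image size, and converts one line to pixel corner coordinates. The function name and the 640x480 image size are made up for the example.

```python
def yolo_line_to_pixel_box(line: str, img_w: int, img_h: int):
    """Convert "class xc yc w h" (normalized) to (class_id, x1, y1, x2, y2) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized center and size up to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Move from center/size form to corner form.
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# One line from the file above; the image size is an assumption.
box = yolo_line_to_pixel_box("49 0.646836 0.132552 0.118047 0.0969375", 640, 480)
```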

@BradyJ27
Contributor Author

BradyJ27 commented Nov 3, 2023

@mattseddon the first number is a key into a dictionary containing the classes.

It's something like:

...
44: "dog",
45: "person",
46: "car",
...

This is found in a dataset configuration file (specifically for yolo), which is data.yaml.

There is often some configuration similar to this whether it be a dictionary in a training script, a data configuration file, or sometimes the labels are hard coded in the labels file.

I will say that the above is YOLO-specific; more often the file contains the actual label instead of a number that has to be looked up in a dictionary.
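
A small sketch of handling both cases (numeric class ids and literal labels) could look like the following. The `NAMES` mapping just reuses the "something like" ids from this comment, and `resolve_label` and the `class_<id>` fallback are invented for illustration; in practice the mapping would come from the dataset configuration.

```python
# Illustrative mapping only, copied from the approximate example above.
NAMES = {44: "dog", 45: "person", 46: "car"}


def resolve_label(raw: str, names: dict[int, str]) -> str:
    """Return a display label for either a numeric class id or a literal label."""
    if raw.isdigit():
        class_id = int(raw)
        # Fall back to a placeholder when the id is missing from the mapping.
        return names.get(class_id, f"class_{class_id}")
    return raw


labels = [resolve_label(raw, NAMES) for raw in ("45", "dog", "49")]
```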

@shcheklein added the "priority-p1" label (Regular product backlog) and removed the "triage" label on Dec 12, 2023
@BradyJ27
Contributor Author

I was just coming here to revisit (was busy for the past month) this and create some issues in the data and render repos, but it looks like you guys have maybe taken another look. Should I go ahead and create some additional issues and start looking into this, or is this in progress already?

@julieg18
Contributor

> I was just coming here to revisit (was busy for the past month) this and create some issues in the data and render repos, but it looks like you guys have maybe taken another look. Should I go ahead and create some additional issues and start looking into this, or is this in progress already?

@BradyJ27, feel free to do that, thanks. I've started to look into how Studio and VSCode are going to render these images but I'm currently not looking into dvc/dvc-render side of things.

@julieg18
Contributor

julieg18 commented Jan 9, 2024

While researching the UX, I took into account that while both Studio and VSCode use React for the frontend, Studio has a Python backend and VSCode has a NodeJS backend. So far, I've come up with two ideas on how the clients (VSCode/Studio) would handle this.

Ideas

  1. Rely on the client backend to create images with the needed bounding boxes. The frontend would render these images. (See Matt's comment)
  2. Send the box coordinates to the frontend and have the frontend render the bounding boxes onto an image using SVGs or HTML canvas (I believe W&B uses Canvas to create the bounding boxes)

Details

  1. Rely on the client backend to create images with the needed bounding boxes. The frontend would render these images.

Pros

Both NodeJS and Python have multiple image manipulation libraries that we could use for creating images with bounding boxes. Matt has already mentioned sharp for NodeJS.

Cons

Studio and VSCode have different backends, so we would have to go about creating images in different ways. This would make keeping things consistent across products more difficult.

  2. Send the box coordinates to the frontend and have the frontend render the bounding boxes onto an image using SVGs or HTML canvas (I believe W&B uses Canvas to create the bounding boxes)

Pros

Since both Studio and VSCode use React in the frontend, it will be easier to have consistent plots in both clients. React also has some libraries for Canvas (KonvaJS, FabricJS) and SVGs that would simplify the solution compared with using only the vanilla APIs.

Cons

The solution for rendering the bounding boxes will probably be a bit more complicated than using the methods that backend libraries offer.

What do we think?
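
To make option 2 a little more concrete, here is a hedged sketch of the kind of message a backend could send to the frontend: the image location plus box coordinates grouped by label, per revision. Every field name (`path`, `revisions`, `url`, `boxes`, `x`, `y`, `w`, `h`), the revision names, and the URLs are hypothetical; nothing here reflects an existing DVC, Studio, or VS Code format.

```python
import json


def build_payload(path, revisions):
    """Serialize boxes per revision.

    revisions maps a revision name to (image_url, boxes), where boxes is a
    list of (label, x1, y1, x2, y2) tuples in pixel coordinates.
    """
    payload = {"path": path, "revisions": {}}
    for rev, (url, boxes) in revisions.items():
        by_label = {}
        for label, x1, y1, x2, y2 in boxes:
            # Group rectangles by label so the frontend can toggle a class at once.
            by_label.setdefault(label, []).append(
                {"x": x1, "y": y1, "w": x2 - x1, "h": y2 - y1}
            )
        payload["revisions"][rev] = {"url": url, "boxes": by_label}
    return json.dumps(payload)


# Made-up revisions, URLs, and coordinates.
message = build_payload(
    "000000000042.jpg",
    {
        "workspace": ("workspace/000000000042.jpg", [("dog", 10, 20, 110, 220)]),
        "main": ("main/000000000042.jpg", [("dog", 12, 18, 100, 210), ("motorcycle", 200, 100, 400, 300)]),
    },
)
```

Because the payload is plain JSON, both a Python backend and a NodeJS backend could produce it, leaving the rendering to shared frontend code.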

@dberenbaum
Contributor

> It would be nice to have some sort of interactive plots where the user can toggle on/off different objects based on labels.

We will probably want some level of interactivity like this at some point, so I think it makes sense to go with option 2.

@shcheklein changed the title from "plots: Interactive plots with bounding boxes" to "Interactive plots with bounding boxes" on Jan 16, 2024
@shcheklein added the "A: plots" label (Area: plots webview, side panel and everything related) on Jan 16, 2024
@julieg18
Contributor

Started working on implementing this and, after trying HTML Canvas and SVGs, decided on using SVGs to render the plots since they are easier to create and will be more performant, especially when it comes to resizing the plots.

Design

Next, I started working on the UI design for the togglable boxes. Here is what I have so far (created in storybook):

Screenshot 2024-01-17 at 10 20 10 AM

Screenshot 2024-01-17 at 9 58 01 AM

Looking at Studio, either version could fit there as well:

image

Questions About Implementation

  • Do we want to toggle classes in all revision plots for a specific image path at once or have the toggles per single plot? I tried designs for both for now. There's also the option of toggling classes across all images in the webview at once.
  • What colors are we going to be using for the bounding boxes? I just chose red and blue for now but I'm assuming we want a pre-set of more muted colors?

What do we think? cc @shcheklein @iterative/vs-code

@shcheklein
Member

Looks cool, @julieg18!

> Do we want to toggle classes in all revision plots for a specific image path at once or have the toggles per single plot? I tried designs for both for now. There's also the option of toggling classes across all images in the webview at once.

My 2c: I think we should toggle all images per path at once, for now.
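
A minimal sketch of per-path toggling, assuming the hidden classes are tracked per image path so that every revision of that path shares one toggle state. The `ToggleState` class and its methods are invented for this example and are not part of any existing client code.

```python
from collections import defaultdict


class ToggleState:
    """Tracks which labels are hidden for each image path (shared by all revisions)."""

    def __init__(self):
        self._hidden = defaultdict(set)

    def toggle(self, path, label):
        """Flip the visibility of one label for every revision of a path."""
        hidden = self._hidden[path]
        if label in hidden:
            hidden.remove(label)
        else:
            hidden.add(label)

    def visible(self, path, boxes):
        """Filter (label, x1, y1, x2, y2) boxes down to the visible ones."""
        hidden = self._hidden.get(path, set())
        return [box for box in boxes if box[0] not in hidden]


# Made-up boxes for two revisions of the same image path.
workspace_boxes = [("dog", 10, 20, 110, 220), ("motorcycle", 200, 100, 400, 300)]
main_boxes = [("dog", 12, 18, 100, 210)]

state = ToggleState()
state.toggle("000000000042.jpg", "dog")
```

Keying the state by path rather than by (path, revision) is what makes one click hide a class in every revision column for that image.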

> What colors are we going to be using for the bounding boxes? I just chose red and blue for now but I'm assuming we want a pre-set of more muted colors?

Let's take a look at how YOLO generates colors/boxes and take it from there?
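
Whatever palette ends up being chosen, one simple way to keep colors stable is to index a fixed palette by class position and wrap around when there are more classes than colors. The sketch below shows that idea only; the hex values are placeholders and are not YOLO's palette, and `assign_colors` is a made-up name.

```python
# Placeholder colors for illustration; not taken from YOLO or any DVC product.
PALETTE = ["#e6194b", "#3cb44b", "#4363d8", "#f58231", "#911eb4"]


def assign_colors(labels):
    """Map each distinct label to a palette color, wrapping when labels outnumber colors."""
    ordered = sorted(set(labels))  # sort so the mapping is deterministic
    return {label: PALETTE[i % len(PALETTE)] for i, label in enumerate(ordered)}


colors = assign_colors(["dog", "motorcycle", "dog"])
```

Sorting by label name means a newly added class can shift the colors of later ones; with numeric class ids, indexing the palette by the id itself would avoid that.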

@mattseddon
Member

Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?

@dberenbaum
Contributor

> Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?

I don't think CLI support is a requirement unless it's helpful to consolidate the VS Code and Studio implementation (similar to images per step).

@julieg18
Contributor

> Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?

> I don't think CLI support is a requirement unless it's helpful to consolidate the VS Code and Studio implementation

Are we referring to the DVC CLI being able to create these plots with bounding boxes?

If so, and if it is doable for the CLI to create the bounding box plot SVGs, that could help with consolidation, since Studio and VS Code would only need to implement the logic for toggling boxes. Currently, both Studio and VSCode need to create the SVG elements from the image src and bounding box coordinates as well as the toggle logic.


5 participants