Add Mean Average Precision (mAP) metric #53
Hello! First time contributor here. I'd like to take a shot at this! |
would be great to have that metric! |
Not sure yet, but it looks like |
I recently managed to implement a minimal version of Kaggle's Mean Average Precision metric. The difference is in the calculation itself. You can find the details here: https://www.kaggle.com/c/global-wheat-detection/overview/evaluation My kernel: pytorch-mean-absolute-precision-calculation With slight modifications to the mAP formula, I suppose we would be able to integrate this metric into pytorch-lightning (since we already have average precision implemented). I've already written logic to map predicted boxes to ground-truth ones (taking their respective scores into consideration), so have a look at the kernel and let me know if you find any issues. |
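(A minimal sketch of the kind of greedy, score-ordered matching described above — not the kernel's actual code; the function name and single-threshold precision are illustrative, following the Kaggle definition precision = TP / (TP + FP + FN).)

```python
import torch
from torchvision.ops import box_iou  # pairwise IoU for (x1, y1, x2, y2) boxes


def kaggle_precision(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    """Precision = TP / (TP + FP + FN) at a single IoU threshold (Kaggle-style).

    pred_boxes: (P, 4) and gt_boxes: (T, 4) in (x1, y1, x2, y2) format;
    pred_scores: (P,) confidences used to order the greedy matching.
    """
    if len(pred_boxes) == 0 or len(gt_boxes) == 0:
        # No predictions and no targets counts as perfect; otherwise everything
        # left over is a false positive / false negative.
        return 1.0 if len(pred_boxes) == len(gt_boxes) else 0.0

    order = pred_scores.argsort(descending=True)
    ious = box_iou(pred_boxes[order], gt_boxes)  # (P, T)

    matched = torch.zeros(len(gt_boxes), dtype=torch.bool)
    tp = 0
    for row in ious:  # highest-scoring prediction first
        row = row.masked_fill(matched, 0.0)  # each ground truth matches at most once
        best_iou, best_idx = row.max(dim=0)
        if best_iou >= iou_threshold:
            matched[best_idx] = True
            tp += 1

    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    return tp / (tp + fp + fn)


# The Kaggle metric then averages this precision over IoU thresholds
# 0.5:0.05:0.75 and over all images.
```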
Hi, I would love to take this on. @SkafteNicki could you assign it to me? |
@briankosw thank you for wanting to contribute! |
We need this! 🤩 For PyTorch Lightning modules focused on multi-class classification, it would be useful to obtain an mAP metric. Most classifiers have a pair of fully-connected layers that do a classwise likelihood prediction. The output of the last layer before the FC layers could be used for a retrieval task over the test dataset, and mAP could be calculated over this. |
@Borda I can give this a go. But I am wondering what input format would be the most consistent with the rest of the package ... So directly ...
On the other hand ... the COCO implementation has, for each image, a set of positive and negative labels and ignores the rest ... should that be supported? lmk what you think. edit: leaving here for reference |
mAP was added to PL some months ago and is now part of the |
added with 60c0eec |
Reopening, as the current implementation in torchmetrics is for information retrieval, while this issue asks for an implementation for object detection. |
Just FYI, I have been working on an implementation over here https://github.com/jspaezp/metrics/tree/feature-spatial-average-precision but it is not quite ready for a PR. Right now it needs ... (this is more a personal checklist before I send the PR)
Any suggestions before I send the PR would be great, but not expected. |
@jspaezp if you are going to send a PR, please also take a look at this PR Lightning-AI/pytorch-lightning#4564 (it got closed after we moved from lightning to torchmetrics) |
@SkafteNicki thanks for letting me know! I feel like that implementation would be really hard to convert to a class metric due to how "monolithic" it is, but I can definitely work on it. Best, |
@jspaezp I am fine with only having a functional interface (at least to begin with). |
Hello everyone! What is the status of this PR, is someone actively working on this? From the prior PR in Lightning-AI/pytorch-lightning#4564 it doesn't seem that it would work for DDP, which seems to be the main challenge when implementing this as a metric. Has anyone tried first implementing this metric as a subclass of |
Hi, as discussed in #190, perhaps adding a wrapper of |
Sorry, not yet. Will have a look at it during my vacation in two weeks. |
Hello, I was also looking for a way to evaluate my object detector in PyTorch Lightning. In the end I implemented some simple logic to make predictions in ... I tried implementing this as a ... Just a note on technical details of ... Also, is it safe to move tensors to CPU inside the ... |
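(For reference, a minimal sketch of the subclassing pattern being discussed, assuming torchmetrics' Metric base class with list states, which, as far as I understand, are gathered across processes when the metric is synced. The compute step below reports a simple recall at IoU 0.5 as a stand-in, not full mAP.)

```python
import torch
from torchmetrics import Metric
from torchvision.ops import box_iou


class DetectionRecallAt50(Metric):
    """Accumulates per-image boxes in update() and evaluates them in compute().

    Only recall at IoU 0.5 is computed here, as a placeholder for the real
    per-class mAP logic, to illustrate the accumulate-then-compute pattern.
    """

    def __init__(self):
        super().__init__()
        # List states; torchmetrics concatenates them across processes on sync.
        self.add_state("pred_boxes", default=[], dist_reduce_fx=None)
        self.add_state("gt_boxes", default=[], dist_reduce_fx=None)

    def update(self, pred_boxes: torch.Tensor, gt_boxes: torch.Tensor) -> None:
        # Detach so the autograd graph is not kept alive between steps.
        self.pred_boxes.append(pred_boxes.detach())
        self.gt_boxes.append(gt_boxes.detach())

    def compute(self) -> torch.Tensor:
        matched, total = 0, 0
        for preds, gts in zip(self.pred_boxes, self.gt_boxes):
            total += len(gts)
            if len(preds) and len(gts):
                matched += (box_iou(gts, preds).max(dim=1).values >= 0.5).sum().item()
        return torch.tensor(matched / max(total, 1))
```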
Hi @gau-nernst , I've also written a |
The main problem with your approach (moving to CPU, writing to file) will be performance and the lack of multi-GPU training support. |
Hey @zhiqwang, Awesome work. I see it's based on the torchvision/detectron2 evaluation code. From what I understand from your code, it requires a COCO ground-truth annotations file. I think if we add object detection mAP to torchmetrics, it should be agnostic to the dataset format and be computed based on a set of ground-truth and predicted boxes only. @tkupek Yes, these are the obvious problems with my approach. If we want performance, the logic should be implemented in torch and not use pycocotools. Some people have done this, I believe. But the problem is whether the torch implementation will be accurate and consistent with pycocotools. For multi-GPU training, if torchmetrics synchronizes target and predicted boxes across GPUs, it should be okay to transfer the results to CPU to calculate mAP using pycocotools? |
@gau-nernst Yep, I agree with you, the design and functionality of this module still require some deliberation here.
I guess it is possible, and it seems that PyTorch 1.9 provides a better interface for synchronizing arbitrary picklable data across GPUs with |
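(If the interface meant here is torch.distributed.all_gather_object — an assumption on my part — a rough sketch of gathering picklable per-rank detection results could look like this.)

```python
import torch.distributed as dist


def gather_detections(local_detections):
    """Gather arbitrary picklable per-rank results onto every rank.

    local_detections: e.g. a list of {"boxes": Tensor, "scores": Tensor, ...}
    dicts produced on this rank. Requires torch.distributed to be initialized.
    """
    if not dist.is_available() or not dist.is_initialized():
        return local_detections

    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_detections)  # gathers picklable objects from all ranks

    # Flatten the per-rank lists into one list covering the whole dataset.
    return [det for rank_dets in gathered for det in rank_dets]
```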
It is clear that you are all more expert than me on this specific metric, but let me get this straight: |
@gau-nernst I have an approach where I bypass the need for a temp file and initialize the COCOeval class with the detections and targets directly. |
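(A rough sketch of that idea, assuming pycocotools: build the COCO ground-truth object in memory and pass the detections to loadRes as a list of dicts instead of writing JSON files. The helper name and argument layout are illustrative; the field names follow the COCO format.)

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval


def evaluate_map(gt_images, gt_annotations, categories, detections):
    """Run COCOeval directly from in-memory dicts, without temporary files.

    gt_images:      [{"id": ...}, ...]
    gt_annotations: [{"id", "image_id", "category_id", "bbox" (xywh), "area", "iscrowd"}, ...]
    categories:     [{"id": ...}, ...]
    detections:     [{"image_id", "category_id", "bbox" (xywh), "score"}, ...]
    """
    coco_gt = COCO()  # empty API object, normally built from a JSON file
    coco_gt.dataset = {"images": gt_images, "annotations": gt_annotations, "categories": categories}
    coco_gt.createIndex()

    coco_dt = coco_gt.loadRes(detections)  # loadRes also accepts a list of dicts

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    return coco_eval.stats[0]  # mAP @ IoU=0.50:0.95
```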
So this is my approach: it takes the ...
Incomplete TODO list: ...
@zhiqwang @gau-nernst @SkafteNicki if you agree with this approach I can try to finalize the draft in the next week (or we can work on this together). |
@tkupek Looks great to me. Please send a PR to us with what you have, then we can begin to review/comment/add to your code. |
If anyone has any opinion on the input format that |
I had a look at how TensorFlow is handling this and it seems like a ... They use this for the groundtruth (target):
and this for predictions (preds):
I find this intuitive and would go the same way. One could discuss if we want different dict keys for groundtruth and detections (but I think we should do things in a more standardized way). The ... Interesting sidenote: they also use a |
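(For reference, roughly the per-image dicts used by the TensorFlow Object Detection API evaluators, written down from memory — the field names should be verified against the actual code. Boxes there are in (ymin, xmin, ymax, xmax) order, which is where the (y1, x1, y2, x2) convention discussed below comes from.)

```python
# Approximate shape of the per-image dicts used by the TF Object Detection API
# evaluators; values are placeholders.
groundtruth = {
    "groundtruth_boxes": ...,    # (num_gt, 4) in (ymin, xmin, ymax, xmax)
    "groundtruth_classes": ...,  # (num_gt,) integer class ids
}
detections = {
    "detection_boxes": ...,      # (num_det, 4) in (ymin, xmin, ymax, xmax)
    "detection_scores": ...,     # (num_det,) confidences
    "detection_classes": ...,    # (num_det,) integer class ids
}
```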
I'd like to add that this would be beneficial for object detection and segmentation as well. |
@twsl we are implementing this for object detection (bounding boxes). You can have a look at the PR. |
Yes, I do think so. |
@SkafteNicki do you think this needs to be a part of this PR or can we make this an additional improvement? |
PyTorch Lightning Bolts includes two object detection models. They take a list of dictionaries containing:
The detections contain:
Torchvision ops, for example NMS, consistently take "boxes" and "scores" in the same format. It would be convenient to standardize the keys and use the same keys as what the models use. (We can also change the models to use "classes" instead of "labels" if that's better.) Most importantly, I would prefer to use the same format for the boxes (x1, y1, x2, y2). @tkupek why do you think (y1, x1, y2, x2) is intuitive? I haven't seen this format used before. Also note that |
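(To make the proposed format concrete, a minimal sketch of per-image dicts in the torchvision detection-model convention, with boxes as absolute (x1, y1, x2, y2) coordinates, plus an NMS call that consumes the same keys; the values are illustrative.)

```python
import torch
from torchvision.ops import nms

# One dict per image, torchvision detection-model style.
target = {
    "boxes": torch.tensor([[10.0, 20.0, 50.0, 60.0]]),  # (N, 4) in (x1, y1, x2, y2)
    "labels": torch.tensor([1]),                         # (N,) integer class ids
}
prediction = {
    "boxes": torch.tensor([[12.0, 22.0, 48.0, 58.0], [11.0, 21.0, 49.0, 59.0]]),
    "scores": torch.tensor([0.9, 0.4]),
    "labels": torch.tensor([1, 1]),
}

# Torchvision ops consume the same "boxes"/"scores" convention, e.g. NMS:
keep = nms(prediction["boxes"], prediction["scores"], iou_threshold=0.5)
prediction = {k: v[keep] for k, v in prediction.items()}
```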
@senarvi Thank you for your comment. The ... @SkafteNicki @Borda where are we on the issues with DDP and adjusting the test? Any time to look at this yet? I'm happy to help but I need some hints. |
I'm considering converting the COCOeval class to PyTorch in order to make the mAP/mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback. |
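(As one data point for such a port: the pairwise IoU at the core of COCOeval's matching step already has a vectorized PyTorch counterpart that runs on GPU. This is a sketch of the building block, not a full port.)

```python
import torch
from torchvision.ops import box_iou

device = "cuda" if torch.cuda.is_available() else "cpu"

# Valid (x1, y1, x2, y2) boxes built from a random corner plus a random size.
corners = torch.rand(100, 2, device=device)
det_boxes = torch.cat([corners, corners + torch.rand(100, 2, device=device)], dim=-1)
corners = torch.rand(20, 2, device=device)
gt_boxes = torch.cat([corners, corners + torch.rand(20, 2, device=device)], dim=-1)

# Pairwise IoU for one image in a single batched call on GPU, replacing the
# per-box Python loops in pycocotools. A full port would still need the
# per-class, per-IoU-threshold and per-area-range accumulation of COCOeval.
ious = box_iou(det_boxes, gt_boxes)  # shape (100, 20)
```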
Using the COCOeval-based metric is slow. I haven't checked if it's the GPU-CPU transfer or if the operation itself is just slow. If you can make it faster, that would be great, but I have a feeling that it's going to take some effort to utilize the GPU efficiently and still keep the implementation correct. |
I agree with @senarvi, it would be great but is probably not trivial. Feel free to give it a try based on the existing PR and let us know what you learned 🙂 |
The PR looks quite complete to me. Can we finalize it? |
The main metric for object detection tasks is the Mean Average Precision, implemented in PyTorch and computed on GPU.
It would be nice to add it to the collection of the metrics.
The example implementation using numpy: https://github.com/ternaus/iglovikov_helper_functions/blob/master/iglovikov_helper_functions/metrics/map.py