Add COCO-styled mAR to evaluate_detections #5274

Merged 7 commits from manushree/mAR into develop on Dec 20, 2024

Conversation

@manushreegangwar (Contributor) commented Dec 13, 2024

What changes are proposed in this pull request?

This PR adds COCO-styled mean average recall (mAR) to evaluate_detections().

How is this patch tested? If it is not, please explain why.

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

print(results.mAR())

# Open Model Evaluation panel and see mAR in the table :)
session = fo.launch_app(dataset)
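
The new method also accepts an optional classes argument (per the mAR(self, classes=None) signature shown in the review diffs later in this thread); for example, with illustrative class names:

# Restrict mAR to a subset of classes
print(results.mAR(classes=["person", "car"]))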

Summary by CodeRabbit

  • New Features

    • Introduced Mean Average Recall (mAR) metric in evaluation summaries.
    • Added functionality to visualize and interact with evaluation results in the app.
    • Enhanced COCO evaluation capabilities to compute mAR alongside mAP.
  • Documentation

    • Expanded documentation on evaluation methods, including mAR and its integration with COCO-style evaluations.
    • Updated examples and explanations for clarity and usability.

@coderabbitai (bot) commented Dec 13, 2024

Walkthrough

This pull request introduces mean Average Recall (mAR) as a new evaluation metric across multiple components of the FiftyOne framework. The changes span documentation, utility functions, and UI components, enabling comprehensive support for mAR in detection model evaluations. The implementation focuses on integrating mAR calculation methods, updating evaluation panels, and providing clear documentation for users to understand and utilize this new performance metric.

Changes

  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx: Added mAR metric to summaryRows with conditional visibility for object detection. Enhanced formatPerClassPerformance for sorting and limiting metrics.
  • docs/source/integrations/coco.rst: Updated documentation to include an mAR explanation and example code snippets.
  • docs/source/user_guide/evaluation.rst: Enhanced evaluation documentation with mAR details, method signatures, and improved clarity on evaluation methods.
  • fiftyone/operators/builtins/panels/model_evaluation/__init__.py: Added a get_mar() method and updated load_evaluation() to include the mAR metric.
  • fiftyone/utils/eval/coco.py: Introduced mAR computation methods and updated evaluation result generation.

Suggested labels

enhancement, documentation

Suggested reviewers

  • imanjra
  • findtopher

Poem

🐰 Metrics dance, mAR takes flight,
Precision's friend, a brilliant light!
Recall's embrace, now clearly shown,
In FiftyOne's evaluation zone.
Hop, hop, hooray for data's might! 📊

Four review threads on fiftyone/utils/eval/coco.py were marked outdated and resolved.

@brimoor (Contributor) left a comment

@manushreegangwar can you please also document mAR in the relevant places so that users know we support COCO mAR now?

        compute_mAP (False): whether to perform the necessary computations so
            that mAP, mAR, and PR curves can be generated

One additional review thread on fiftyone/utils/eval/coco.py was marked outdated and resolved.
@manushreegangwar changed the title from "[WIP] Add COCO-styled mAR to evaluate_detections" to "Add COCO-styled mAR to evaluate_detections" on Dec 18, 2024
@manushreegangwar marked this pull request as ready for review on December 18, 2024 at 17:13

@coderabbitai (bot) left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2734ab and 2992a7d.

📒 Files selected for processing (5)
  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1 hunks)
  • docs/source/integrations/coco.rst (3 hunks)
  • docs/source/user_guide/evaluation.rst (3 hunks)
  • fiftyone/operators/builtins/panels/model_evaluation/__init__.py (2 hunks)
  • fiftyone/utils/eval/coco.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🪛 Ruff (0.8.2)
fiftyone/operators/builtins/panels/model_evaluation/__init__.py

140-140: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

fiftyone/utils/eval/coco.py

799-799: Do not use bare except

(E722)

🔇 Additional comments (14)
fiftyone/utils/eval/coco.py (7)

54-54: Docstring enhancement is clear and concise
The updated description clarifying that mAR is also computed alongside mAP and PR curves is well-written and informative.


222-222: Addition of this parameter reference looks good
Adding the recall_sweep argument in this return signature is aligned with the new mAR functionality.


236-236: Consistent initialization of the new recall_sweep parameter
Passing recall_sweep=recall_sweep ensures proper encapsulation of the new metric within the constructor.


259-260: Helpful docstring expansion
The shape and usage of recall_sweep is now clearly stated.


278-278: Constructor param for recall_sweep is well integrated
No issues found in this additional parameter.


301-303: Great separation of classwise AP and AR
The logic for storing _classwise_AR when recall_sweep is available is straightforward and clear.


388-416: mAR method is well implemented
The method follows the same structure as mAP, correctly handling missing or negative classwise AR.
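
For context on the coco.py items above, a loose sketch of how a recall_sweep of shape num_iou x num_classes (per the docstring expansion noted above at lines 259-260) could be reduced to classwise AR. This is an assumption for illustration, not an excerpt from the PR; the actual reduction may differ:

import numpy as np

# -1 marks classes with no data, matching the sentinel convention used for AP
recall_sweep = np.array([
    [0.60, -1.0],  # IoU 0.50
    [0.50, -1.0],  # IoU 0.55
])

masked = np.ma.masked_equal(recall_sweep, -1)
classwise_AR = masked.mean(axis=0).filled(-1)  # average over the IoU sweep per class
print(classwise_AR)  # [ 0.55 -1.  ]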

docs/source/integrations/coco.rst (4)

474-476: Clear explanation for new metrics
Noting mAR alongside mAP is a welcome addition for users wanting comprehensive COCO-style evaluations.


481-481: Good to highlight COCO evaluation protocol
Clarifies that the mAR computations follow the same reference implementation.


493-493: Helpful mention of the IoU sweep
Explicitly mentioning that IoU thresholds are swept for both mAP and mAR is very informative.


504-506: Clear usage example for results.mAR()
Shows the alignment with the newly introduced metric in code.

app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

376-382: mAR column integrated appropriately
The new entry for mAR in the summaryRows seamlessly aligns with existing detection metrics, improving consistency of displayed evaluation results.

docs/source/user_guide/evaluation.rst (2)

1044-1054: Documentation structure and accuracy look good

The section header and note about COCO evaluation protocol are clear and accurate. The documentation properly explains that mAR calculations follow the COCO evaluation protocol.


Line range hint 1066-1078: Code example effectively demonstrates mAR functionality

The code example clearly shows:

  1. How to enable mAR computation using compute_mAP=True
  2. How to retrieve the mAR value using results.mAR()
  3. The expected format of the output

The example is concise and follows documentation best practices.


Review comment on fiftyone/utils/eval/coco.py, line 799 (`except:`):

⚠️ Potential issue

Avoid bare except
Using a bare except makes debugging harder and can mask unexpected errors.

Below is a suggested fix narrowing the exception scope:

-            except:
-                pass
+            except IndexError:
+                pass

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

799-799: Do not use bare except

(E722)

Comment on lines +137 to +148
    def get_mar(self, results):
        try:
            return results.mAR()
        except Exception as e:
            return None


⚠️ Potential issue

Remove unused exception variable
Since `e` is never used, it is preferable to remove the variable name or handle it properly.

 def get_mar(self, results):
     try:
         return results.mAR()
-    except Exception as e:
+    except Exception:
         return None
🧰 Tools
🪛 Ruff (0.8.2)

140-140: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

@brimoor (Contributor) left a comment

This LGTM now, but let's wait until fiftyone==1.2.0 is merged back into develop and then rebase this PR so that we can confirm that mAR behaves as expected in the Model Evaluation panel in the App:

  • It is displayed when compute_mAP=True was passed
  • Otherwise, it does not appear
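
A plausible mechanism for this conditional display, pieced together from the get_mar() helper quoted earlier in this thread (the call site and metrics dict below are hypothetical):

# results.mAR() raises when the evaluation was run without compute_mAP=True,
# so get_mar() returns None and the panel can simply omit the row
mar = self.get_mar(results)
if mar is not None:
    metrics["mAR"] = mar  # row rendered only when mAR is available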

@brimoor (Contributor) left a comment

@manushreegangwar update: I went ahead and rebased on latest develop as it already includes the latest from release/v1.2.0. When I run the example code in the PR description, I'm not seeing mAR populate in the table:

[Screenshot, 2024-12-20 at 10:39 AM: Model Evaluation panel without mAR in the table]

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
fiftyone/utils/eval/coco.py (3)

54-54: Enhance docstring clarity for compute_mAP parameter

The parameter name compute_mAP could be misleading since it now controls both mAP and mAR computation. Consider renaming it to compute_metrics or adding a note explicitly mentioning both metrics.


388-416: Consider refactoring to reduce code duplication

The mAR implementation shares significant logic with mAP. Consider extracting the common functionality into a helper method to improve maintainability.

+    def _compute_metric_average(self, metric_values, classes=None):
+        if classes is not None:
+            class_inds = np.array([self._get_class_index(c) for c in classes])
+            values = metric_values[class_inds]
+        else:
+            values = metric_values
+
+        values = values[values > -1]
+        if values.size == 0:
+            return -1
+
+        return np.mean(values)
+
     def mAP(self, classes=None):
-        if classes is not None:
-            class_inds = np.array([self._get_class_index(c) for c in classes])
-            classwise_AP = self._classwise_AP[class_inds]
-        else:
-            classwise_AP = self._classwise_AP
-
-        classwise_AP = classwise_AP[classwise_AP > -1]
-        if classwise_AP.size == 0:
-            return -1
-
-        return np.mean(classwise_AP)
+        return self._compute_metric_average(self._classwise_AP, classes)

     def mAR(self, classes=None):
         if self._classwise_AR is None:
             raise Exception(
                 "Classwise AR is not available. mAR can't be computed."
             )
-        if classes is not None:
-            class_inds = np.array([self._get_class_index(c) for c in classes])
-            classwise_AR = self._classwise_AR[class_inds]
-        else:
-            classwise_AR = self._classwise_AR
-
-        classwise_AR = classwise_AR[classwise_AR > -1]
-        if classwise_AR.size == 0:
-            return -1
-
-        return np.mean(classwise_AR)
+        return self._compute_metric_average(self._classwise_AR, classes)
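
For reference, a standalone version of the proposed helper (the name is hypothetical and mirrors the diff above; it takes class indices directly rather than resolving names via self._get_class_index()):

import numpy as np

def compute_metric_average(metric_values, class_inds=None):
    """Average per-class values, ignoring the -1 "no data" sentinel."""
    values = np.asarray(metric_values, dtype=float)
    if class_inds is not None:
        values = values[np.asarray(class_inds)]
    values = values[values > -1]
    return float(np.mean(values)) if values.size else -1

print(compute_metric_average([0.5, -1, 0.75]))           # 0.625
print(compute_metric_average([-1, -1]))                   # -1
print(compute_metric_average([0.5, 0.75, 0.25], [0, 1]))  # 0.625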

Line range hint 1-807: Overall implementation assessment

The implementation successfully adds mAR support while maintaining backward compatibility. The code is well-structured and documented. Consider the following architectural improvements:

  1. Consider splitting the evaluation metrics (mAP, mAR) into separate strategy classes to improve modularity and make it easier to add new metrics in the future.
  2. Add integration tests to verify the interaction between mAP and mAR calculations.
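
On point 2, a minimal integration-test sketch (assumes the quickstart dataset can be downloaded in the test environment; the test class and eval_key names are arbitrary):

import unittest

import fiftyone.zoo as foz

class MARIntegrationTest(unittest.TestCase):
    def test_mar_available_with_compute_map(self):
        dataset = foz.load_zoo_dataset("quickstart")
        results = dataset.evaluate_detections(
            "predictions",
            gt_field="ground_truth",
            eval_key="eval_mar_test",
            compute_mAP=True,
        )
        # Both metrics should be computable and fall in [0, 1]
        for value in (results.mAP(), results.mAR()):
            self.assertGreaterEqual(value, 0.0)
            self.assertLessEqual(value, 1.0)

if __name__ == "__main__":
    unittest.main()
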
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 770c7d6 and 2214753.

📒 Files selected for processing (1)
  • fiftyone/utils/eval/coco.py (12 hunks)
🔇 Additional comments (3)
fiftyone/utils/eval/coco.py (3)

301-303: LGTM: Proper handling of optional recall_sweep

The implementation correctly handles the optional recall_sweep parameter and safely computes _classwise_AR only when recall_sweep is available.


452-452: LGTM: Proper serialization support

The recall_sweep field is correctly handled in the serialization process with appropriate default values.

Also applies to: 462-462


388-416: Verify COCO evaluation standards compliance

Please verify that the mAR implementation follows the COCO evaluation standards, particularly regarding:

  1. The handling of crowd annotations
  2. The computation across IoU thresholds
✅ Verification successful

Based on the search results, I can now verify compliance with the COCO evaluation standards. The code shows that:

  1. The implementation follows COCO standards for IoU thresholds:

    • Uses multiple IoU thresholds from 0.50 to 0.95 in steps of 0.05 ([x / 100 for x in range(50, 100, 5)])
    • This matches the official COCO evaluation protocol
  2. For mAR computation:

    • The code properly handles classwise AR computation through _classwise_AR
    • The implementation integrates with the broader COCO evaluation framework

However, let's verify one more aspect of crowd handling:


COCO evaluation standards compliance verified

The implementation correctly follows COCO evaluation standards:

  1. IoU thresholds handling:

    • Uses standard COCO thresholds (0.50:0.95:0.05) for mAP computation
    • Properly integrates thresholds into PR curve computation
  2. Crowd annotations handling:

    • Correctly implements iscrowd attribute handling through get_attribute_value
    • Follows COCO protocol where:
      • Only crowd ground truths can have multiple matches
      • Crowd matches require same class label
      • Non-crowd matches take precedence over crowd matches
  3. mAR computation:

    • Properly integrates with the COCO evaluation framework
    • Handles classwise AR computation correctly
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for COCO evaluation standard compliance

# Look for crowd handling in evaluation
rg -A 5 "iscrowd.*true" 

# Check IoU threshold handling
ast-grep --pattern 'def _compute_pr_curves($$$) {
  $$$
  iou_threshs = $$$
  $$$
}'

Length of output: 112


Script:

#!/bin/bash
# Let's try a broader search for COCO evaluation related code

# Search for IoU threshold definitions and computations
rg -A 10 "iou.*thresh" 

# Look for crowd handling in a broader context
rg -A 5 "crowd" 

# Search for AR computation logic
ast-grep --pattern 'def _compute_ar($$$) {
  $$$
}'

# Check the base evaluation class
rg -A 10 "class COCOEvaluation"

# Look for PR curve computation
rg -A 10 "_compute_pr"

Length of output: 66948


Script:

#!/bin/bash
# Let's search for COCO evaluation implementation details

# Look for AR computation
rg -A 10 "_compute_ar" fiftyone/utils/eval/coco.py

# Look for PR curve computation
rg -A 10 "_compute_pr_curves" fiftyone/utils/eval/coco.py

# Check for IoU threshold handling in COCO evaluation
rg -A 5 "iou_threshs.*None" fiftyone/utils/eval/coco.py

# Look for crowd handling in COCO evaluation
rg -A 10 "iscrowd.*true|iscrowd.*false" fiftyone/utils/eval/coco.py

Length of output: 1647


Script:

#!/bin/bash
# Search for crowd handling in evaluation matches
rg -A 10 "def _compute_matches.*crowd" fiftyone/utils/eval/coco.py

# Look for crowd attribute handling
rg -A 5 "iscrowd.*=" fiftyone/utils/eval/coco.py

# Check for crowd-related documentation
rg -A 5 "crowd.*ground truth" fiftyone/utils/eval/coco.py

Length of output: 1850
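
To make the crowd-handling rules summarized in the verification above concrete, here is a toy matcher that follows the three bullets. It is illustrative only, not FiftyOne's actual matching code, and it assumes a precomputed IoU matrix:

import numpy as np

def toy_match(preds, gts, iou, iou_thresh=0.5):
    """preds: (label, confidence) pairs; gts: (label, iscrowd) pairs;
    iou: numpy array of shape (num_preds, num_gts)."""
    order = sorted(range(len(preds)), key=lambda i: -preds[i][1])  # high confidence first
    gt_used = [False] * len(gts)
    matches = {}

    for pi in order:
        plabel = preds[pi][0]

        def best(crowd):
            cands = [
                gi
                for gi, (glabel, iscrowd) in enumerate(gts)
                if iscrowd == crowd
                and glabel == plabel  # crowd matches require the same class label
                and iou[pi, gi] >= iou_thresh
                and (crowd or not gt_used[gi])  # only crowds can absorb multiple matches
            ]
            return max(cands, key=lambda gi: iou[pi, gi], default=None)

        gi = best(False)  # non-crowd matches take precedence...
        if gi is None:
            gi = best(True)  # ...falling back to crowd ground truths
        if gi is not None:
            matches[pi] = gi
            gt_used[gi] = True

    return matches

# Example: one crowd ground truth of class "person" absorbing two predictions
preds = [("person", 0.9), ("person", 0.8)]
gts = [("person", True)]
iou = np.array([[0.7], [0.6]])
print(toy_match(preds, gts, iou))  # {0: 0, 1: 0}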

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
fiftyone/utils/eval/coco.py (3)

54-54: Update docstring to clarify mAR computation

The docstring for compute_mAP parameter should be updated to explicitly mention that it enables both mAP and mAR computations.

-            that mAP, mAR, and PR curves can be generated
+            that mean Average Precision (mAP), mean Average Recall (mAR), and 
+            Precision-Recall (PR) curves can be generated

259-260: Add type hints to docstring

Consider adding type hints to clarify the expected shape of the recall_sweep array.

-        recall_sweep (None): an array of recall values of shape
-            ``num_iou x num_classes``
+        recall_sweep (None): a numpy.ndarray of recall values with shape
+            ``(num_iou, num_classes)`` or None

389-417: Add input validation for classes parameter

The implementation is correct, but consider adding validation for the classes parameter to ensure it's not empty when provided.

     def mAR(self, classes=None):
         if self._classwise_AR is None:
             raise Exception(
                 "Classwise AR is not available. mAR can't be computed."
             )
 
         if classes is not None:
+            if not classes:
+                raise ValueError("The classes list cannot be empty")
             class_inds = np.array([self._get_class_index(c) for c in classes])
             classwise_AR = self._classwise_AR[class_inds]
         else:
             classwise_AR = self._classwise_AR
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2214753 and e94c8a0.

📒 Files selected for processing (1)
  • fiftyone/utils/eval/coco.py (12 hunks)
🔇 Additional comments (2)
fiftyone/utils/eval/coco.py (2)

222-222: LGTM!

The addition of recall_sweep to the return values and constructor call is implemented correctly and maintains backward compatibility.

Also applies to: 236-236


389-417: Verify mAR integration with evaluation panel

Ensure that the mAR computation is properly integrated with the evaluation panel UI components.

✅ Verification successful

mAR implementation is properly integrated with evaluation components

The verification shows that the mAR computation is correctly integrated:

  • The model evaluation panel (panels/model_evaluation/__init__.py) retrieves and displays mAR metrics
  • The UI component (NativeModelEvaluationView/Evaluation.tsx) properly renders mAR for object detection evaluations
  • The implementation follows the established evaluation metrics pattern in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for mAR integration in evaluation panel components

# Check for mAR-related UI components
rg -l "mAR" --type ts --type tsx

# Check for mAR-related Python imports and usage
rg -l "mAR" --type python

Length of output: 117


Script:

#!/bin/bash
# Search for mAR-related UI components and Python files using ripgrep
rg -l "mAR"

# Search for evaluation panel related files
rg -l "evaluation.*panel|eval.*panel"

# Search for COCO evaluation imports and usage
rg -l "COCOEvaluation|coco.*eval"

Length of output: 1523


Script:

#!/bin/bash
# Check how mAR is used in the evaluation panel
rg -A 5 "mAR" "fiftyone/operators/builtins/panels/model_evaluation/__init__.py"

# Check the model evaluation view implementation
rg -A 5 "mAR" "app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx"

# Check how COCOEvaluation is integrated with mAR
rg -A 5 "COCOEvaluation.*mAR" "fiftyone/utils/eval/coco.py"

Length of output: 920

@@ -713,6 +755,7 @@ def _compute_pr_curves(samples, config, classes=None, progress=None):
     precision = -np.ones((num_threshs, num_classes, 101))
     thresholds = -np.ones((num_threshs, num_classes, 101))
     recall = np.linspace(0, 1, 101)
+    recall_sweep = -np.ones((num_threshs, num_classes))

⚠️ Potential issue

Improve error handling in recall computation

The bare except blocks should be replaced with specific exception handling to avoid masking unexpected errors.

             try:
                 for ri, pi in enumerate(inds):
                     q[ri] = pre[pi]
                     t[ri] = confs[pi]
-            except:
-                pass
+            except IndexError as e:
+                logger.warning(f"Error computing precision values: {e}")

Also applies to: 806-806, 808-808

@brimoor (Contributor) left a comment

LGTM

@manushreegangwar merged commit d564f16 into develop on Dec 20, 2024
14 checks passed
@manushreegangwar deleted the manushree/mAR branch on December 20, 2024 at 21:00