[BUG] Model Evaluation panel: load_evaluation runtime error #5254

Closed
deltheil opened this issue Dec 10, 2024 · 4 comments
Labels
bug Bug fixes

Comments

@deltheil

Describe the problem

When using the new model evaluation panel from FiftyOne v1.1.0 on the vanilla tutorial, things work like a charm.

However, it fails with an unsupported operand type(s) for +: 'int' and 'NoneType' runtime error when evaluating detections via the slightly more elaborate example below.

Code to reproduce issue

The vanilla example works fine:

import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset("quickstart")  # detections pre-computed
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)
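
For reference, the results can also be sanity-checked directly in Python before opening the panel (a minimal sketch; print_report() is the standard results method):

# print a per-class precision/recall/F1 report for the evaluation
results.print_report()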

But this example fails:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

model = foz.load_zoo_model('faster-rcnn-resnet50-fpn-coco-torch')
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_name="evaluate-detections-tutorial",
)
dataset.persistent = True

predictions_view = dataset.take(100, seed=51)
predictions_view.apply_model(model, label_field="faster_rcnn")

high_conf_view = predictions_view.filter_labels("faster_rcnn", F("confidence") > 0.75, only_matches=False)
dataset.save_view("high-confidence", high_conf_view)
results = high_conf_view.evaluate_detections(
    "faster_rcnn",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)
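
For completeness, a minimal sketch of launching the App from the same script (continuing the repro above, so dataset is already defined; assumes a local, non-notebook session):

# open the App on the dataset so the Model Evaluation panel can be reached
session = fo.launch_app(dataset)
session.wait()  # keep the script alive while using the App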

Then open the app > Model evaluation tab > Select the "eval" evaluation:

[Screenshot: Model Evaluation panel after selecting the "eval" evaluation]

With the following stack trace in the console:

File "/home/foo/bar/.venv/lib/python3.12/site-packages/fiftyone/operators/executor.py", line 297, in execute_or_delegate_operator
    result = await do_execute_operator(operator, ctx, exhaust=exhaust)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/foo/bar/.venv/lib/python3.12/site-packages/fiftyone/operators/executor.py", line 341, in do_execute_operator
    result = await (
             ^^^^^^^
  File "/home/foo/bar/.venv/lib/python3.12/site-packages/fiftyone/core/utils.py", line 2355, in run_sync_task
    return await loop.run_in_executor(_get_sync_task_executor(), func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cedric/.rye/py/[email protected]/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/foo/bar/.venv/lib/python3.12/site-packages/fiftyone/operators/panel.py", line 165, in execute
    method(ctx)
  File "/home/foo/bar/.venv/lib/python3.12/site-packages/fiftyone/operators/builtins/panels/model_evaluation/__init__.py", line 300, in load_evaluation
    metrics["tp"], metrics["fp"], metrics["fn"] = self.get_tp_fp_fn(
                                                  ^^^^^^^^^^^^^^^^^^
  File "/home/foo/bar/.venv/lib/python3.12/site-packages/fiftyone/operators/builtins/panels/model_evaluation/__init__.py", line 115, in get_tp_fp_fn
    sum(ctx.dataset.values(tp_key))
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

This happens during the call to the "@voxel51/operators/model_evaluation_panel_builtin" operator with the "load_evaluation" method.

Given the line that failed, I checked that none of the corresponding values are None:

>>> high_conf_view.values("eval_tp")
[2, 1, 2, 5, 1, 6, 3, 3, 0, 3, 0, 4, 17, 0, 0, 0, 8, 19, 5, 2, 13, 15, 8, 3, 3, 2, 7, 1, 2, 1, 2, 2, 1, 11, 5, 2, 8, 3, 1, 1, 3, 1
, 4, 6, 3, 4, 4, 1, 7, 1, 1, 11, 15, 2, 3, 5, 7, 1, 6, 2, 17, 2, 1, 9, 3, 1, 5, 2, 1, 4, 4, 5, 1, 1, 1, 2, 3, 1, 1, 4, 3, 1, 1, 20
, 3, 3, 1, 2, 2, 3, 2, 2, 1, 5, 3, 1, 2, 2, 4, 2]
>>> high_conf_view.values("eval_fp")
[0, 2, 0, 2, 0, 3, 1, 0, 1, 2, 0, 0, 3, 0, 2, 1, 3, 3, 0, 0, 0, 0, 4, 0, 0, 0, 1, 4, 1, 0, 0, 1, 1, 1, 0, 0, 0, 2, 1, 1, 4, 0, 0,
1, 1, 0, 4, 7, 6, 1, 0, 1, 3, 0, 0, 1, 2, 1, 4, 0, 2, 2, 0, 0, 0, 2, 3, 6, 1, 0, 2, 2, 0, 1, 0, 4, 0, 1, 1, 1, 0, 0, 0, 1, 1, 3, 2
, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 4, 0]
>>> high_conf_view.values("eval_fn")
[0, 10, 0, 1, 0, 15, 1, 0, 2, 4, 1, 0, 6, 2, 11, 2, 6, 15, 2, 0, 4, 23, 8, 0, 1, 0, 0, 3, 0, 0, 2, 0, 3, 9, 1, 0, 1, 1, 1, 1, 6, 0
, 2, 3, 2, 0, 2, 5, 8, 0, 0, 11, 11, 0, 0, 0, 4, 1, 1, 0, 4, 13, 0, 2, 3, 1, 2, 7, 0, 0, 0, 3, 0, 0, 0, 8, 0, 5, 0, 0, 1, 0, 1, 26
, 6, 1, 0, 1, 1, 1, 1, 0, 1, 6, 0, 1, 0, 0, 6, 1]

(since this looks like a sum([1, None, 3]) kind of problem)
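
For reference, a plain Python session raises exactly this error as soon as a None slips into the list being summed:

>>> sum([1, None, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'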

=> Is this a bug or a misuse? If it is the latter, please let me know what I am missing here.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Ubuntu 24.04 LTS
  • Python version (python --version): Python 3.12.7
  • FiftyOne version (fiftyone --version): FiftyOne v1.1.0, Voxel51, Inc.
  • FiftyOne installed from (pip or source): pip (via rye)

Other info/logs

One difference from the vanilla tutorial is that the evaluation is computed on a view. (FWIW, evaluate_classifications seems to work fine with views.)

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time
deltheil added the bug label Dec 10, 2024
@deltheil
Author

Having a look at:

tp_key = f"{key}_tp"
fp_key = f"{key}_fp"
fn_key = f"{key}_fn"
tp_total = (
    sum(ctx.dataset.values(tp_key))
    if dataset.has_field(tp_key)
    else None
)
fp_total = (
    sum(ctx.dataset.values(fp_key))
    if dataset.has_field(fp_key)
    else None
)
fn_total = (
    sum(ctx.dataset.values(fn_key))
    if dataset.has_field(fn_key)
    else None
)

Isn't it an issue to load the TP, FP, and FN values (via ctx.dataset.values(tp_key), etc.) over the whole dataset rather than just the view on which the evaluation was performed? This would explain the None values and why the vanilla example works fine.

(And these fields seem specific to the detection eval, which could explain why classification works fine, i.e. dataset.has_field(tp_key) is False, etc.)
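
A quick way to check this hypothesis (a rough sketch, reusing the dataset and saved view names from the repro above):

import fiftyone as fo

dataset = fo.load_dataset("evaluate-detections-tutorial")
high_conf_view = dataset.load_saved_view("high-confidence")

full_values = dataset.values("eval_tp")         # every sample in the dataset
view_values = high_conf_view.values("eval_tp")  # only the 100 evaluated samples

print(sum(v is None for v in full_values))  # expected: many None entries
print(sum(v is None for v in view_values))  # expected: 0
print(sum(view_values))                     # summing over the view works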

@deltheil
Author

deltheil commented Dec 10, 2024

Not sure it's correct, but a patch could be:

diff --git a/fiftyone/operators/builtins/panels/model_evaluation/__init__.py b/fiftyone/operators/builtins/panels/model_evaluation/__init__.py
index cb33082f9..fa638198d 100644
--- a/fiftyone/operators/builtins/panels/model_evaluation/__init__.py
+++ b/fiftyone/operators/builtins/panels/model_evaluation/__init__.py
@@ -104,7 +104,7 @@ class EvaluationPanel(Panel):
                 total += metrics["confidence"]
         return total / count if count > 0 else None
 
-    def get_tp_fp_fn(self, ctx):
+    def get_tp_fp_fn(self, ctx, results):
         view_state = ctx.panel.get_state("view") or {}
         key = view_state.get("key")
         dataset = ctx.dataset
@@ -112,17 +112,17 @@ class EvaluationPanel(Panel):
         fp_key = f"{key}_fp"
         fn_key = f"{key}_fn"
         tp_total = (
-            sum(ctx.dataset.values(tp_key))
+            sum(results.samples.values(tp_key))
             if dataset.has_field(tp_key)
             else None
         )
         fp_total = (
-            sum(ctx.dataset.values(fp_key))
+            sum(results.samples.values(fp_key))
             if dataset.has_field(fp_key)
             else None
         )
         fn_total = (
-            sum(ctx.dataset.values(fn_key))
+            sum(results.samples.values(fn_key))
             if dataset.has_field(fn_key)
             else None
         )
@@ -298,7 +298,7 @@ class EvaluationPanel(Panel):
                 per_class_metrics
             )
             metrics["tp"], metrics["fp"], metrics["fn"] = self.get_tp_fp_fn(
-                ctx
+                ctx, results
             )
             metrics["mAP"] = self.get_map(results)
             evaluation_data = {

cc @imanjra
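
An alternative, less invasive sketch (untested, and a different trade-off than the patch above: it keeps summing over the full dataset but treats a missing per-sample count as zero instead of switching to results.samples):

tp_total = (
    sum(v or 0 for v in ctx.dataset.values(tp_key))  # skip None entries
    if dataset.has_field(tp_key)
    else None
)
# ...and similarly for fp_total and fn_total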

brimoor mentioned this issue Dec 13, 2024
@brimoor
Contributor

brimoor commented Dec 13, 2024

@deltheil thanks for the detailed bug report! This is fixed by #5267 and will be available in fiftyone==1.2.0, which we plan to publish next week 🎉

brimoor added a commit that referenced this issue Dec 13, 2024
@daniel-bogdoll

@brimoor I can't find the time for a more detailed report so close to Christmas, but I have other (maybe related?) issues with the Model Eval Panel. I saw two problems:

  • FP/TP/... values sometimes did not reflect the actual values I extracted from the fields programmatically. In particular, they repeated across fields with similar names (like ...dino_tiny... and ...dino_base...), which is what caught my attention in the first place - it just seemed unlikely.
  • The panel sometimes did not update the eval fields; even old ones I had deleted still showed. Restarting the session did not help, and I did not know how to 'reload/restart' the panel.

Generally speaking, it's a great feature though! Love the 'generate view with 1 click' feature. If 1.2.0 doesn't fix it, I might write a proper report in January. Keep up the good work! And looking forward to comparing more than 2 models :D
