This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Image classfication example has wrong accuracy metric. #11480

Closed
hxhxhx88 opened this issue Jun 29, 2018 · 6 comments

Comments

@hxhxhx88

In example/image-classification/common/fit.py, line 295, mx.callback.Speedometer is created without setting auto_reset, so it takes its default value, True. This makes the logged epoch accuracy incorrect: the module and the speedometer share the same metric, so every time the speedometer resets it after logging, the statistics accumulated for the epoch are lost as well.

The following is the current log for one epoch:

INFO:root:Epoch[1] Batch [20]	Speed: 1375.76 samples/sec	accuracy=0.519345
INFO:root:Epoch[1] Batch [40]	Speed: 1374.71 samples/sec	accuracy=0.515625
INFO:root:Epoch[1] Batch [60]	Speed: 1371.57 samples/sec	accuracy=0.521094
INFO:root:Epoch[1] Batch [80]	Speed: 1376.55 samples/sec	accuracy=0.533203
INFO:root:Epoch[1] Batch [100]	Speed: 1375.78 samples/sec	accuracy=0.548828
INFO:root:Epoch[1] Batch [120]	Speed: 1370.05 samples/sec	accuracy=0.544141
INFO:root:Epoch[1] Batch [140]	Speed: 1375.61 samples/sec	accuracy=0.566797
INFO:root:Epoch[1] Batch [160]	Speed: 1372.94 samples/sec	accuracy=0.571094
INFO:root:Epoch[1] Batch [180]	Speed: 1328.99 samples/sec	accuracy=0.566016
INFO:root:Epoch[1] Batch [200]	Speed: 1319.79 samples/sec	accuracy=0.576562
INFO:root:Epoch[1] Batch [220]	Speed: 1322.05 samples/sec	accuracy=0.576172
INFO:root:Epoch[1] Batch [240]	Speed: 1318.98 samples/sec	accuracy=0.597656
INFO:root:Epoch[1] Batch [260]	Speed: 1323.49 samples/sec	accuracy=0.593359
INFO:root:Epoch[1] Batch [280]	Speed: 1301.38 samples/sec	accuracy=0.611719
INFO:root:Epoch[1] Batch [300]	Speed: 1299.34 samples/sec	accuracy=0.615234
INFO:root:Epoch[1] Batch [320]	Speed: 1300.00 samples/sec	accuracy=0.622656
INFO:root:Epoch[1] Batch [340]	Speed: 1302.01 samples/sec	accuracy=0.641406
INFO:root:Epoch[1] Batch [360]	Speed: 1300.59 samples/sec	accuracy=0.632812
INFO:root:Epoch[1] Batch [380]	Speed: 1301.89 samples/sec	accuracy=0.623047
INFO:root:Epoch[1] Train-accuracy=0.642969

The following is the log with auto_reset set to False:

INFO:root:Epoch[1] Batch [20]	Speed: 1366.77 samples/sec	accuracy=0.518601
INFO:root:Epoch[1] Batch [40]	Speed: 1373.52 samples/sec	accuracy=0.518293
INFO:root:Epoch[1] Batch [60]	Speed: 1369.00 samples/sec	accuracy=0.518315
INFO:root:Epoch[1] Batch [80]	Speed: 1372.24 samples/sec	accuracy=0.521991
INFO:root:Epoch[1] Batch [100]	Speed: 1377.56 samples/sec	accuracy=0.526609
INFO:root:Epoch[1] Batch [120]	Speed: 1373.20 samples/sec	accuracy=0.530475
INFO:root:Epoch[1] Batch [140]	Speed: 1371.03 samples/sec	accuracy=0.536070
INFO:root:Epoch[1] Batch [160]	Speed: 1367.66 samples/sec	accuracy=0.541295
INFO:root:Epoch[1] Batch [180]	Speed: 1371.29 samples/sec	accuracy=0.543854
INFO:root:Epoch[1] Batch [200]	Speed: 1364.62 samples/sec	accuracy=0.547069
INFO:root:Epoch[1] Batch [220]	Speed: 1370.96 samples/sec	accuracy=0.549456
INFO:root:Epoch[1] Batch [240]	Speed: 1363.94 samples/sec	accuracy=0.553488
INFO:root:Epoch[1] Batch [260]	Speed: 1371.96 samples/sec	accuracy=0.556454
INFO:root:Epoch[1] Batch [280]	Speed: 1368.86 samples/sec	accuracy=0.560582
INFO:root:Epoch[1] Batch [300]	Speed: 1360.55 samples/sec	accuracy=0.564654
INFO:root:Epoch[1] Batch [320]	Speed: 1366.20 samples/sec	accuracy=0.567465
INFO:root:Epoch[1] Batch [340]	Speed: 1366.99 samples/sec	accuracy=0.571527
INFO:root:Epoch[1] Batch [360]	Speed: 1367.97 samples/sec	accuracy=0.575160
INFO:root:Epoch[1] Batch [380]	Speed: 1367.14 samples/sec	accuracy=0.578043
INFO:root:Epoch[1] Train-accuracy=0.579803

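Note that with auto_reset=True, the batch-wise accuracy is correct but the epoch-wise accuracy is wrong. With auto_reset=False, the batch-wise accuracy is incorrect (each value is the running accuracy since the start of the epoch rather than the accuracy over the last interval), but the epoch-wise accuracy is correct.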
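
To make the two behaviors concrete, here is a minimal pure-Python simulation. The Accuracy and Speedometer classes are hypothetical stand-ins, not MXNet's mx.metric.Accuracy or mx.callback.Speedometer; they only model the detail that matters here: the training loop and the logger share one metric object, and the logger optionally resets it after each report (assumed to happen whenever nbatch > 0 and nbatch % frequent == 0).

```python
class Accuracy:
    """Hypothetical running-accuracy metric (stand-in for an MXNet metric)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.num_correct = 0
        self.num_inst = 0

    def update(self, num_correct, num_inst):
        self.num_correct += num_correct
        self.num_inst += num_inst

    def get(self):
        return self.num_correct / self.num_inst if self.num_inst else float("nan")


class Speedometer:
    """Hypothetical logger: reports the metric every `frequent` batches."""

    def __init__(self, frequent, auto_reset=True):
        self.frequent = frequent
        self.auto_reset = auto_reset
        self.logs = []

    def __call__(self, nbatch, metric):
        if nbatch > 0 and nbatch % self.frequent == 0:
            self.logs.append((nbatch, metric.get()))
            if self.auto_reset:
                # Also wipes the state that the end-of-epoch print relies on.
                metric.reset()


def run_epoch(correct_per_batch, batch_size, frequent, auto_reset):
    metric = Accuracy()  # ONE metric shared by the training loop and the logger
    speedometer = Speedometer(frequent, auto_reset)
    for nbatch, correct in enumerate(correct_per_batch):
        metric.update(correct, batch_size)
        speedometer(nbatch, metric)
    # The end-of-epoch "Train-accuracy" line reads the same shared metric.
    return speedometer.logs, metric.get()


if __name__ == "__main__":
    correct = [2, 4, 6, 8, 10, 10]  # made-up correct counts, batch size 10
    for auto_reset in (True, False):
        logs, epoch_acc = run_epoch(correct, batch_size=10, frequent=2, auto_reset=auto_reset)
        print(f"auto_reset={auto_reset}: batch logs={logs}, epoch={epoch_acc:.4f}")
```

With auto_reset=True the two batch logs (0.4 and 0.9) describe their own windows, but the epoch value of 1.0 reflects only the final batch, whereas the true epoch accuracy is 40/60 ≈ 0.6667. With auto_reset=False the epoch value is 0.6667, but the second batch log (0.6) is the running accuracy over batches 0 to 4.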

I see two possible fixes:

  1. Set auto_reset=False explicitly, so that each batch log reports the accuracy accumulated since the start of the epoch.
  2. Give the speedometer its own, independent metric.

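
Fix 2 can be sketched in a simplified model: the logger keeps a private metric that it resets after every report, while the epoch metric is reset only at the start of the epoch. This is a hypothetical pure-Python illustration, not the change that was eventually made in MXNet; the Accuracy class is a made-up stand-in for an MXNet metric, and the report condition (nbatch > 0 and nbatch % frequent == 0) is an assumption.

```python
class Accuracy:
    """Hypothetical running-accuracy metric (stand-in for an MXNet metric)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.num_correct = 0
        self.num_inst = 0

    def update(self, num_correct, num_inst):
        self.num_correct += num_correct
        self.num_inst += num_inst

    def get(self):
        return self.num_correct / self.num_inst if self.num_inst else float("nan")


def run_epoch_separate_metrics(correct_per_batch, batch_size, frequent):
    epoch_metric = Accuracy()  # never reset inside the epoch
    batch_metric = Accuracy()  # private to the logger, reset after every report
    logs = []
    for nbatch, correct in enumerate(correct_per_batch):
        # Both metrics see every batch; only the logger's copy is windowed.
        epoch_metric.update(correct, batch_size)
        batch_metric.update(correct, batch_size)
        if nbatch > 0 and nbatch % frequent == 0:
            logs.append((nbatch, batch_metric.get()))
            batch_metric.reset()
    return logs, epoch_metric.get()


if __name__ == "__main__":
    logs, epoch_acc = run_epoch_separate_metrics([2, 4, 6, 8, 10, 10], batch_size=10, frequent=2)
    print(logs, round(epoch_acc, 4))
```

Here the batch logs cover their own windows (0.4 and 0.9) and the epoch value is the true 40/60 ≈ 0.6667; the cost is one extra accumulator updated per batch. Fix 1, by contrast, only needs auto_reset=False passed to mx.callback.Speedometer in fit.py, at the price of cumulative rather than per-window batch logs.
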
@frankfliu
Contributor

Hi @hxhxhx88, thanks for submitting this issue. @sandeep-krishnamurthy, requesting that this be labeled.

@vandanavk
Contributor

It seems this error was anticipated in the review comments on PR #5827.
I'm checking what can be done differently to fix this issue.

@vandanavk
Contributor

@hxhxhx88 On further investigation, this turns out to be expected behavior.

The "INFO:root:Epoch[1] Train-accuracy=" line does not report the epoch accuracy; the log is misleading (ref: #10437). The plan is to remove this print statement altogether.

The batch-wise log is printed at regular intervals controlled by a user-specified value (--disp-batches in fit.py).
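
A small sketch of that interval logic shows which batches each log line summarizes when the metric is reset after every report (auto_reset=True). The report condition (nbatch > 0 and nbatch % disp_batches == 0) is an assumed model of the callback, and 391 batches per epoch is a hypothetical figure, since the issue does not state the dataset size.

```python
def report_windows(num_batches, frequent):
    """Return the inclusive 0-based batch ranges summarized by each periodic log
    when the metric is reset after every report, plus the trailing range that
    only the end-of-epoch print sees (None if the last batch triggered a report).

    Assumes a report at every nbatch > 0 with nbatch % frequent == 0; this is a
    hypothetical model of the interval check, not code copied from MXNet.
    """
    windows = []
    start = 0
    for nbatch in range(frequent, num_batches, frequent):
        windows.append((start, nbatch))
        start = nbatch + 1  # the metric is reset right after this report
    tail = (start, num_batches - 1) if start < num_batches else None
    return windows, tail


if __name__ == "__main__":
    windows, tail = report_windows(num_batches=391, frequent=20)
    print(windows[0], windows[-1], len(windows), tail)
```

With those numbers, the first periodic log covers batches 0 to 20, each later one covers 20 batches, and the end-of-epoch Train-accuracy line sees only batches 381 to 390. That would be consistent with the first log in this issue, where 0.642969 equals 823/1280 to six decimal places (10 batches of 128 samples), although the batch size is not given there.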

@vandanavk
Contributor

PR #12182

@vandanavk
Contributor

Separate metrics are now maintained for the epoch-wise and batch-wise values. @hxhxhx88, please have a look at PR #12182. It would be great if you could verify it on your end as well.

@vandanavk
Contributor

The PR has been merged. Can this issue be closed now?


4 participants