accuracy is much lower #2

Open
luhc15 opened this issue Dec 29, 2017 · 8 comments
Comments

@luhc15

luhc15 commented Dec 29, 2017

When I convert the voc101 model to the PyTorch version and test on VOC2012 val.txt, the mean IoU is 79.6%, much lower than the 85.41% reported by the author. Are there any other details I have overlooked?
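For reference, a minimal sketch of how mean IoU is commonly computed for VOC-style label maps, since small differences here (per-image averaging, or not ignoring void pixels) can shift the score. The function names are illustrative and not taken from this repository; the sketch assumes integer label maps, one confusion matrix accumulated over the whole dataset, and the value 255 marking void pixels.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix (rows: ground truth, cols: prediction)."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = gt != ignore_index  # assumption: void/boundary pixels are labelled 255
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Mean of per-class intersection over union, skipping classes that never appear."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())

# Usage: sum the per-image matrices over the dataset, then call mean_iou once.
gt = [[0, 0], [1, 255]]
pred = [[0, 1], [1, 1]]
conf = confusion_matrix(gt, pred, num_classes=2)
print(mean_iou(conf))
```

Summing the matrices before dividing matters: averaging per-image IoU values generally gives a different number from the dataset-level score.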

@zhijiew

zhijiew commented Dec 30, 2017

Hi @luhc15, I'm also trying to use this code on the VOC dataset. Could you please share your test code? Thanks a lot!

@kazuto1011
Owner

Hi guys. I've never evaluated my converted model. One thing I found is that the mean values in demo.py are incorrect: they differ slightly from the ones given by the authors. For precise evaluation, please refer to the original MATLAB code linked below and Section 5.3, "PASCAL VOC 2012", in the paper. Have you already tried multi-scale testing?
https://github.com/hszhao/PSPNet/blob/master/evaluation/eval_all.m#L59

@luhc15
Author

luhc15 commented Jan 2, 2018

@kazuto1011 I used the voc2007 val set as my test dataset. I tried multi-scale testing but got worse results. Maybe I should check the test details.

@zhijiew

zhijiew commented Jan 24, 2018

@luhc15 @kazuto1011
I evaluated this model. It reaches 87.42 mIoU on the val data (1,449 images), and 87.47 mIoU if I use the same mean RGB values as the original paper's authors.
But when I evaluate the original model, it reaches 91.67 mIoU on the same val set (I guess the authors trained on all the data, including the training and validation sets, to get higher performance on the online test).
There is still a gap between the original Caffe model (91.67) and the PyTorch model (87.47), though. When you converted the model from Caffe to PyTorch, did you skip any layers? @kazuto1011

Looking forward to your reply!

@kazuto1011
Owner

Thank you for reporting the results! I don't believe any layers were skipped; instead, I suspect slight differences between the models, such as the interpolation method. Maybe we should compare the intermediate values of Caffe and PyTorch.
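One way such a comparison could look, as a rough sketch rather than code from this repository: dump the activations of both networks on the same input into dictionaries keyed by layer name, then walk the layers in forward order and report the first one that disagrees. The helper name, the layer names, and the tolerance are hypothetical. On the Caffe side the arrays could come from `net.blobs[name].data`, and on the PyTorch side from forward hooks registered with `register_forward_hook`.

```python
import numpy as np

def first_divergence(caffe_acts, torch_acts, layer_order, atol=1e-4):
    """Return (layer, max_abs_diff) for the first layer differing by more than atol, else None."""
    for name in layer_order:
        a = np.asarray(caffe_acts[name], dtype=np.float64)
        b = np.asarray(torch_acts[name], dtype=np.float64)
        if a.shape != b.shape:
            # A shape mismatch already points at a structural difference.
            return name, float("inf")
        diff = float(np.abs(a - b).max())
        if diff > atol:
            return name, diff
    return None

# Usage with made-up activations; the layer names are placeholders.
caffe_acts = {"conv1": np.ones((1, 2)), "upsample": np.zeros((1, 2))}
torch_acts = {"conv1": np.ones((1, 2)), "upsample": np.array([[0.0, 0.5]])}
print(first_divergence(caffe_acts, torch_acts, ["conv1", "upsample"]))
```

If the first mismatch shows up at a resize or pooling layer, that would support the interpolation hypothesis; a mismatch at the very first convolution would point at preprocessing instead.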
Anyway, I have also evaluated the converted model on the val set by averaging the softmax results of 12 multi-scale and flipped inputs, and it reached 86.9% mIoU (close to yours?). I guess this result may be related to this issue. The authors' code contains "sliced evaluation" in scale_process.m, although I'm not sure whether it is also used for VOC2012. According to the issue, the score on the Cityscapes dataset rose by about 15% compared with rescaled inputs. Did you use the original MATLAB code for the 91.67%? What were your evaluation procedures for both?
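The averaging described above can be sketched roughly as follows. This is not the evaluation script used for the 86.9% figure: `predict_probs` is a hypothetical callable standing in for the network plus softmax, the six scale values are an assumption (six scales times two flips gives 12 inputs), and nearest-neighbour resizing is used only to keep the sketch dependency-free, whereas a real evaluation would normally use bilinear interpolation.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of an array whose last two axes are (H, W)."""
    h, w = x.shape[-2:]
    rows = np.minimum(np.arange(out_h) * h // out_h, h - 1)
    cols = np.minimum(np.arange(out_w) * w // out_w, w - 1)
    return x[..., rows[:, None], cols[None, :]]

def multiscale_flip_predict(image, predict_probs,
                            scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average class probabilities over rescaled and horizontally flipped copies.

    image: (3, H, W) array. predict_probs: hypothetical callable mapping a
    (3, h, w) array to (C, h, w) softmax probabilities.
    """
    _, H, W = image.shape
    total, count = None, 0
    for s in scales:
        h, w = max(1, int(round(H * s))), max(1, int(round(W * s)))
        scaled = resize_nearest(image, h, w)
        for flip in (False, True):
            inp = scaled[..., ::-1] if flip else scaled
            probs = predict_probs(inp)
            if flip:
                probs = probs[..., ::-1]  # undo the flip so pixels line up again
            probs = resize_nearest(probs, H, W)  # back to the input resolution
            total = probs if total is None else total + probs
            count += 1
    mean_probs = total / count
    return mean_probs.argmax(axis=0), mean_probs
```

The probabilities are averaged before taking the argmax; averaging the per-scale label maps instead would be a different (voting) scheme. Sliced evaluation, as mentioned above, would replace the single `predict_probs` call per scale with overlapping crops whose outputs are averaged, which this sketch does not attempt.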

@zhijiew

zhijiew commented Jan 26, 2018

I used your code to generate the gray images and evaluated them with the original MATLAB scripts included in the Caffe version.

I also submitted results on the test data to the PASCAL VOC server. They reached 80.77 mIoU, which is much lower than PSPNet's performance on the leaderboard.

@wangbofei11

Thanks for your excellent work. I also found the accuracy problem. I tested the VOC model on some real-world images, and the results of the converted PyTorch model are slightly worse than those of the original Caffe model for almost all images. The test code and the input image resolution are the same.
