
3DPW experimental result without training with 3dpw training data #13

Closed
mimiliaogo opened this issue Jan 7, 2023 · 3 comments

@mimiliaogo
Hi, some HMR papers report their 3DPW evaluation results without using the 3DPW training data. For a fair comparison, I took your released checkpoint [FastMETRO-S-R50_h36m_state_dict.bin], evaluated it on the 3DPW test set, and got an MPVPE of ~129. This is actually worse than some methods that did not train on 3DPW (3DCrowdNet, PyMAF, ...).
I then fine-tuned the checkpoint on the 3DPW training data and got an MPVPE of ~93, which is a surprisingly large gain from fine-tuning compared with what other papers report (e.g., PyMAF).
Do you know why your method does not perform particularly well before fine-tuning on 3DPW, yet achieves SOTA performance after fine-tuning?
Thank you!
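
For reference, a minimal sketch of how MPVPE (Mean Per-Vertex Position Error) is commonly computed, assuming metre-scale meshes; the alignment convention (mean vertex here, pelvis root in some codebases) varies:

```python
import torch

def mpvpe_mm(pred_vertices: torch.Tensor, gt_vertices: torch.Tensor) -> torch.Tensor:
    """Mean Per-Vertex Position Error in millimetres.

    pred_vertices, gt_vertices: (B, V, 3) mesh vertices in metres.
    """
    # Root-align each mesh by subtracting its mean vertex position
    # (some codebases align at the pelvis joint instead).
    pred = pred_vertices - pred_vertices.mean(dim=1, keepdim=True)
    gt = gt_vertices - gt_vertices.mean(dim=1, keepdim=True)
    # Per-vertex Euclidean distance, averaged over vertices and batch.
    return (pred - gt).norm(dim=-1).mean() * 1000.0
```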

@FastMETRO (Collaborator)

Hello,

As described in METRO (Section H: Limitations), our FastMETRO might also not perform well when the target domain distribution differs significantly from the source domain distribution. Non-parametric methods (FastMETRO, METRO, Mesh Graphormer), which directly regress the 3D coordinates of mesh vertices using transformer architectures, may suffer more from this distribution shift than parametric methods.

This might be why our method (and also METRO & Mesh Graphormer) improves accuracy so significantly after fine-tuning on the 3DPW dataset.
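
For intuition, a minimal sketch contrasting the two kinds of output heads (the dimensions are illustrative placeholders, not the actual FastMETRO or METRO configuration):

```python
import torch.nn as nn

# Parametric head: regress low-dimensional SMPL parameters (pose, shape,
# camera). The mesh is decoded through the SMPL model, so outputs stay
# close to the human body manifold even under domain shift.
parametric_head = nn.Linear(2048, 72 + 10 + 3)

# Non-parametric head: regress a 3D coordinate per vertex token directly
# (a coarse mesh that is later upsampled). Nothing constrains the output
# to a plausible human body, so distribution shift can hurt more.
nonparametric_head = nn.Linear(512, 3)
```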

To alleviate this issue, one may leverage domain generalization methods, which improve robustness to distribution shift.

Thanks for your interest in our work!!

@FastMETRO (Collaborator)

Please reopen this issue if you need more help regarding this.

@FastMETRO (Collaborator)

Hello @mimiliaogo,

We recently investigated the large performance gap before and after fine-tuning the model on the 3DPW dataset. Using the model before fine-tuning on 3DPW (FastMETRO-S-R50_h36m_state_dict.bin), we visualized our estimation results on 3DPW and compared them with the ground-truth meshes.

[Figures Pred1, Pred2, Pred3: estimated meshes compared with ground-truth meshes on 3DPW]

As shown in the above figures, we were surprised that most estimation results were quite accurate even before fine-tuning on 3DPW. Given this observation, we wondered why the quantitative results showed such a large performance gap. To find the reason, we visualized all estimation results on 3DPW and evaluated them qualitatively.

[Figure Fail_Case: failure cases concentrated on outdoor images of a person's back]

As shown in the above figure, we observed an unusual model bias on outdoor images of a person's back when the model had not been fine-tuned on 3DPW. This bias seems to significantly degrade the quantitative results, even though most estimation results are quite reasonable.

[Figure Fail: inaccurate human pose and camera estimations for outdoor back-view images]

As shown in the above figure, the model bias leads to largely inaccurate human pose and camera estimations for outdoor images of a person's back. It seems that the implausible 3D human mesh outputs are produced to compensate for inaccurately estimated cameras. Fine-tuning on 3DPW helps the model estimate more accurate cameras and, accordingly, more accurate 3D human meshes. In other words, fine-tuning significantly alleviates this unusual bias for outdoor back-view images, which might explain the large performance gap.

We suspect that this unusual model bias might be attributed to training the model on 2D-annotation datasets (e.g., COCO), where the model is supervised only by the 2D joint reprojection loss. In this case, the supervision lacks the 3D structure of the human body: even if the estimated 3D joints are wrong, their projected 2D joints can still match the ground truth once the camera compensates, as sketched below.
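
To make the ambiguity concrete, a minimal sketch of a 2D-only reprojection loss under a weak-perspective camera, assuming (scale, tx, ty) camera parameters as in many HMR codebases (function names are hypothetical):

```python
import torch

def weak_perspective_project(joints_3d: torch.Tensor, cam: torch.Tensor) -> torch.Tensor:
    """Project 3D joints to 2D with a weak-perspective camera.

    joints_3d: (B, J, 3); cam: (B, 3) holding (scale, tx, ty).
    """
    s = cam[:, 0].view(-1, 1, 1)        # (B, 1, 1) per-sample scale
    t = cam[:, 1:3].unsqueeze(1)        # (B, 1, 2) image-plane translation
    return s * joints_3d[..., :2] + t   # (B, J, 2)

def reprojection_loss(joints_3d, cam, gt_2d, vis):
    """L1 loss on visible 2D joints only. There is no 3D term, so a
    wrong 3D pose paired with a compensating camera can still reach a
    low loss; this is the ambiguity described above."""
    pred_2d = weak_perspective_project(joints_3d, cam)
    return (vis.unsqueeze(-1) * (pred_2d - gt_2d).abs()).mean()
```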

Since the mixed training dataset (e.g., Human3.6M, UP-3D) contains only a few outdoor images of a person's back (compared with the number of front-view images), the model seems to become biased through the 2D joint reprojection loss. Most non-parametric approaches might suffer from the same issue if they do not fully utilize human pose and shape priors.

We hope our observations will facilitate future research! To alleviate this issue, non-parametric methods need to fully leverage 3D human body priors while being trained on 2D annotation datasets.
