
Flip augmentation leads to bad PoseNet #347

Open
Beniko95J opened this issue May 20, 2021 · 4 comments

@Beniko95J

Hi, I trained Monodepth2 on the KITTI Odometry dataset twice, once with flip augmentation enabled and once with it disabled, and got very different trajectories on test sequences Seq.09 and Seq.10. The model with flip augmentation disabled gives a much better global trajectory than the model with it enabled.

with flip augmentation:
sequence_09.pdf
without flip augmentation:
sequence_09.pdf

I am wondering why this happens. I would expect PoseNet to also perform better with flip augmentation enabled, as is the case for DepthNet, since more data is leveraged. Has anyone encountered this before, or does anyone have ideas about it?
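For reference, the flip augmentation I toggle is the usual random horizontal flip applied consistently to all frames of a training sample; a minimal sketch of that idea (not the exact monodepth2 dataloader code) looks like this:

```python
import random
from PIL import Image

def maybe_flip(frames, p=0.5):
    # Apply the same random horizontal flip to every frame of one training
    # sample, so the temporal frames stay geometrically consistent.
    # Only a sketch of the augmentation being toggled, not the exact
    # monodepth2 dataloader code.
    if random.random() < p:
        frames = [f.transpose(Image.FLIP_LEFT_RIGHT) for f in frames]
    return frames
```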

All the best.

@mdfirman (Collaborator)

This is really interesting, thanks for sharing.

Of course it's possible there's a bug in our code around flip augmentation.

But assuming there isn't: perhaps this could be to do with the strong priors around car motion – the KITTI sequences are filmed in Germany, where cars drive on the right. If you train with flip augmentation on, the pose network has to learn both the 'drive on left' and 'drive on right' versions of the world – whereas with it turned off, the pose network only ever has to learn the 'correct' drive-on-right scenario. (Similar to ideas expressed in the visual chirality paper – https://linzhiqiu.github.io/papers/chirality/)

Just a thought.

I don't suppose you also tried training a depth model without flip aug? Or got pose benchmark numbers from your pose model? Both of those results might be insightful!

Thanks,

Michael

@Beniko95J (Author)

Thanks for the reply! @mrharicot

I think your idea and the one expressed in the visual chirality paper do give some insight into this problem.

I don't suppose you also tried training a depth model without flip aug?

Yes, I trained DepthNet along with PoseNet, so I also have DepthNet models trained with flip augmentation enabled and disabled. There is a slight drop in the depth model's performance without flip aug, which is in contrast to PoseNet.

In addition, I also trained DepthNet and PoseNet on the flipped version of the training sequences (the training data contains flipped images only) to see what would happen. As a result, the performance of DepthNet drops from abs=0.121 to abs=0.138, while PoseNet gives totally wrong global trajectories on the unflipped Seq.09 and Seq.10.

It seems the effect of visual chirality also depends on the task: it matters less for depth estimation and more for pose estimation.

Or got pose benchmark numbers from your pose model?

I evaluate PoseNet by accumulating the estimated relative poses into a global trajectory and then using the KITTI Odometry Benchmark, which calculates translational and rotational errors over all possible subsequences of lengths (100, 200, ..., 800) meters. The PoseNet without flip aug gives roughly t_rel=5.7, r_rel=1.9, while the PoseNet with flip aug gives t_rel=11.1, r_rel=4.8. The PoseNet trained on only flipped images gives numbers far worse again than the PoseNet with flip aug.
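For clarity, by "getting the estimated global trajectory" I mean chaining the frame-to-frame relative poses predicted by PoseNet into absolute poses before running the benchmark. A rough sketch of just that accumulation step (whether you multiply by each T or by its inverse depends on the pose convention):

```python
import numpy as np

def accumulate_poses(rel_poses):
    # Chain frame-to-frame relative poses (4x4 homogeneous transforms)
    # into a global trajectory that can be passed to the KITTI odometry
    # evaluation. Depending on the convention PoseNet was trained with,
    # each step may need T or its inverse.
    poses = [np.eye(4)]
    for T in rel_poses:
        poses.append(poses[-1] @ T)
    return poses
```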

Hoping we can have more discussions on this!

@mdfirman (Collaborator)

Hi @Beniko95J , thanks for reporting back and for your updated numbers.

This is very interesting.

I'm glad to see that depth estimation benefits from flip augmentation!

The results of your pose experiments really do suggest that flipped images hurt pose estimates. I feel I should watch the KITTI sequences again (in flipped and unflipped versions!) to see if there's anything obvious that would be hurt by learning on flipped images.

I think there's still a small chance that we have a bug e.g. around use of intrinsics in flipped images. If so, this type of bug might affect pose estimation more than depth, hence the effect you're seeing here. (But I much prefer the visual chirality explanation!)

@alwynmathew

@mdfirman It's mentioned that one should not horizontally flip the image if the principal point is far from the center. But would it be advisable instead to update the principal point when the principal point is far from the center and the image needs to be horizontally flipped? cx would be updated to image_width - cx, while cy stays the same.
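Something like this is what I have in mind (just a sketch of the adjustment I am asking about; some implementations use image_width - 1 - cx depending on the pixel-centre convention):

```python
import numpy as np

def flip_intrinsics(K, image_width):
    # Mirror the principal point to match a horizontally flipped image:
    # cx -> image_width - cx, cy is left unchanged.
    K_flipped = K.copy()
    K_flipped[0, 2] = image_width - K[0, 2]
    return K_flipped
```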
