[MRG] Correct pointer overflow in EMD #381
Merged
Motivation and context / Related issue
When computing a transport plan with EMD between two point clouds of 50k points each, with uniform weights, the resulting transport matrix was sparse but had only 42950 nonzero values. When solving the assignment problem between two pictures (as in https://github.com/ncassereau-idris/pictures-optimal-transport), this meant that some points moved to the origin instead of to their respective target points. This is unexpected: the correct transport matrix should be sparse with 50k nonzero values.
What happened?
When computing the transport plan with EMD, some pointers were still declared as int. As a result, if the cost matrix has more than about 46k rows and columns, these pointers overflow and the cost matrix is not converted correctly.
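The 46k threshold is consistent with a signed 32-bit limit on a flat index into a square cost matrix. The sketch below only illustrates that arithmetic; it is not the solver's code. It assumes that `int` is 32 bits wide, that the offending value is a row-major index of the form `row * n_cols + col`, and that overflow wraps in two's complement (which C and C++ do not guarantee for signed integers). The helper names `wrap_int32` and `flat_index` are hypothetical.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def flat_index(row: int, col: int, n_cols: int) -> int:
    """Row-major flat index, computed with Python's unbounded integers."""
    return row * n_cols + col


n = 50_000
last = flat_index(n - 1, n - 1, n)   # index of the last cell of an n x n matrix
wrapped = wrap_int32(last)           # what a wrapping 32-bit int would hold

# Largest square size whose cell count still fits in a signed 32-bit int.
largest_safe_n = math.isqrt(INT32_MAX)

print(last > INT32_MAX)   # True: the index does not fit in 32 bits
print(wrapped)            # negative after wraparound
print(largest_safe_n)     # 46340
```

Under these assumptions, any square problem larger than 46340 points per side produces indices that no longer fit, which matches the "over 46k" observation above.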
How has this been tested (if it applies)
After modifying the OpenMP version, I computed a transport plan between two clouds of 50k points and obtained a sparse matrix with 50k nonzero values, as expected. I will also try it on the picture assignment problem and run the unit tests to make sure the resulting matrix makes sense.
A picture assignment problem with 60k points was then solved and produced a correct GIF, so the fix works as intended. At 70k points and above, there is a risk that the problem is reported as infeasible.
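The nonzero-count check described above can be sketched as a small helper. This is an illustrative check, not part of the test suite: it assumes the plan is available as `(row, col, mass)` triples for its nonzero entries, that both clouds have `n` points with uniform weights `1/n`, and the name `check_assignment_plan` is hypothetical.

```python
def check_assignment_plan(entries, n, tol=1e-12):
    """Return True if the plan has exactly n nonzeros and uniform marginals.

    entries: iterable of (row, col, mass) for the nonzero plan entries.
    n: number of points in each cloud (uniform weights 1/n assumed).
    """
    entries = list(entries)
    row_mass = [0.0] * n
    col_mass = [0.0] * n
    for i, j, mass in entries:
        row_mass[i] += mass
        col_mass[j] += mass
    expected = 1.0 / n
    marginals_ok = all(abs(m - expected) <= tol for m in row_mass + col_mass)
    return len(entries) == n and marginals_ok


# Toy example with 4 points: a permutation scaled by 1/4 passes the check.
good_plan = [(0, 2, 0.25), (1, 0, 0.25), (2, 3, 0.25), (3, 1, 0.25)]
# Dropping an entry mimics the bug: fewer nonzeros and a broken marginal.
bad_plan = good_plan[:-1]

print(check_assignment_plan(good_plan, 4))  # True
print(check_assignment_plan(bad_plan, 4))   # False
```

A plan with only 42950 nonzero entries for 50k points would fail both conditions of this check.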
PR checklist