Types of changes
Motivation and context / Related issue
Cupy backend
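As a rough illustration (not code from this PR), here is how the new backend is expected to be exercised, assuming it is dispatched from the input array types like the other backends:

```python
# Hedged sketch: with the Cupy backend, POT functions that go through the
# backend dispatch should accept cupy arrays directly and return cupy arrays,
# keeping the whole computation on the GPU.
import cupy as cp
import ot

n = 100
a = cp.full(n, 1.0 / n)                # uniform weights, stored on the GPU
x = cp.random.standard_normal((n, 2))  # random samples on the GPU
M = ot.dist(x, x)                      # cost matrix, stays a cupy array
G = ot.sinkhorn(a, a, M, reg=1.0)      # transport plan computed on the GPU
print(type(G))                         # expected: cupy.ndarray
```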
How has this been tested (if it applies)
Tested with cupy 9.0.0
Performance
Running `make pytest` shows that the cupy tests are very slow. Here is my take on why:
- For gromov, `generator.choice` is used in numerous functions, but with cupy it requires sending the `p` argument back to the CPU (see the sketch after this list). These recurrent transfers are limited by memory bandwidth and penalize GPUs quite harshly.
- For the other tests, GPUs have an advantage on larger computations, but the tests are usually run on tiny problems, for which CPUs can be faster (no overhead from transfers to and from the GPU, and very fast memory).
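A minimal sketch of that round-trip (illustrative only, not the PR's code; `choice_with_gpu_probs` is a made-up helper):

```python
# Why gromov-style sampling is costly with cupy: the probability vector p
# lives on the GPU, but numpy's Generator.choice needs a host array, so every
# call pays a GPU -> CPU transfer before sampling.
import numpy as np
import cupy as cp

def choice_with_gpu_probs(generator, n, size, p_gpu):
    # Hypothetical helper: copy the weights back to the host, then sample.
    p_cpu = cp.asnumpy(p_gpu)                    # GPU -> CPU transfer each call
    return generator.choice(n, size=size, p=p_cpu)

rng = np.random.default_rng(42)
p = cp.full(1000, 1.0 / 1000)                    # uniform weights on the GPU
idx = choice_with_gpu_probs(rng, 1000, size=50, p_gpu=p)
```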
To be fair, I tried solving problems of different sizes with sinkhorn (using pretty much the code from `test/test_gpu.py::test_gpu_sinkhorn`). Here are the results (averaged over 500 runs):
For some reason, CupyBackend is even faster than ot.gpu on the bigger problems (maybe a slightly different algorithm?). On the smaller ones, the difference might come from the overhead added by the Backend classes.
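For reference, a rough sketch of the kind of timing harness behind those numbers (assumptions: it follows `test_gpu_sinkhorn`, and `ot.sinkhorn` picks the CupyBackend when given cupy arrays; the legacy path would run `ot.gpu` on the same data):

```python
# Rough benchmark sketch (not the exact script); the real numbers above were
# averaged over 500 runs per problem size.
import time
import numpy as np
import cupy as cp
import ot

def time_backend_sinkhorn(n, reg=1.0, runs=10):
    rng = np.random.RandomState(0)
    x = rng.randn(n, 2)
    a = ot.unif(n)                      # uniform weights on the CPU
    M = ot.dist(x, x)

    # New CupyBackend path: move the data once, then stay on the GPU.
    a_gpu, M_gpu = cp.asarray(a), cp.asarray(M)
    cp.cuda.Stream.null.synchronize()
    t0 = time.perf_counter()
    for _ in range(runs):
        ot.sinkhorn(a_gpu, a_gpu, M_gpu, reg)
    cp.cuda.Stream.null.synchronize()
    return (time.perf_counter() - t0) / runs

for n in (100, 1000, 10000):
    print(n, time_backend_sinkhorn(n))
```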
Checklist