Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not in simplex -- two sets of largely different sizes #93

Closed
caricesarotti opened this issue Jul 19, 2019 · 25 comments · Fixed by #217
Closed

Not in simplex -- two sets of largely different sizes #93

caricesarotti opened this issue Jul 19, 2019 · 25 comments · Fixed by #217
Assignees

Comments

@caricesarotti
Copy link

caricesarotti commented Jul 19, 2019

I am trying to calculate the EMD of two sets. When one set has a few hundred entries and the other has only 2, the EMD calculation fails and returns Problem Infeasible.

Steps to reproduce the behavior:
** SEE BELOW COMMENT FOR FIXED SCRIPT **

Expected behavior
Should return EMD around 1, instead says that the sets spherEng1 and pencilEnergy are not in the simplex

Screenshots
Here is comparing the EMDs calculated for less densely tiled to most densely tiled (number of particles = number of segments) with the two element set
image

Desktop (please complete the following information):

  • OS: [MacOSX]
  • Python version [3.6]
  • POT installed with pip

import platform; print(platform.platform())
Darwin-16.7.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
import numpy; print("NumPy", numpy.version)
('NumPy', '1.15.4')
import scipy; print("SciPy", scipy.version)
('SciPy', '1.1.0')
import ot; print("POT", ot.version)
('POT', '0.5.1')

@ncourty
Copy link
Collaborator

ncourty commented Jul 19, 2019

hi, spherEng1 and PencilEnergy should sum up to the same value (the total mass in both distributions should be the same). Is it the case ?

@caricesarotti
Copy link
Author

Yes, I've normed them both so that they both sum to 1

@hichamjanati
Copy link
Contributor

hichamjanati commented Jul 22, 2019

Hi @caricesarotti

I tried to run your code, but there are many errors in your script. nside = 2i? is it 2 * i ?
What is nside2 ?

Even with by changing 2i -> 2 * i and nside2 -> nside. I get:

  File "pot_#93.py", line 16, in <module>
    engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
NameError: name 'eng' is not defined

Please make sure your script runs so I can reproduce the error.
And make sure to put your code in code blocks so it renders properly and we can see
for loops indentations. See https://help.github.com/en/articles/creating-and-highlighting-code-blocks#syntax-highlighting.

@caricesarotti
Copy link
Author

Whoops, sorry about that. Some asterisks got lost

import numpy as np
from numpy import linalg as LA
import astropy_healpix
import astropy_healpix.healpy as hp
import ot
from ot.lp import emd_c, check_result, emd2

# DEFINED FOR MASSLESS PARTICLES
def eng(px,py,pz):
    return np.sqrt(px**2+py**2+pz**2)

def pT(px, py): 
    return np.sqrt(px**2+py**2)

def listNorm(a):
    return a[:]/a[:].sum()

def _cdist_sphere(X,Y):
    theta_d=np.array([[np.arccos(np.around(np.dot(X[i],Y[j]), decimals=5)/np.around((LA.norm(Y[j])*LA.norm(X[i])),decimals=5)) for j in range(len(Y))] for i in range(len(X))])
    return theta_d

def emd_Calc(ev0,ev1,M,maxIter=1000000):
    # NORMALIZE IF NOT NORMALIZED
    ev0norm = ev0[:]/ev0[:].sum()
    ev1norm = ev1[:]/ev1[:].sum()
    
    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)
    #print(log['result_code'])
    if cost==0:
        print(log['warning'])
    return cost



# Generating the sphere tiling:
spherEventScan=[]
for i in range(5):
    nside=2**i
    numPixScan=12*(nside**2)
    spherPointScan=[]
    sphereEngScan=[]
    spherSampleScan=[]
    spherPtScan=[]
    for j in range(numPixScan):
        vecOnSpher = hp.pix2vec(nside,j)
        point = np.array([vecOnSpher[0], vecOnSpher[1], vecOnSpher[2]])
        engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
        spherPointScan.append(point)
        sphereEngScan.append(engVal)
        spherPtScan.append(pT(point[0],point[1]))
    spherEventScan.append([np.around(np.array(spherPointScan), decimals=6),np.around(np.array(sphereEngScan), decimals=6),listNorm(np.array(spherPtScan))])

# Comparing a densely tiled sphere to a two element set
pencilPoint = np.array([[1,0,0],[-1,0,0]])
numPencil=2
pencilEnergy = np.full(numPencil, np.float(1./numPencil))
spherPoints1 = spherEventScan[4][0]
spherEng1 = spherEventScan[4][1]
M = _cdist_sphere(spherPoints1, pencilPoint)
emdval = emd_Calc(spherEng1, pencilEnergy, M)

@hichamjanati
Copy link
Contributor

Thank you @caricesarotti
Adding a print(emdval) at the end of your script I get:

(base) 223-202:ot hichamjanati$ python pot_#93.py 
0.999838453061382

Are you sure this script above gives you a not in simplex error ?

@caricesarotti
Copy link
Author

That is very interesting. When I run that script in Jupyter with a python 3 compiler, I get:

image

Are our versions of the libraries I am using the same?

@hichamjanati
Copy link
Contributor

I could not getastropy_healpix to work with your version (1.15.4) of numpy.
Since your script works with the latest pip version of astropy_healpix and numpy, try updating both of them and see if works.

Also note that you mentioned using Python 3.6 but your output is Python 2.7 (Your script still works with both python 2.7 and 3.6 with the latest numpy, pot and astropy).

import numpy as np
from numpy import linalg as LA
import astropy_healpix
import astropy_healpix.healpy as hp
import ot
from ot.lp import emd_c, check_result, emd2
import sys

# DEFINED FOR MASSLESS PARTICLES
def eng(px,py,pz):
    return np.sqrt(px**2+py**2+pz**2)

def pT(px, py):
    return np.sqrt(px**2+py**2)

def listNorm(a):
    return a[:]/a[:].sum()

def _cdist_sphere(X,Y):
    theta_d=np.array([[np.arccos(np.around(np.dot(X[i],Y[j]), decimals=5)/np.around((LA.norm(Y[j])*LA.norm(X[i])),decimals=5)) for j in range(len(Y))] for i in range(len(X))])
    return theta_d

def emd_Calc(ev0,ev1,M,maxIter=1000000):
    # NORMALIZE IF NOT NORMALIZED
    ev0norm = ev0[:]/ev0[:].sum()
    ev1norm = ev1[:]/ev1[:].sum()

    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)
    #print(log['result_code'])
    if cost==0:
        print(log['warning'])
    return cost



# Generating the sphere tiling:
spherEventScan=[]
for i in range(5):
    nside=2**i
    numPixScan=12*(nside**2)
    spherPointScan=[]
    sphereEngScan=[]
    spherSampleScan=[]
    spherPtScan=[]
    for j in range(numPixScan):
        vecOnSpher = hp.pix2vec(nside,j)
        point = np.array([vecOnSpher[0], vecOnSpher[1], vecOnSpher[2]])
        engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
        spherPointScan.append(point)
        sphereEngScan.append(engVal)
        spherPtScan.append(pT(point[0],point[1]))
    spherEventScan.append([np.around(np.array(spherPointScan), decimals=6),np.around(np.array(sphereEngScan), decimals=6),listNorm(np.array(spherPtScan))])

# Comparing a densely tiled sphere to a two element set
pencilPoint = np.array([[1,0,0],[-1,0,0]])
numPencil=2
pencilEnergy = np.full(numPencil, np.float(1./numPencil))
spherPoints1 = spherEventScan[4][0]
spherEng1 = spherEventScan[4][1]
M = _cdist_sphere(spherPoints1, pencilPoint)
emdval = emd_Calc(spherEng1, pencilEnergy, M)
print("emdval = %f" % emdval)

versions = [sys.version, np.__version__, ot.__version__,
            astropy_healpix.__version__]
names = ["Python", "Numpy", "POT", "Astropy"]
for name, version in zip(names, versions):
    print(name, version)

with python 2.7

emdval = 0.999838
('Python', '2.7.16 |Anaconda, Inc.| (default, Mar 14 2019, 16:24:02) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
('Numpy', '1.16.4')
('POT', '0.5.1')
('Astropy', u'0.4')

with python 3.6

(pot93) Hichams-MacBook-Pro:ot hichamjanati$ python pot_#93.py 
emdval = 0.999838
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Numpy 1.16.4
POT 0.5.1
Astropy 0.4

@caricesarotti
Copy link
Author

I am still seeing some problems.

image

Any ideas? I am really at a loss

@hichamjanati
Copy link
Contributor

I don't know what is broken on your end. You can create a small env, this bit works just fine:
conda create -n potenv python==3.6.5
conda activate potenv
pip install pot astropy_healpix
python pot_93.py

emdval = 0.999838
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Numpy 1.16.4
POT 0.5.1
Astropy 0.4

your pot_93.py file:

import numpy as np
from numpy import linalg as LA
import astropy_healpix
import astropy_healpix.healpy as hp
import ot
from ot.lp import emd_c, check_result, emd2
import sys

# DEFINED FOR MASSLESS PARTICLES
def eng(px,py,pz):
    return np.sqrt(px**2+py**2+pz**2)

def pT(px, py):
    return np.sqrt(px**2+py**2)

def listNorm(a):
    return a[:]/a[:].sum()

def _cdist_sphere(X,Y):
    theta_d=np.array([[np.arccos(np.around(np.dot(X[i],Y[j]), decimals=5)/np.around((LA.norm(Y[j])*LA.norm(X[i])),decimals=5)) for j in range(len(Y))] for i in range(len(X))])
    return theta_d

def emd_Calc(ev0,ev1,M,maxIter=1000000):
    # NORMALIZE IF NOT NORMALIZED
    ev0norm = ev0[:]/ev0[:].sum()
    ev1norm = ev1[:]/ev1[:].sum()
    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)
    #print(log['result_code'])
    if cost==0:
        print(log['warning'])
    return cost



# Generating the sphere tiling:
spherEventScan=[]
for i in range(5):
    nside=2**i
    numPixScan=12*(nside**2)
    spherPointScan=[]
    sphereEngScan=[]
    spherSampleScan=[]
    spherPtScan=[]
    for j in range(numPixScan):
        vecOnSpher = hp.pix2vec(nside,j)
        point = np.array([vecOnSpher[0], vecOnSpher[1], vecOnSpher[2]])
        engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
        spherPointScan.append(point)
        sphereEngScan.append(engVal)
        spherPtScan.append(pT(point[0],point[1]))
    spherEventScan.append([np.around(np.array(spherPointScan), decimals=6),np.around(np.array(sphereEngScan), decimals=6),listNorm(np.array(spherPtScan))])

# Comparing a densely tiled sphere to a two element set
pencilPoint = np.array([[1,0,0],[-1,0,0]])
numPencil=2
pencilEnergy = np.full(numPencil, np.float(1./numPencil))
spherPoints1 = spherEventScan[4][0]
spherEng1 = spherEventScan[4][1]
M = _cdist_sphere(spherPoints1, pencilPoint)
emdval = emd_Calc(spherEng1, pencilEnergy, M)
print("emdval = %f" % emdval)

versions = [sys.version, np.__version__, ot.__version__,
            astropy_healpix.__version__]
names = ["Python", "Numpy", "POT", "Astropy"]
for name, version in zip(names, versions):
    print(name, version)

@caricesarotti
Copy link
Author

I ran your code exactly and I am still getting the error

image

@hichamjanati
Copy link
Contributor

Ok. Could you add just before the emd2 call, inside emd_Calc:

    print(ev0norm.min(), ev0norm.sum())
    print(ev1norm.min(), ev1norm.sum())
    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)

and report the output here

@caricesarotti
Copy link
Author

No Problem
image

@hichamjanati
Copy link
Contributor

hichamjanati commented Jul 23, 2019

I have the same output, 0.003 so it's not a matter of numerical precision. I'm running out of options without a way to reproduce the error ...
What happens when you change the values of this distributions ? what if you make both of them uniform dists ?
Can you add the versions of cython and scipy at the end of the script ?

@caricesarotti
Copy link
Author

When I consider the versions of cython and scipy, I get

SciPy 1.3.0
Cython 0.29.12

I think I have found the issue though. I talked to some collaborators who have used POT for similar things (https://arxiv.org/abs/1902.02346) and they said they countered the same problem.

The issue is with the Mac OS. There is something strange about running the POT on the Darwin platform. I am resolving the problem by switching to a linux machine

@hichamjanati
Copy link
Contributor

Ok,
It would be great if they could create an issue so we can have a look at it.
Actually I ran the script above on a Mac OS so there should not be any problem with Darwin systems...

@rflamary
Copy link
Collaborator

yes I agree with @hichamjanati,

if this is a bug that affect several person, we should hunt it down or at least leave this issue open. Switching to linux is not a good answer...

@caricesarotti
Copy link
Author

I agree that it would be ideal to run the script successfully on my OS, but I am running out of ideas on how to help.

I found this piece of code in the POT library:

image

My OS is Darwin 16, not sure if there is an issue here

@aforr
Copy link

aforr commented Jul 16, 2020

I've encountered what appears to be the same problem. Like caricesarotti, I only see the problem on a mac, not on linux.

The minimal example I've found is this:

import numpy as np
import ot
ot.emd2([], [], np.ones([292, 2]))

On linux, that outputs 1.0000000000000004, as expected. On my mac, it outputs

[...]/ot/lp/__init__.py:421: UserWarning: Problem infeasible. Check that a and b are in the simplex
  check_result(result_code)
0.0

It seems to be related to the shape of the input, but I haven't figured out how. Some other examples: using np.ones([293, 2]) or np.ones([292, 4]) works fine, while np.ones([313, 3]) fails.

OS and package versions where I do see the problem:

>>> import platform; print(platform.platform())
macOS-10.14.6-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.8.3 | packaged by conda-forge | (default, Jun  1 2020, 17:21:09) 
[Clang 9.0.1 ]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.18.5
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.5.0
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

OS and package versions where I do not see the problem:

>>> import platform; print(platform.platform())
Linux-4.4.0-165-generic-x86_64-with-glibc2.10
>>> import sys; print("Python", sys.version)
Python 3.8.3 | packaged by conda-forge | (default, Jun  1 2020, 17:43:00) 
[GCC 7.5.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.19.0
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.5.1
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

I'm planning to follow caricesarotti's suggestion and switch to linux.

@merplumander
Copy link

Having the same problem and I'm pretty much stuck on Mac.

I can reproduce aforr's minimal example (also same behaviour for the other shapes he tried).

>>> import platform; print(platform.platform())
Darwin-19.3.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.6.5 (default, Jun 17 2018, 12:13:06)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.19.4
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.4.1
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

I appreciate all the efforts for a resolution :)

@ncourty
Copy link
Collaborator

ncourty commented Nov 11, 2020

Yes, I can reproduce the issue. A very quick fix-up:
ot.emd2([], [], np.ones([292, 2],dtype=np.float32))
or
ot.emd2([], [], np.ones([292, 2])*1.)
would work. I guess the problem comes from casting the integer cost into 32-bits float in the C code that might cause this rounding issue, that triggers the marginal errors. I will investigate.

@merplumander
Copy link

merplumander commented Nov 11, 2020

Thanks for the work you are putting into this!

Unfortunately, both of the quick-fix suggestions don't work for me. They still produce the warning and return 0.0 ...
And I mean not just in my use case, but the literal code that you posted.

@ncourty
Copy link
Collaborator

ncourty commented Nov 11, 2020

Damn I was too quick in my answer. This is indeed a very weird bug... Looking into it, I will keep you posted

@ncourty
Copy link
Collaborator

ncourty commented Nov 11, 2020

Ok I noticed a small problem with an EPSILON constant in the C code. Could you please install the branch 'precision_concern' and test it on your problem ? The modifications I performed did actually solved the issue for the previous problem (with the [292,2] array).

@merplumander
Copy link

That solves both the toy test case and my actual use case and there is no need to do any casting (it works with and without casting)!

Thanks so much for looking into this and being so super responsive! :)

@ncourty
Copy link
Collaborator

ncourty commented Nov 12, 2020

Great ! Thanks for the quick feedback. I will do a PR and close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants