Not in simplex -- two sets of largely different sizes #93

caricesarotti · 2019-07-19T17:51:26Z

I am trying to calculate the EMD of two sets. When one set has a few hundred entries and the other has only 2, the EMD calculation fails and returns Problem Infeasible.

Steps to reproduce the behavior:
** SEE BELOW COMMENT FOR FIXED SCRIPT **

Expected behavior
Should return EMD around 1, instead says that the sets spherEng1 and pencilEnergy are not in the simplex

Screenshots
Here is comparing the EMDs calculated for less densely tiled to most densely tiled (number of particles = number of segments) with the two element set

Desktop (please complete the following information):

OS: [MacOSX]
Python version [3.6]
POT installed with pip

import platform; print(platform.platform())
Darwin-16.7.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
import numpy; print("NumPy", numpy.version)
('NumPy', '1.15.4')
import scipy; print("SciPy", scipy.version)
('SciPy', '1.1.0')
import ot; print("POT", ot.version)
('POT', '0.5.1')

ncourty · 2019-07-19T23:33:58Z

hi, spherEng1 and PencilEnergy should sum up to the same value (the total mass in both distributions should be the same). Is it the case ?

caricesarotti · 2019-07-19T23:34:37Z

Yes, I've normed them both so that they both sum to 1

hichamjanati · 2019-07-22T16:28:48Z

Hi @caricesarotti

I tried to run your code, but there are many errors in your script. nside = 2i? is it 2 * i ?
What is nside2 ?

Even with by changing 2i -> 2 * i and nside2 -> nside. I get:

  File "pot_#93.py", line 16, in <module>
    engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
NameError: name 'eng' is not defined

Please make sure your script runs so I can reproduce the error.
And make sure to put your code in code blocks so it renders properly and we can see
for loops indentations. See https://help.github.com/en/articles/creating-and-highlighting-code-blocks#syntax-highlighting.

caricesarotti · 2019-07-22T17:13:37Z

Whoops, sorry about that. Some asterisks got lost

import numpy as np
from numpy import linalg as LA
import astropy_healpix
import astropy_healpix.healpy as hp
import ot
from ot.lp import emd_c, check_result, emd2

# DEFINED FOR MASSLESS PARTICLES
def eng(px,py,pz):
    return np.sqrt(px**2+py**2+pz**2)

def pT(px, py): 
    return np.sqrt(px**2+py**2)

def listNorm(a):
    return a[:]/a[:].sum()

def _cdist_sphere(X,Y):
    theta_d=np.array([[np.arccos(np.around(np.dot(X[i],Y[j]), decimals=5)/np.around((LA.norm(Y[j])*LA.norm(X[i])),decimals=5)) for j in range(len(Y))] for i in range(len(X))])
    return theta_d

def emd_Calc(ev0,ev1,M,maxIter=1000000):
    # NORMALIZE IF NOT NORMALIZED
    ev0norm = ev0[:]/ev0[:].sum()
    ev1norm = ev1[:]/ev1[:].sum()
    
    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)
    #print(log['result_code'])
    if cost==0:
        print(log['warning'])
    return cost



# Generating the sphere tiling:
spherEventScan=[]
for i in range(5):
    nside=2**i
    numPixScan=12*(nside**2)
    spherPointScan=[]
    sphereEngScan=[]
    spherSampleScan=[]
    spherPtScan=[]
    for j in range(numPixScan):
        vecOnSpher = hp.pix2vec(nside,j)
        point = np.array([vecOnSpher[0], vecOnSpher[1], vecOnSpher[2]])
        engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
        spherPointScan.append(point)
        sphereEngScan.append(engVal)
        spherPtScan.append(pT(point[0],point[1]))
    spherEventScan.append([np.around(np.array(spherPointScan), decimals=6),np.around(np.array(sphereEngScan), decimals=6),listNorm(np.array(spherPtScan))])

# Comparing a densely tiled sphere to a two element set
pencilPoint = np.array([[1,0,0],[-1,0,0]])
numPencil=2
pencilEnergy = np.full(numPencil, np.float(1./numPencil))
spherPoints1 = spherEventScan[4][0]
spherEng1 = spherEventScan[4][1]
M = _cdist_sphere(spherPoints1, pencilPoint)
emdval = emd_Calc(spherEng1, pencilEnergy, M)

hichamjanati · 2019-07-22T22:32:15Z

Thank you @caricesarotti
Adding a print(emdval) at the end of your script I get:

(base) 223-202:ot hichamjanati$ python pot_#93.py 
0.999838453061382

Are you sure this script above gives you a not in simplex error ?

caricesarotti · 2019-07-23T03:01:18Z

That is very interesting. When I run that script in Jupyter with a python 3 compiler, I get:

Are our versions of the libraries I am using the same?

hichamjanati · 2019-07-23T08:19:52Z

I could not getastropy_healpix to work with your version (1.15.4) of numpy.
Since your script works with the latest pip version of astropy_healpix and numpy, try updating both of them and see if works.

Also note that you mentioned using Python 3.6 but your output is Python 2.7 (Your script still works with both python 2.7 and 3.6 with the latest numpy, pot and astropy).

import numpy as np
from numpy import linalg as LA
import astropy_healpix
import astropy_healpix.healpy as hp
import ot
from ot.lp import emd_c, check_result, emd2
import sys

# DEFINED FOR MASSLESS PARTICLES
def eng(px,py,pz):
    return np.sqrt(px**2+py**2+pz**2)

def pT(px, py):
    return np.sqrt(px**2+py**2)

def listNorm(a):
    return a[:]/a[:].sum()

def _cdist_sphere(X,Y):
    theta_d=np.array([[np.arccos(np.around(np.dot(X[i],Y[j]), decimals=5)/np.around((LA.norm(Y[j])*LA.norm(X[i])),decimals=5)) for j in range(len(Y))] for i in range(len(X))])
    return theta_d

def emd_Calc(ev0,ev1,M,maxIter=1000000):
    # NORMALIZE IF NOT NORMALIZED
    ev0norm = ev0[:]/ev0[:].sum()
    ev1norm = ev1[:]/ev1[:].sum()

    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)
    #print(log['result_code'])
    if cost==0:
        print(log['warning'])
    return cost



# Generating the sphere tiling:
spherEventScan=[]
for i in range(5):
    nside=2**i
    numPixScan=12*(nside**2)
    spherPointScan=[]
    sphereEngScan=[]
    spherSampleScan=[]
    spherPtScan=[]
    for j in range(numPixScan):
        vecOnSpher = hp.pix2vec(nside,j)
        point = np.array([vecOnSpher[0], vecOnSpher[1], vecOnSpher[2]])
        engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
        spherPointScan.append(point)
        sphereEngScan.append(engVal)
        spherPtScan.append(pT(point[0],point[1]))
    spherEventScan.append([np.around(np.array(spherPointScan), decimals=6),np.around(np.array(sphereEngScan), decimals=6),listNorm(np.array(spherPtScan))])

# Comparing a densely tiled sphere to a two element set
pencilPoint = np.array([[1,0,0],[-1,0,0]])
numPencil=2
pencilEnergy = np.full(numPencil, np.float(1./numPencil))
spherPoints1 = spherEventScan[4][0]
spherEng1 = spherEventScan[4][1]
M = _cdist_sphere(spherPoints1, pencilPoint)
emdval = emd_Calc(spherEng1, pencilEnergy, M)
print("emdval = %f" % emdval)

versions = [sys.version, np.__version__, ot.__version__,
            astropy_healpix.__version__]
names = ["Python", "Numpy", "POT", "Astropy"]
for name, version in zip(names, versions):
    print(name, version)

with python 2.7

emdval = 0.999838
('Python', '2.7.16 |Anaconda, Inc.| (default, Mar 14 2019, 16:24:02) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
('Numpy', '1.16.4')
('POT', '0.5.1')
('Astropy', u'0.4')

with python 3.6

(pot93) Hichams-MacBook-Pro:ot hichamjanati$ python pot_#93.py 
emdval = 0.999838
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Numpy 1.16.4
POT 0.5.1
Astropy 0.4

caricesarotti · 2019-07-23T13:38:05Z

I am still seeing some problems.

Any ideas? I am really at a loss

hichamjanati · 2019-07-23T14:28:54Z

I don't know what is broken on your end. You can create a small env, this bit works just fine:
conda create -n potenv python==3.6.5
conda activate potenv
pip install pot astropy_healpix
python pot_93.py

emdval = 0.999838
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Numpy 1.16.4
POT 0.5.1
Astropy 0.4

your pot_93.py file:

import numpy as np
from numpy import linalg as LA
import astropy_healpix
import astropy_healpix.healpy as hp
import ot
from ot.lp import emd_c, check_result, emd2
import sys

# DEFINED FOR MASSLESS PARTICLES
def eng(px,py,pz):
    return np.sqrt(px**2+py**2+pz**2)

def pT(px, py):
    return np.sqrt(px**2+py**2)

def listNorm(a):
    return a[:]/a[:].sum()

def _cdist_sphere(X,Y):
    theta_d=np.array([[np.arccos(np.around(np.dot(X[i],Y[j]), decimals=5)/np.around((LA.norm(Y[j])*LA.norm(X[i])),decimals=5)) for j in range(len(Y))] for i in range(len(X))])
    return theta_d

def emd_Calc(ev0,ev1,M,maxIter=1000000):
    # NORMALIZE IF NOT NORMALIZED
    ev0norm = ev0[:]/ev0[:].sum()
    ev1norm = ev1[:]/ev1[:].sum()
    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)
    #print(log['result_code'])
    if cost==0:
        print(log['warning'])
    return cost



# Generating the sphere tiling:
spherEventScan=[]
for i in range(5):
    nside=2**i
    numPixScan=12*(nside**2)
    spherPointScan=[]
    sphereEngScan=[]
    spherSampleScan=[]
    spherPtScan=[]
    for j in range(numPixScan):
        vecOnSpher = hp.pix2vec(nside,j)
        point = np.array([vecOnSpher[0], vecOnSpher[1], vecOnSpher[2]])
        engVal = eng(vecOnSpher[0], vecOnSpher[1], vecOnSpher[2])
        spherPointScan.append(point)
        sphereEngScan.append(engVal)
        spherPtScan.append(pT(point[0],point[1]))
    spherEventScan.append([np.around(np.array(spherPointScan), decimals=6),np.around(np.array(sphereEngScan), decimals=6),listNorm(np.array(spherPtScan))])

# Comparing a densely tiled sphere to a two element set
pencilPoint = np.array([[1,0,0],[-1,0,0]])
numPencil=2
pencilEnergy = np.full(numPencil, np.float(1./numPencil))
spherPoints1 = spherEventScan[4][0]
spherEng1 = spherEventScan[4][1]
M = _cdist_sphere(spherPoints1, pencilPoint)
emdval = emd_Calc(spherEng1, pencilEnergy, M)
print("emdval = %f" % emdval)

versions = [sys.version, np.__version__, ot.__version__,
            astropy_healpix.__version__]
names = ["Python", "Numpy", "POT", "Astropy"]
for name, version in zip(names, versions):
    print(name, version)

caricesarotti · 2019-07-23T14:51:20Z

I ran your code exactly and I am still getting the error

hichamjanati · 2019-07-23T15:42:13Z

Ok. Could you add just before the emd2 call, inside emd_Calc:

    print(ev0norm.min(), ev0norm.sum())
    print(ev1norm.min(), ev1norm.sum())
    cost, log = emd2(ev0norm, ev1norm, M, numItermax=100000000,log=True)

and report the output here

caricesarotti · 2019-07-23T15:48:58Z

No Problem

hichamjanati · 2019-07-23T19:09:46Z

I have the same output, 0.003 so it's not a matter of numerical precision. I'm running out of options without a way to reproduce the error ...
What happens when you change the values of this distributions ? what if you make both of them uniform dists ?
Can you add the versions of cython and scipy at the end of the script ?

caricesarotti · 2019-07-24T16:49:47Z

When I consider the versions of cython and scipy, I get

SciPy 1.3.0
Cython 0.29.12

I think I have found the issue though. I talked to some collaborators who have used POT for similar things (https://arxiv.org/abs/1902.02346) and they said they countered the same problem.

The issue is with the Mac OS. There is something strange about running the POT on the Darwin platform. I am resolving the problem by switching to a linux machine

hichamjanati · 2019-07-24T16:59:00Z

Ok,
It would be great if they could create an issue so we can have a look at it.
Actually I ran the script above on a Mac OS so there should not be any problem with Darwin systems...

rflamary · 2019-07-25T06:19:47Z

yes I agree with @hichamjanati,

if this is a bug that affect several person, we should hunt it down or at least leave this issue open. Switching to linux is not a good answer...

caricesarotti · 2019-07-25T18:41:33Z

I agree that it would be ideal to run the script successfully on my OS, but I am running out of ideas on how to help.

I found this piece of code in the POT library:

My OS is Darwin 16, not sure if there is an issue here

aforr · 2020-07-16T22:30:44Z

I've encountered what appears to be the same problem. Like caricesarotti, I only see the problem on a mac, not on linux.

The minimal example I've found is this:

import numpy as np
import ot
ot.emd2([], [], np.ones([292, 2]))

On linux, that outputs 1.0000000000000004, as expected. On my mac, it outputs

[...]/ot/lp/__init__.py:421: UserWarning: Problem infeasible. Check that a and b are in the simplex
  check_result(result_code)
0.0

It seems to be related to the shape of the input, but I haven't figured out how. Some other examples: using np.ones([293, 2]) or np.ones([292, 4]) works fine, while np.ones([313, 3]) fails.

OS and package versions where I do see the problem:

>>> import platform; print(platform.platform())
macOS-10.14.6-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.8.3 | packaged by conda-forge | (default, Jun  1 2020, 17:21:09) 
[Clang 9.0.1 ]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.18.5
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.5.0
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

OS and package versions where I do not see the problem:

>>> import platform; print(platform.platform())
Linux-4.4.0-165-generic-x86_64-with-glibc2.10
>>> import sys; print("Python", sys.version)
Python 3.8.3 | packaged by conda-forge | (default, Jun  1 2020, 17:43:00) 
[GCC 7.5.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.19.0
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.5.1
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

I'm planning to follow caricesarotti's suggestion and switch to linux.

merplumander · 2020-11-11T10:35:09Z

Having the same problem and I'm pretty much stuck on Mac.

I can reproduce aforr's minimal example (also same behaviour for the other shapes he tried).

>>> import platform; print(platform.platform())
Darwin-19.3.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.6.5 (default, Jun 17 2018, 12:13:06)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.19.4
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.4.1
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

I appreciate all the efforts for a resolution :)

ncourty · 2020-11-11T15:10:50Z

Yes, I can reproduce the issue. A very quick fix-up:
ot.emd2([], [], np.ones([292, 2],dtype=np.float32))
or
ot.emd2([], [], np.ones([292, 2])*1.)
would work. I guess the problem comes from casting the integer cost into 32-bits float in the C code that might cause this rounding issue, that triggers the marginal errors. I will investigate.

merplumander · 2020-11-11T16:28:01Z

Thanks for the work you are putting into this!

Unfortunately, both of the quick-fix suggestions don't work for me. They still produce the warning and return 0.0 ...
And I mean not just in my use case, but the literal code that you posted.

ncourty · 2020-11-11T22:47:07Z

Damn I was too quick in my answer. This is indeed a very weird bug... Looking into it, I will keep you posted

ncourty · 2020-11-11T23:53:22Z

Ok I noticed a small problem with an EPSILON constant in the C code. Could you please install the branch 'precision_concern' and test it on your problem ? The modifications I performed did actually solved the issue for the previous problem (with the [292,2] array).

merplumander · 2020-11-12T09:35:26Z

That solves both the toy test case and my actual use case and there is no need to do any casting (it works with and without casting)!

Thanks so much for looking into this and being so super responsive! :)

ncourty · 2020-11-12T10:35:24Z

Great ! Thanks for the quick feedback. I will do a PR and close the issue.

alsuhr-c mentioned this issue Feb 24, 2020

Infeasible solution for EMD but inputs are in the simplex? #126

Closed

rflamary assigned hichamjanati and ncourty Nov 11, 2020

ncourty closed this as completed Nov 12, 2020

rflamary mentioned this issue Nov 12, 2020

change precision EPSILON in C code #217

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Not in simplex -- two sets of largely different sizes #93

Not in simplex -- two sets of largely different sizes #93

caricesarotti commented Jul 19, 2019 •

edited

Loading

ncourty commented Jul 19, 2019

caricesarotti commented Jul 19, 2019

hichamjanati commented Jul 22, 2019 •

edited

Loading

caricesarotti commented Jul 22, 2019

hichamjanati commented Jul 22, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019 •

edited

Loading

caricesarotti commented Jul 24, 2019

hichamjanati commented Jul 24, 2019

rflamary commented Jul 25, 2019

caricesarotti commented Jul 25, 2019

aforr commented Jul 16, 2020

merplumander commented Nov 11, 2020

ncourty commented Nov 11, 2020

merplumander commented Nov 11, 2020 •

edited

Loading

ncourty commented Nov 11, 2020

ncourty commented Nov 11, 2020

merplumander commented Nov 12, 2020

ncourty commented Nov 12, 2020

Not in simplex -- two sets of largely different sizes #93

Not in simplex -- two sets of largely different sizes #93

Comments

caricesarotti commented Jul 19, 2019 • edited Loading

ncourty commented Jul 19, 2019

caricesarotti commented Jul 19, 2019

hichamjanati commented Jul 22, 2019 • edited Loading

caricesarotti commented Jul 22, 2019

hichamjanati commented Jul 22, 2019

caricesarotti commented Jul 23, 2019

That is very interesting. When I run that script in Jupyter with a python 3 compiler, I get:

hichamjanati commented Jul 23, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019

caricesarotti commented Jul 23, 2019

hichamjanati commented Jul 23, 2019 • edited Loading

caricesarotti commented Jul 24, 2019

hichamjanati commented Jul 24, 2019

rflamary commented Jul 25, 2019

caricesarotti commented Jul 25, 2019

aforr commented Jul 16, 2020

merplumander commented Nov 11, 2020

ncourty commented Nov 11, 2020

merplumander commented Nov 11, 2020 • edited Loading

ncourty commented Nov 11, 2020

ncourty commented Nov 11, 2020

merplumander commented Nov 12, 2020

ncourty commented Nov 12, 2020

caricesarotti commented Jul 19, 2019 •

edited

Loading

hichamjanati commented Jul 22, 2019 •

edited

Loading

hichamjanati commented Jul 23, 2019 •

edited

Loading

merplumander commented Nov 11, 2020 •

edited

Loading