
pybind11.h not found when installing using pip #307

Closed · taketwo opened this issue Apr 24, 2018 · 23 comments

taketwo commented Apr 24, 2018

I'm trying to install the Python bindings on an Ubuntu 16.04 machine:

$ pip3 install pybind11 nmslib
Collecting nmslib
  Using cached https://files.pythonhosted.org/packages/de/eb/28b2060bb1750426c5618e3ad6ce830ac3cfd56cb3eccfb799e52d6064db/nmslib-1.7.2.tar.gz
Requirement already satisfied: pybind11>=2.0 in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (2.2.2)
Requirement already satisfied: numpy in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (1.14.2)
Building wheels for collected packages: nmslib
  Running setup.py bdist_wheel for nmslib ... error
  Complete output from command /homes/alexandrov/.virtualenvs/pytorch/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-0y71oxa4/nmslib/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-916r1rr9 --python-tag cp35:
  running bdist_wheel
  running build
  running build_ext
  creating tmp
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpwekdswov.cpp -o tmp/tmpwekdswov.o -std=c++14
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpyyphh022.cpp -o tmp/tmpyyphh022.o -fvisibility=hidden
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  building 'nmslib' extension
  creating build
  creating build/temp.linux-x86_64-3.5
  creating build/temp.linux-x86_64-3.5/nmslib
  creating build/temp.linux-x86_64-3.5/nmslib/similarity_search
  creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src
  creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/method
  creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/space
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I./nmslib/similarity_search/include -Iinclude -Iinclude -I/homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c nmslib.cc -o build/temp.linux-x86_64-3.5/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO="1.7.2" -std=c++14 -fvisibility=hidden
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  nmslib.cc:16:31: fatal error: pybind11/pybind11.h: No such file or directory
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Clearly, the pybind11 headers were not installed on my machine. The library is not packaged for apt-get (at least not on Ubuntu 16.04), so I had to install it manually from source.

It would be nice if the nmslib install script took care of this.

benfred (Contributor) commented Apr 24, 2018

Interesting. I have an Ubuntu 16.04 LTS machine, and I've successfully installed nmslib using pip when pybind11 wasn't already installed.

'pip install nmslib' should install pybind11 if it isn't already installed - but in your case it looks like pip thinks it is already installed (Requirement already satisfied: pybind11>=2.0).

What does python -c 'import pybind11; print(pybind11.get_include())' print out? Based on the compile flags I'm guessing that it's not returning the fully qualified path =(

benfred (Contributor) commented Apr 24, 2018

This might be related to this issue: pybind/pybind11#1344

taketwo (Author) commented Apr 24, 2018

Sorry, what I posted was a bit misleading. Initially, when I tried pip3 install nmslib, it failed to import pybind11. So I explicitly ran pip3 install pybind11 and then repeated the nmslib installation. In the description I wrote $ pip3 install pybind11 nmslib just to make it clear that the pybind11 Python package is available and should not be the problem.

$ python -c 'import pybind11; print(pybind11.get_include())'
include

searchivarius (Member):

@benfred this is the issue I mentioned before. There seems to be a bug in Python setuptools: it ignores the requirements specs.

benfred (Contributor) commented Apr 24, 2018

Hmm - so there might be two separate problems here (not installing pybind11 as part of the install, and not getting the correct path from pybind11 once it is installed).

I think I've managed to replicate the problem. The issue seems to be the combination of pybind11 and pip 10: with that combination, pybind11.get_include() returns just 'include' instead of the fully qualified path:


In [1]: import pybind11

In [2]: pybind11.get_include()
Out[2]: 'include'

In [3]: import pip

In [4]: pip.__version__
Out[4]: '10.0.1'

versus


In [1]: import pybind11

In [2]: pybind11.get_include()
Out[2]: '/Users/ben/anaconda3/include/python3.6m'

In [3]: import pip

In [4]: pip.__version__
Out[4]: '9.0.3'
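
A quick way to check whether the path get_include() returns is actually usable is to look for the header itself (just a diagnostic sketch, not part of either library):

import os
import pybind11

inc = pybind11.get_include()
header = os.path.join(inc, "pybind11", "pybind11.h")
# A bare relative path like 'include' will not resolve from the build directory.
print(inc, "->", "found" if os.path.isfile(header) else "missing")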

benfred (Contributor) commented Apr 24, 2018

@taketwo as a short-term fix, try downgrading pip with 'pip install pip==9.0.3' and then installing nmslib (exact commands below) - longer term I think we will have to wait for a patched version of pybind11.

@searchivarius I'll see if I can replicate the install_requires being ignored.
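
For completeness, the workaround in full (assuming a virtualenv where downgrading pip is safe; force-reinstalling pybind11 is so that its headers get laid out again by the older pip):

pip install pip==9.0.3
pip install --force-reinstall pybind11
pip install nmslib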

taketwo (Author) commented Apr 25, 2018

Thanks, guys, for the quick reaction. I installed pybind11 manually to /usr/local, so for now I'm fine; I just wanted to flag the issue for the record.

May I squeeze in a small off-topic question? (I did not find a support forum or mailing list.) I have a relatively small cloud of 3D points (between 1K and 4K), and I need to perform a lot of single-nearest-neighbor lookups as fast as possible (approximate results are okay). Which parameters (build/search time) are most relevant for me? What's your expert opinion?

searchivarius (Member):

@taketwo it's hard to tell without looking at the data. However, for such a small collection it can be hard to beat brute-force search. How fast does it need to be? How large are your vectors (dimensionality)?
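
For a sense of scale, the brute-force baseline is just one vectorized pass over the data (a minimal numpy sketch with made-up points, not nmslib code):

import numpy as np

points = np.random.rand(4000, 3).astype(np.float32)  # hypothetical 4K-point cloud
query = np.random.rand(3).astype(np.float32)

# Exact 1-NN: squared L2 distance to every point, then argmin. O(n*d) per query.
nearest = int(np.argmin(((points - query) ** 2).sum(axis=1)))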

taketwo (Author) commented Apr 25, 2018

@searchivarius My vectors have only 3 dimensions (points in Euclidean space); they are sampled from the surfaces of object models. Ideally I'd like to keep each query at around or under 1 µs, though I don't know how realistic that is.

searchivarius (Member):

Hi @taketwo, I played with randomly generated 3D data; your real data is likely to have lower retrieval times. I don't get 1 µs, but for 5K records I can get 700K queries per second on my 4-core laptop. Here is the command line that I used to test it:

release/experiment -s l2 -i ~/TextCollect/VectorSpaces/unif5K_3d.txt -m hnsw -c M=30,efConstruction=10000,post=2  -t ef=4  -Q 1000 -b 1 -k 10 --threadTestQty 4

Construction parameters: M=30,efConstruction=10000,post=2
Query parameters: ef=4
Recall@10 was 98%
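
For anyone using the Python bindings, roughly the same experiment would look like this (a sketch with synthetic stand-in data; I'm assuming efSearch is the query-time name for ef in the bindings):

import numpy as np
import nmslib

data = np.random.rand(5000, 3).astype(np.float32)     # stand-in for the 5K records
queries = np.random.rand(1000, 3).astype(np.float32)  # stand-in for the query set

index = nmslib.init(method='hnsw', space='l2')
index.addDataPointBatch(data)
index.createIndex({'M': 30, 'efConstruction': 10000, 'post': 2})
index.setQueryTimeParams({'efSearch': 4})

# Batch 10-NN over 4 threads, like --threadTestQty 4 above.
results = index.knnQueryBatch(queries, k=10, num_threads=4)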

taketwo (Author) commented Apr 25, 2018

Thanks @searchivarius, this is already quite reasonable. Perhaps I can bring the query time down further by exploring the parameter space around your suggestions.

searchivarius (Member):

@taketwo yes, your results can certainly vary: random uniform data is notoriously difficult, even in low dimensions. I would be curious to learn about your results. Thanks!

yurymalkov (Member):

@searchivarius M=30 is way too many for d=3. M=3..6 would probably be a much better choice.

searchivarius (Member):

@yurymalkov it wasn't in my test.

taketwo (Author) commented Apr 25, 2018

I created a test set with the actual data that I need to process (1K records and 1M queries). I'm using the knnQueryBatch function in the Python interface and %timeit for benchmarking. As I wrote before, I only need to find a single nearest neighbor for each query point.

With your suggested parameters it takes 2.1 s.

I tried varying every option (separately) and was able to bring it down to 1.2 s with:

  • M = 2
  • ef = 1
  • the rest unchanged

All of this is done on a single core, because in my application I will need to repeat such batch queries against many different indices, so I will parallelize at that level. But for the record, increasing the number of threads gives only a marginal improvement: with 4 threads and the best parameter set it takes 1.1 s.
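
For reference, the benchmark boils down to something like this (a sketch with random stand-in data in place of my point clouds; again assuming efSearch is the Python-side name for ef):

import numpy as np
import nmslib

points = np.random.rand(1000, 3).astype(np.float32)      # 1K records
queries = np.random.rand(1000000, 3).astype(np.float32)  # 1M queries

index = nmslib.init(method='hnsw', space='l2')
index.addDataPointBatch(points)
index.createIndex({'M': 2, 'efConstruction': 10000, 'post': 2})
index.setQueryTimeParams({'efSearch': 1})

# Timed in IPython with: %timeit index.knnQueryBatch(queries, k=1, num_threads=1)
results = index.knnQueryBatch(queries, k=1, num_threads=1)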

searchivarius (Member) commented Apr 25, 2018 via email

Sorry, I missed that you mentioned knnQueryBatch. But do you actually have multiple cores?

taketwo (Author) commented Apr 25, 2018

Yes, my workstation has an "Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz"; lscpu reports 12 CPUs (6 cores with 2 threads each).

searchivarius (Member):

@taketwo hmmm, this is strange. Did you try explicitly setting num_threads to, say, 6?

taketwo (Author) commented Apr 26, 2018

Yes, I'm passing num_threads explicitly to the query function. I benchmarked every value from 1 to 12, with timeit set to repeat each run 50 times. Here are the results:

num_threads   time (ms)
 1            1144.315
 2            1019.341
 3             974.499
 4            1038.168
 5            1028.630
 6            1024.223
 7            1031.905
 8            1046.275
 9            1046.863
10            1051.944
11            1053.538
12            1047.510

There seems to be a sweet spot at 3 threads, which I did not notice yesterday because I only tried a few even numbers.

searchivarius (Member):

@taketwo something is wrong :-( This workload is nearly perfectly parallelizable.

taketwo (Author) commented Apr 26, 2018

In the meantime I tried a different ANN library that is designed specifically for low-dimensional spaces (libnabo), and it is dramatically faster: 26 ms for the same queries. So for now I think I'll stick with that one. Thanks for your support!

searchivarius (Member):

Good to know, @taketwo, thanks for testing. We never optimized for low-dimensional spaces, but there's clearly much we could do.
